Skip to main content

Subqueries and CTEs

Updated Aug 28, 2019 ·

Sample Tables

Here is the schema for the sample World table:

To download the actual files, you can get them from my Github repository.

Additive Joins

In SQL, joins are typically used to combine data from two or more tables. The joins discussed in the previous pages are "additive," meaning they add columns to the original left table.

As a recap:

INNER JOIN adds columns to the original left table based on matching rows in both tables. For example, an INNER JOIN on the id field adds additional columns from the right table to the left table.

SELECT *
FROM left_table
INNER JOIN right_table
ON left_table.id = right_table.id;

Explanations:

  • Fields with different names are added with their original names.
  • Fields with the same name can result in duplicate columns, which can be renamed using aliasing.

To make it clearer, see the diagram below.

Subqueries

Subqueries are nested queries inside another query. They can simplify results and improve readability.

  • Can be used in SELECT, FROM, and WHERE clauses.
  • Often return single results instead of multiple rows.
  • Provide a more readable alternative to complex joins.
info

Performance note: Subqueries in FROM clauses limit optimization and may reduce readability.

Semi Join

Semi joins differ from traditional joins in that they don't add columns from the right table. Instead, they filter the left table based on a condition in the right table. They return all rows from the left table where a specified condition is met in the right table.

Example on Semi Join

Consider the presidents and states table. Identify which countries gained independence before 1800, and then identify the presidents of those countries.

We can start with finding the countries first:

SELECT country
FROM states
WHERE indep_year < 1800;

We can then further filter the output by using the pevious SQL statement as a subquery within another SQL statement.

SELECT country, continent, president
FROM presidents
WHERE country IN
(
SELECT country
FROM states
WHERE indep_year < 1800
);

Anti Join

An anti join filters the left table by excluding rows that match a condition in the right table. It returns rows from the left table where a specified condition is NOT met in the right table.

Example of Anti Join

Consider the presidents and states table. Identify countries in the Americas founded after 1800.

SELECT country, president
FROM presidents
WHERE continent LIKE '%America'
AND country NOT IN
(
SELECT country
FROM states
WHERE indep_year < 1800
);

Types of Subqueries

  1. SELECT Subqueries

    • Used in the SELECT clause to calculate values.
    • Avoids generating large result sets from full outer joins.
    • Example: Compare average word length in a movie script and the English language.
  2. WHERE Subqueries

    • Applied in the WHERE clause to act as dynamic filters.
    • Example: Filter English words found in a movie script dynamically.
  3. FROM Subqueries

    • Creates temporary result sets for the main query.
    • Better rewritten as joins for better readability and performance.

Subqueries in WHERE Clause

Subqueries are frequently used inside the WHERE clause, which allows for filtering data based on complex conditions. This is particularly useful in semi joins and anti joins.

The WHERE clause is often the go-to place for subqueries because filtering is a fundamental task in data manipulation.

Syntax for subqueries:

SELECT * 
FROM table_1
WHERE field_1 IN
(include subquery here)

We can then add the subquery. Note that column_name and another_column_name needs to be of the same data type for this to work. Subqueries can reference the same table or a different table, allowing for diverse query structures.

SELECT *
FROM table_name
WHERE column_name IN (
SELECT another_column_name
FROM another_table
WHERE field_2 = condition
);

Subqueries in SELECT Clause

Subqueries can also be placed inside the SELECT clause to calculate or retrieve additional data that complements each row in the main query. Subqueries in the SELECT clause are often used to perform calculations or aggregations that would be cumbersome with a direct join.

info

Subqueries in SELECT and WHERE statements are analogous to JOINs

Example on Subqueries

Monarchs

Suppose we want to count the number of monarchs for each continent in a states table using data from a monarchs table. We can use a subquery to achieve this without explicitly joining the tables.

Start with the first filter, which is to get all unique continents from the prime_ministers table. This should print just the 7 continents, removing any duplicates.

SELECT DISTINCT continent
FROM prime_ministers;

Output:

continent
North America
Europe
Asia
South America
Oceania
Africa

For the next filtering, the output that we want to see should look like this:

continentmonarch_count

Let's now focus on the second column. To get the monarch_count:

SELECT COUNT(*) 
FROM monarchs
WHERE monarchs.continent = prime_ministers.continent

Combining both SQL statements into one:

SELECT 
DISTINCT continent, -- this is the first SELECT statement
(SELECT COUNT(*) -- this is the second SELECT statement, which is a subquery
FROM monarchs
WHERE monarchs.continent = prime_ministers.continent) AS monarch_count
FROM prime_ministers;

Explanation:

  • This query selects distinct continents from the prime_ministers table.
  • For each continent, a subquery counts the number of monarchs in the monarchs table.
  • The subquery uses a WHERE clause to match the continent fields between the two tables.
  • The result of the subquery is aliased as monarch_count to provide clarity in the output.

Output:

continentmonarch_count
Europe7
North America1
Oceania1
Asia1
Africa0

Populations

Below is the populations table with 20 records inside.

populations table
pop_idcountry_codeyearfertility_ratelife_expectancysize
20ABW20101.70474.95354101597
19ABW20151.64775.573586103889
2AFG20105.74658.9708327962208
1AFG20154.65360.7171732526562
12AGO20106.41650.6541721219954
11AGO20155.99652.66609625021974
4ALB20101.66377.036952913021
3ALB20151.79378.0144652889167
10AND20101.27null84419
9AND2015nullnull70473
409ARE20101.86876.675258329453
408ARE20151.76777.5412459156963
16ARG20102.3775.4849841222876
15ARG20152.30876.3342243416756
18ARM20101.64874.226342963496
17ARM20151.51774.797123017712
8ASM2010nullnull55636
7ASM2015nullnull55538
14ATG20102.1375.3087887233
13ATG20152.06376.1002291818

Goal: Figure out which countries had high average life expectancies in 2015.

Begin by calculating the average life expectancy from the populations table. Filter your answer to use records from 2015 only.

SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015;

Output:

avg
71.6763415481105

Next, nest the previous query into another query. Use this calculation to filter populations for all records where life_expectancy is 1.15 times higher than average.

SELECT *
FROM populations
WHERE life_expectancy > 1.15 *
(SELECT AVG(life_expectancy)
FROM populations
WHERE year = 2015)
AND year = 2015;

Output:

pop_idcountry_codeyearfertility_ratelife_expectancysize
21AUS20151.83382.4512223,789,752
376CHE20151.5483.197568,281,430
356ESP20151.3283.38048646,443,992
134FRA20152.0182.6707366,538,392
170HKG20151.19584.2780467,305,700
174ISL20151.9382.86098330,815
190ITA20151.3783.4902460,730,584
194JPN20151.4683.84366126,958,470
340SGP20151.2482.595125,535,002
374SWE20151.8882.5512169,799,186

Population in capital cities

Use both tables below:

Below is the populations table with 20 records inside.

cities table
NameCountry CodeCity Proper PopulationMetro Area PopulationUrban Area Population
AbidjanCIV4,765,000null4,765,000
Abu DhabiARE1,145,000null1,145,000
AbujaNGA1,235,8806,000,0001,235,880
AccraGHA2,070,4634,010,0542,070,463
Addis AbabaETH3,103,6734,567,8573,103,673
AhmedabadIND5,570,585null5,570,585
AlexandriaEGY4,616,625null4,616,625
AlgiersDZA3,415,8115,000,0003,415,811
AlmatyKAZ1,703,481null1,703,481
AnkaraTUR5,271,0004,585,0005,271,000
AucklandNZL1,495,0001,614,3001,495,000
BaghdadIRQ7,180,889null7,180,889
BakuAZE3,202,3004,308,7403,202,300
BandungIDN2,575,4786,965,6552,575,478
BangkokTHA8,280,92514,998,0008,280,925
BarcelonaESP1,604,5555,375,7741,604,555
BarranquillaCOL1,386,8652,370,7531,386,865
BasraIRQ2,750,000null2,750,000
BeijingCHN21,516,00024,900,00021,516,000
Belo HorizonteBRA2,502,5575,156,2172,502,557
countries table
CodeNameContinentRegionSurface AreaIndep YearLocal NameGov FormCapitalCapital LongitudeCapital Latitude
AFGAfghanistanAsiaSouthern and Central Asia652,0901919Afganistan/AfqanestanIslamic EmirateKabul69.176134.5228
NLDNetherlandsEuropeWestern Europe41,5261581NederlandConstitutional MonarchyAmsterdam4.8909552.3738
ALBAlbaniaEuropeSouthern Europe28,7481912ShqiperiaRepublicTirane19.817241.3317
DZAAlgeriaAfricaNorthern Africa2,381,7401962Al-Jaza’ir/AlgerieRepublicAlgiers3.0509736.7397
ASMAmerican SamoaOceaniaPolynesia199nullAmerika SamoaUS TerritoryPago Pago-170.691-14.2846
ANDAndorraEuropeSouthern Europe4681278AndorraParliamentary CoprincipalityAndorra la Vella1.521842.5075
AGOAngolaAfricaCentral Africa1,246,7001975AngolaRepublicLuanda13.242-8.81155
ATGAntigua and BarbudaNorth AmericaCaribbean4421981Antigua and BarbudaConstitutional MonarchySaint John's-61.845617.1175
AREUnited Arab EmiratesAsiaMiddle East83,6001971Al-Imarat al-´Arabiya al-MuttahidaEmirate FederationAbu Dhabi54.370524.4764
ARGArgentinaSouth AmericaSouth America2,780,4001816ArgentinaFederal RepublicBuenos Aires-58.4173-34.6118
ARMArmeniaAsiaMiddle East29,8001991HajastanRepublicYerevan44.50940.1596
ABWArubaNorth AmericaCaribbean193nullArubaNonmetropolitan Territory of The NetherlandsOranjestad-70.016712.5167
AUSAustraliaOceaniaAustralia and New Zealand7,741,2201901AustraliaConstitutional Monarchy, FederationCanberra149.129-35.282
AZEAzerbaijanAsiaMiddle East86,6001991AzarbaycanFederal RepublicBaku49.893240.3834
BHSBahamasNorth AmericaCaribbean13,8781973The BahamasConstitutional MonarchyNassau-77.33925.0661
BHRBahrainAsiaMiddle East6941971Al-BahraynMonarchy (Emirate)Manama50.535426.1921
BGDBangladeshAsiaSouthern and Central Asia143,9981971BangladeshRepublicDhaka90.411323.7055
BRBBarbadosNorth AmericaCaribbean4301966BarbadosConstitutional MonarchyBridgetown-59.610513.0935
BELBelgiumEuropeWestern Europe30,5181830Belgie/BelgiqueConstitutional Monarchy, FederationBrussels4.3676150.8371
BLZBelizeNorth AmericaCentral America22,6961981BelizeConstitutional MonarchyBelmopan-88.771317.2534

Return the name, country_code and urbanarea_pop for all capital cities (not aliased).

SELECT name, country_code, urbanarea_pop
FROM cities
WHERE name IN (
SELECT capital
FROM countries
)
ORDER BY urbanarea_pop DESC;

Output (some records not shown):

NameCountry CodeUrban Area Population
BeijingCHN21,516,000
DhakaBGD14,543,124
TokyoJPN13,513,734
MoscowRUS12,197,596
CairoEGY10,230,350
KinshasaCOD10,130,000
JakartaIDN10,075,310
SeoulKOR9,995,784
Mexico CityMEX8,974,724
LimaPER8,852,000
LondonGBR8,673,713
BangkokTHA8,280,925
TehranIRN8,154,051
BogotaCOL7,878,783
BaghdadIRQ7,180,889

Subqueries inside FROM

Subqueries can also be placed inside the FROM clause, where they act as temporary tables. This approach is beneficial when you need to perform operations on derived datasets or when combining data from multiple tables.

Syntax:

SELECT left_table.id, left_val 
FROM left_table, right_table
WHERE left_table.id = right_table.id

To drop the duplicates, we can use the distinct command:

SELECT DISTINCT left_table.id, left_val 
FROM left_table, right_table
WHERE left_table.id = right_table.id

Continents with Monarchs

Suppose we want to find all continents with monarchs, along with the most recent country to gain independence in each continent.

First, we need a query to find the most recent independence year for each continent.

SELECT continent, MAX(independence_year) AS most_recent
FROM countries
GROUP BY continent;

Explanation:

  • The query groups records by continent.
  • It uses the MAX() function to find the most recent independence_year for each continent.
  • The result provides the latest year of independence for each continent.

To filter this list for continents that have monarchs, we can include the subquery in the FROM clause.

SELECT DISTINCT monarchs.continent, sub.most_recent
FROM monarchs, (
SELECT
continent,
MAX(independence_year) AS most_recent
FROM countries
GROUP BY continent
) AS sub
WHERE sub.continent = monarchs.continent
ORDER BY sub.continent;

Explanation:

  • Subquery: The subquery in the FROM clause generates a temporary table with the most recent independence years, aliased as sub.
  • Joining Tables: Both sub and monarchs are included in the FROM clause, separated by a comma, which allows us to reference both datasets.
  • Filtering: The WHERE clause ensures we only include continents that have monarchs by matching sub.continent with monarchs.continent.
  • Distinct Records: The DISTINCT keyword eliminates duplicate records in the result set.
  • Ordering: The results are ordered by continent to improve readability.

Output:

Continentmost_recent
Asia1991
Europe1993

Common Table Expressions (CTEs)

CTEs create temporary, reusable result sets using the WITH keyword. They are ideal for large or complex datasets.

  • Alternative to using JOINs.
  • Creates a temporary table and gets executed only once.
  • Improve readability by isolating query logic.
  • Reduce redundant computations for better performance.

Example Use of CTEs:

  • Aggregating word lengths in the English language using a CTE.
  • Final query compares English word counts to movie script word counts.
  • CTE improves speed by pre-aggregating data before joining.

When to Use:

  • Use subqueries for simple calculations or dynamic filtering.
  • Use CTEs for complex logic or large datasets needing optimization.

Climate Data for Southern Hemisphere Olympic Regions

Countries like Canada, Russia, and Mongolia have freezing temperatures year-round, while others experience winter cold for a few months, still supporting training for winter sports like skiing and bobsledding. Below are the sample datasets:

Examine temperature data for Olympic countries and focus on the southern hemisphere, which sees lower Winter Olympics participation.

  • Write a CTE for the southern hemisphere.
  • Find the average June temperature and precipitation.
  • Join the data to see winter month temperatures for all regions.

Solution:

The CTE Calculates the average June temperature and precipitation for the specified regions. The JOIN Combines this data with athlete info for the winter season Finally, the GROUP and ORDER Groups results by region and averages, ordering by temperature.

WITH south_cte AS (
SELECT region,
ROUND(AVG(temp_06), 2) AS avg_winter_temp,
ROUND(AVG(precip_06), 2) AS avg_winter_precip
FROM oclimate
WHERE region IN ('Africa', 'South America', 'Australia and Oceania')
GROUP BY region
)

SELECT south.region,
south.avg_winter_temp,
south.avg_winter_precip,
COUNT(DISTINCT ath.athlete_id)
FROM south_cte AS south
INNER JOIN athletes_recent ath
ON south.region = ath.region
AND ath.season = 'Winter'
GROUP BY south.region, south.avg_winter_temp, south.avg_winter_precip
ORDER BY south.avg_winter_temp;

Sample output:

RegionAvg Winter Temp (°C)Avg Winter Precip (mm)Count
South America18.50156.2331
Australia and Oceania19.87137.9276
Africa23.9973.985