Skip to main content

Grouping

Updated Aug 23, 2019 ·

Sample Table

We'll use a films table for the examples.

idtitlerelease_yearcountrydurationlanguagecertificationgrossbudget
1Inception2010USA148EnglishPG-13829895144160000000
2Parasite2019South Korea132KoreanR25759015211400000
3The Godfather1972USA175EnglishR2461209866000000
4Amélie2001France122FrenchR17420000010000000
5Coco2017USA105SpanishPG807082196175000000

Grouping Data

In practice, we often need to summarize data by grouping it. For instance, we might want to group film data by certification and calculate summary statistics, such as the average duration for each certification.

GROUP BY Single Fields

SQL uses the GROUP BY clause to group data based on one or more fields. This is often combined with aggregate functions to get summary statistics for each group.

To find the average duration of films for each certification:

SELECT certification, AVG(duration) AS avg_duration
FROM films
GROUP BY certification;

Output:

certificationavg_duration
PG105
PG-13148
R149

Error Handling

When using GROUP BY, all fields in the SELECT statement that are not aggregated must be included in the GROUP BY clause. If a field is selected but not grouped or aggregated, SQL will return an error.

Example:

SELECT certification, title
FROM films
GROUP BY certification;

Error Message: Field title is not aggregated or included in the GROUP BY clause.

Corrected Query:

SELECT certification, COUNT(title) AS film_count
FROM films
GROUP BY certification;

GROUP BY Multiple Fields

GROUP BY can be applied to multiple fields, which affects how the data is grouped. The order of fields in the GROUP BY clause determines the grouping hierarchy.

To see the number of films by certification and language:

SELECT certification, language, COUNT(title) AS film_count
FROM films
GROUP BY certification, language;

Output:

certificationlanguagefilm_count
PGSpanish1
PG-13English1
REnglish2
RKorean1
RFrench1

GROUP BY with ORDER BY

Combining GROUP BY with ORDER BY allows you to group data, perform calculations, and then sort the results.

  • To group and sort by film count:

    SELECT certification, COUNT(title) AS film_count
    FROM films
    GROUP BY certification
    ORDER BY film_count DESC;

    Output:

    certificationfilm_count
    R3
    PG-131
    PG1

    Explanation: The ORDER BY clause is used after GROUP BY to sort the results based on the aggregated data. The query shows that there are more films with an R rating compared to other certifications.

  • Select the release_year, country, and the maximum budget aliased as max_budget for each year and each country; sort your results by release_year and country.

    SELECT release_year, country, MAX(budget) AS max_budget
    FROM films
    GROUP BY release_year, country
    ORDER BY release_year, country;

    Output:

    release_yearcountrymax_budget
    1972USA6000000
    2001France10000000
    2010USA160000000
    2017USA175000000
    2019South Korea11400000

Order of Execution

In SQL, the GROUP BY clause is processed after the FROM clause and before ORDER BY. The typical order of execution is:

  1. FROM (to determine the data source)
  2. GROUP BY (to group the data)
  3. SELECT (to select fields and perform aggregations)
  4. ORDER BY (to sort the results)
  5. LIMIT (to restrict the number of results)