Skip to main content

Summarizing Data

Updated Aug 23, 2019 ·

Sample Table

We'll use the films table for the examples:

idtitlerelease_yearcountrydurationlanguagecertificationgrossbudget
1Inception2010USA148EnglishPG-13829895144160000000
2Parasite2019South Korea132KoreanR25759015211400000
3The Godfather1972USA175EnglishR2461209866000000
4Amélie2001France122FrenchR17420000010000000
5Coco2017USA105SpanishPG807082196175000000

Aggregate Functions

There are aggregate functions that allow us to perform calculations on a set of values. These functions include finding the average, sum, minimum, and maximum of a specified field. Aggregate functions follow the SELECT keyword, similar to COUNT.

  • Average: Finds the mean value.
  • Sum: Adds up all the values.
  • Minimum: Finds the smallest value.
  • Maximum: Finds the largest value.

Examples for Aggregate Functions

  • To find the average budget of the films in the table:

    SELECT AVG(budget) AS average_budget
    FROM films;

    Output:

    average_budget
    60000000
  • To sum up the total gross revenue from all films:

    SELECT SUM(gross) AS total_gross
    FROM films;

    Output:

    total_gross
    1310799482

MIN and MAX Functions

The MIN function returns the lowest value, and the MAX function returns the highest value in a specified field.

Example: To determine the minimum and maximum budgets:

SELECT MIN(budget) AS minimum_budget, MAX(budget) AS maximum_budget
FROM films;

Output:

minimum_budgetmaximum_budget
10000000175000000

Non-Numerical Data

Although some aggregate functions are mathematical, many can be used with non-numerical fields. COUNT, MIN, and MAX work well with strings and dates.

To discover the film titles that come first and last alphabetically:

SELECT MIN(title) AS first_title, MAX(title) AS last_title
FROM films;

Output:

first_titlelast_title
AmélieThe Godfather

Aliasing for Clarity

When using aggregate functions, it’s best practice to alias your results for clarity. This helps make your code more readable.

SELECT MIN(release_year) AS earliest_release_year, MAX(release_year) AS latest_release_year
FROM films;

Output:

earliest_release_yearlatest_release_year
19722019

Combining WHERE with Aggregate Functions

Using the WHERE clause with aggregate functions helps refine data analysis by applying conditions before aggregation. The WHERE clause filters records, while aggregate functions summarize the filtered results.

Examples for Aggregate + WHERE

  • Average Budget for Movies Released in 2010 or Later

    SELECT AVG(budget) AS avg_budget_2010_or_later
    FROM films
    WHERE release_year >= 2010;

    Output:

    avg_budget_2010_or_later
    85400000
  • Total Budget of Movies Released in 2010

    SELECT SUM(budget) AS total_budget_2010
    FROM films
    WHERE release_year = 2010;

    Output:

    total_budget_2010
    160000000
  • Smallest Budget of Movies Released in 2010

    SELECT MIN(budget) AS min_budget_2010
    FROM films
    WHERE release_year = 2010;

    Output:

    min_budget_2010
    160000000
  • Highest Budget and Count of Movies Released in 2010

    SELECT MAX(budget) AS max_budget_2010,
    COUNT(budget) AS count_budgets_2010
    FROM films
    WHERE release_year = 2010;

    Output:

    max_budget_2010count_budgets_2010
    1600000001

Using ROUND for Precision

The ROUND() function is used to round numerical values to a specified number of decimal places. This is particularly useful for financial data where precision is required.

  • Average Budget Rounded to Two Decimal Places

    SELECT ROUND(AVG(budget), 2) AS avg_budget_rounded
    FROM films
    WHERE release_year >= 2010;

    Output:

    avg_budget_rounded
    85400000.00
  • Budget Rounded to Whole Numbers

    SELECT ROUND(AVG(budget)) AS avg_budget_whole
    FROM films
    WHERE release_year >= 2010;

    Output:

    avg_budget_whole
    85400000
  • Rounding to the Hundred Thousand

    SELECT ROUND(AVG(budget), -5) AS avg_budget_hundred_thousand
    FROM films
    WHERE release_year >= 2010;

    Output:

    avg_budget_hundred_thousand
    85000000