Summarizing Data
Sample Table
We'll use the films table for the examples:
id | title | release_year | country | duration | language | certification | gross | budget |
---|---|---|---|---|---|---|---|---|
1 | Inception | 2010 | USA | 148 | English | PG-13 | 829895144 | 160000000 |
2 | Parasite | 2019 | South Korea | 132 | Korean | R | 257590152 | 11400000 |
3 | The Godfather | 1972 | USA | 175 | English | R | 246120986 | 6000000 |
4 | Amélie | 2001 | France | 122 | French | R | 174200000 | 10000000 |
5 | Coco | 2017 | USA | 105 | Spanish | PG | 807082196 | 175000000 |
Aggregate Functions
There are aggregate functions that allow us to perform calculations on a set of values. These functions include finding the average, sum, minimum, and maximum of a specified field. Aggregate functions follow the SELECT
keyword, similar to COUNT
.
- Average: Finds the mean value.
- Sum: Adds up all the values.
- Minimum: Finds the smallest value.
- Maximum: Finds the largest value.
Examples for Aggregate Functions
-
To find the average budget of the films in the table:
SELECT AVG(budget) AS average_budget
FROM films;Output:
average_budget 60000000 -
To sum up the total gross revenue from all films:
SELECT SUM(gross) AS total_gross
FROM films;Output:
total_gross 1310799482
MIN
and MAX
Functions
The MIN
function returns the lowest value, and the MAX
function returns the highest value in a specified field.
Example: To determine the minimum and maximum budgets:
SELECT MIN(budget) AS minimum_budget, MAX(budget) AS maximum_budget
FROM films;
Output:
minimum_budget | maximum_budget |
---|---|
10000000 | 175000000 |
Non-Numerical Data
Although some aggregate functions are mathematical, many can be used with non-numerical fields. COUNT, MIN, and MAX work well with strings and dates.
To discover the film titles that come first and last alphabetically:
SELECT MIN(title) AS first_title, MAX(title) AS last_title
FROM films;
Output:
first_title | last_title |
---|---|
Amélie | The Godfather |
Aliasing for Clarity
When using aggregate functions, it’s best practice to alias your results for clarity. This helps make your code more readable.
SELECT MIN(release_year) AS earliest_release_year, MAX(release_year) AS latest_release_year
FROM films;
Output:
earliest_release_year | latest_release_year |
---|---|
1972 | 2019 |
Combining WHERE
with Aggregate Functions
Using the WHERE
clause with aggregate functions helps refine data analysis by applying conditions before aggregation. The WHERE
clause filters records, while aggregate functions summarize the filtered results.
Examples for Aggregate + WHERE
-
Average Budget for Movies Released in 2010 or Later
SELECT AVG(budget) AS avg_budget_2010_or_later
FROM films
WHERE release_year >= 2010;Output:
avg_budget_2010_or_later 85400000 -
Total Budget of Movies Released in 2010
SELECT SUM(budget) AS total_budget_2010
FROM films
WHERE release_year = 2010;Output:
total_budget_2010 160000000 -
Smallest Budget of Movies Released in 2010
SELECT MIN(budget) AS min_budget_2010
FROM films
WHERE release_year = 2010;Output:
min_budget_2010 160000000 -
Highest Budget and Count of Movies Released in 2010
SELECT MAX(budget) AS max_budget_2010,
COUNT(budget) AS count_budgets_2010
FROM films
WHERE release_year = 2010;Output:
max_budget_2010 count_budgets_2010 160000000 1
Using ROUND for Precision
The ROUND()
function is used to round numerical values to a specified number of decimal places. This is particularly useful for financial data where precision is required.
-
Average Budget Rounded to Two Decimal Places
SELECT ROUND(AVG(budget), 2) AS avg_budget_rounded
FROM films
WHERE release_year >= 2010;Output:
avg_budget_rounded 85400000.00 -
Budget Rounded to Whole Numbers
SELECT ROUND(AVG(budget)) AS avg_budget_whole
FROM films
WHERE release_year >= 2010;Output:
avg_budget_whole 85400000 -
Rounding to the Hundred Thousand
SELECT ROUND(AVG(budget), -5) AS avg_budget_hundred_thousand
FROM films
WHERE release_year >= 2010;Output:
avg_budget_hundred_thousand 85000000