Skip to main content

Normal Forms

Updated Oct 15, 2019 ·

Overview

Normal forms are used in normalizing a database, which involves organizing the data to reduce redundancy and improve integrity. Each normal form builds on the previous one, introducing stricter rules for how data should be structured. By following these rules, databases become easier to maintain and less prone to errors, such as update anomalies or data inconsistencies.

To learn more about normalization, please see Schemas and Normalization.

Types of Normal Forms

First Normal Form (1NF)

First Normal Form ensures that each column contains atomic values and that there are no repeating groups of data. This means every entry in a column is indivisible, and each record must be unique.

Consider a table of students and their courses before 1NF:

Student_idNameCourses
101AliceMath, English
102BobScience
103CarolMath, History

To convert this to 1NF, ensure that each course is in a separate row, splitting it into two tables:

Student Table

Student_idName
101Alice
102Bob
103Carol

Student-Course Table

Student_idCourse
101Math
101English
102Science
103Math
103History

Second Normal Form (2NF)

Second Normal Form builds on 1NF by removing partial dependencies, meaning all non-key attributes must be fully dependent on the primary key. In a table with a composite primary key, every non-key attribute should depend on the whole key, not just part of it.

  • Must satisfy 1NF First
  • If primary key is one column, then automatically satisfies 2NF
  • If there is a composite primary key, then each non-key column must be dependent on all the keys

Consider a table that tracks student progress in various courses:

Student_idCourse_idInstructor_idInstructorProgress
1012021301Dr. Smith85%
1012042302Dr. Jones90%
1022021301Dr. Smith75%
1032053303Dr. Brown88%

The primary keys are:

  • Student_id
  • Course_id

The non-key columns:

  • Instructor

    • Dependent only on Course_id
    • Instructors depends on the courses, not the students taking the course.
  • Instructor_id

    • Similar with Instructor in terms of dependency.
    • Dependent only on Course_id
    • Instructors depends on the courses, not the students taking the course.
  • Progress

    • Dependent on both Student_id and Course_id

To achieve 2NF, separate the data into two tables:

Student-Course Table

Student_idCourse_idProgress
101202185%
101204290%
102202175%
103205388%

Instructor-Course Table

Course_idInstructor_idInstructor
2021301Dr. Smith
2042302Dr. Jones
2053303Dr. Brown

In these tables:

  • Student-Course Table:

    • The Progress is fully dependent on the composite key (Student_id, Course_id).
  • Instructor-Course Table:

    • Course_id is the primary key here.
    • Instructor and Instructor_id is fully dependent on the Course_id.
    • This ensures there are no partial dependencies.

This separation resolves partial dependencies, ensuring that each non-key attribute is related to the entire primary key in its respective table, thus achieving 2NF.

Third Normal Form (3NF)

Third Normal Form eliminates transitive dependencies, ensuring non-key attributes are not dependent on other non-key attributes.

  • Must satisfy 2NF first
  • Non-key columns should not depend on other non-key columns.

In the sample table below, the primary key is the Course ID.

  • Instructor is dependent on Instructor ID
  • This creates a transitive dependency since Instructor can be derived from Instructor ID instead of directly from Course ID.
Course IDInstructor IDInstructorTechnology
2021301Dr. SmithPython
2042302Dr. JonesJava
2053303Dr. BrownJavaScript
2064304Dr. WhiteC++
2075305Dr. GreenRuby

To achieve 3NF, we separate the table into two tables to eliminate transitive dependencies:

Courses Table

Course IDInstructorTechnology
2021Dr. SmithPython
2042Dr. JonesJava
2053Dr. BrownJavaScript
2064Dr. WhiteC++
2075Dr. GreenRuby

Instructors Table

Instructor IDInstructor
301Dr. Smith
302Dr. Jones
303Dr. Brown
304Dr. White
305Dr. Green

In these tables:

  • Courses Table:

    • Attributes Instructor and Technology are now directly related to Course ID without depending on any other non-key attribute.
  • Instructors Table:

    • The instructor information is separated into its own table.
    • This removes the transitive dependency on Instructor ID.

This structure ensures that all non-key attributes in the Courses table are directly dependent on the primary key, Course ID, and eliminates transitive dependencies.

Other Normal Forms

The first three normal forms are commonly used in normalization, but there are other normal forms as well.

Boyce-Codd Normal Form (BCNF)

BCNF is a stricter version of 3NF, ensuring every determinant is a candidate key.

Consider a table where a professor can only teach one subject in a particular room, but a room can host different professors at different times:

ProfessorSubjectRoom
Dr. SmithMath101
Dr. JonesEnglish102
Dr. SmithScience101

This table violates BCNF because Room determines the Subject. To achieve BCNF:

Room-Subject Table

RoomSubject
101Math
102English

Professor-Assignment Table

ProfessorRoom
Dr. Smith101
Dr. Jones102

Fourth Normal Form (4NF)

Fourth Normal Form eliminates multi-valued dependencies, ensuring that there are no non-trivial multi-valued dependencies other than a candidate key.

Suppose a student can have multiple hobbies and skills independently:

Student_idHobbySkill
101CyclingPython
101PaintingSQL
102CyclingJava
102DrawingPython

This table has a multi-valued dependency because hobbies and skills are independent attributes. To achieve 4NF, separate these into two tables:

Hobby Table

Student_idHobby
101Cycling
101Painting
102Cycling
102Drawing

Skill Table

Student_idSkill
101Python
101SQL
102Java
102Python

Fifth Normal Form (5NF)

Fifth Normal Form focuses on eliminating join dependencies, ensuring that all join dependencies are implied by candidate keys.

Suppose a project requires multiple employees with different skills, and each skill needs specific tools:

ProjectEmployeeSkillTool
AAliceCodingLaptop
ABobTestingToolkit
BAliceTestingToolkit
BCarolDesignSketchpad

To achieve 5NF, decompose this table:

Project-Employee Table

ProjectEmployee
AAlice
ABob
BAlice
BCarol

Employee-Skill Table

EmployeeSkill
AliceCoding
BobTesting
CarolDesign

Skill-Tool Table

SkillTool
CodingLaptop
TestingToolkit
DesignSketchpad

More examples

1NF

Below is a denormalized table called customers containing car rental records:

customer_idcustomer_namecars_rentedinvoice_idpremium_membersalutation
2871Alex Johnson7XY1235498falseDr
3924Mia Chen3TH6787623falseMr
4785Noah Wilson9JK234, 8LM567, 1BC8901839, 9462, 2750trueMs
5648Emma Garcia2DE345, 6GH7893847, 5129trueMrs

To-dos:

  1. cars_rented holds one or more car_ids and invoice_id holds multiple values.
  2. Create a new table to hold individual car_ids and invoice_ids of the customer_ids who've rented those cars.
  3. Drop two columns from customers table to satisfy 1NF
Solution

Run the SQL commands below:

-- Create a new table to hold the cars rented by customers
-- Creating a Customer table schema
CREATE TABLE Customer (
CustomerID INT PRIMARY KEY,
FirstName VARCHAR(50) NOT NULL,
LastName VARCHAR(50),
Email VARCHAR(100) UNIQUE,
Phone VARCHAR(15)
);

-- Drop two columns from customers table to satisfy 1NF
ALTER TABLE customers
DROP COLUMN cars_rented,
DROP COLUMN invoice_id;

SELECT * FROM customer_rentals;

We now have two tables:

  • customers which holds customer information
  • customer_rentals which holds the car_ids rented by different customer_ids.

This satisfies 1NF. In a real situation, we would need to fill the new table before dropping any columns.

2NF

Below is an expanded version of the customer_rentals table:

customer_idcar_idstart_dateend_datemodelmanufacturertype_carconditioncolor
32107XM2452020-05-102020-05-15Focus 2020Fordsedangoodwhite
45218QN3652021-06-122021-06-18Mustang 2021Fordconvertibleexcellentblack
38276LB4572021-07-192021-07-21RAV4 2021ToyotaSUVexcellentsilver
38276LB4572021-08-012021-08-12RAV4 2021ToyotaSUVexcellentsilver
38275ZX6732020-09-052020-09-10RAV4 2021ToyotaSUVgoodblack
49387XM2452020-11-112020-11-15Focus 2020Fordsedangoodwhite
49387XM2452021-01-042021-01-10Focus 2020Fordsedanfairwhite

To-dos:

  1. Create a new table for the non-key columns that were conflicting with 2NF criteria.
  2. Drop those non-key columns from customer_rentals.
Solution

We can use the query below to check the primary keys in the given customer_rentals table:

SELECT 
kcu.column_name,
tc.constraint_type
FROM
information_schema.table_constraints AS tc
JOIN
information_schema.key_column_usage AS kcu
ON
tc.constraint_name = kcu.constraint_name
WHERE
tc.table_name = 'customer_rentals'
AND tc.constraint_type = 'PRIMARY KEY';

Based on the output above, the following columns are the primary keys:

  • customer_id
  • start_date
  • car_id

The following non-key columns depend on car_id, but are independent of the other two primary keys:

  • model
  • manufacturer
  • type_car
  • conditions
  • colors

The customer or start date cannot change these attributes. We can put these columns in a new table and dropped them from customer_rentals table.

Create the table to store details about each car:

CREATE TABLE cars (
car_id VARCHAR(256) NULL,
model VARCHAR(128),
manufacturer VARCHAR(128),
type_car VARCHAR(128),
condition VARCHAR(128),
color VARCHAR(128)
);

Modify the customer_rentals table by dropping the columns that were moved to the cars table. This will ensure the table satisfies 2NF by eliminating partial dependencies.

ALTER TABLE customer_rentals
DROP COLUMN model,
DROP COLUMN manufacturer,
DROP COLUMN type_car,
DROP COLUMN condition,
DROP COLUMN color;

3NF

We have expanded the cars table from above and created the rental_cars table below which uses the car_id attributes as the primary key.

car_idmodelmanufacturertype_carconditioncolor
6XZ123Focus 2020Fordsedanexcellentwhite
7PQ789Accord 2019Hondasedangoodsilver
8LM456Model S 2021Teslaelectricnewblack
9AB654Mustang 2020Fordconvertiblefairred

To-dos:

  1. Create a new table for the non-key columns that were conflicting with 3NF criteria.
  2. Drop those non-key columns from rental_cars.
Solution

Confirm the primary keys first:

SELECT 
kcu.column_name,
tc.constraint_type
FROM
information_schema.table_constraints AS tc
JOIN
information_schema.key_column_usage AS kcu
ON
tc.constraint_name = kcu.constraint_name
WHERE
tc.table_name = 'rental_cars'
AND tc.constraint_type = 'PRIMARY KEY';

Create a new table to satisfy 3NF:

CREATE TABLE car_model(
model VARCHAR(128),
manufacturer VARCHAR(128),
type_car VARCHAR(128)
);

Drop columns in rental_cars to satisfy 3NF:

ALTER TABLE rental_cars
DROP COLUMN condition,
DROP COLUMN color;

SELECT * FROM rental_cars;

From here we can see that creating 3NF tables help reduce data redundancy and potential data anomalies.