Skip to main content

Keys and Superkeys

Updated Oct 05, 2019 ·

Overview

Keys in a database are attributes or combinations of attributes that uniquely identify records. While the entire table can act as a superkey, a true key is a minimal superkey, which means it cannot have any attributes removed without losing its unique identification ability. This minimal superkey is what we refer to as a "key."

  • Keys uniquely identify records.
  • Superkeys can have redundant attributes.
  • Minimal superkeys are the true keys.

Example: Keys

Below table listing six different cars. All the table's attributes combined form a superkey.

License NoSerial NoModelMakeYear
AB123CDESN1234567890CivicHonda2020
CD456EFGSN0987654321AccordHonda2019
EF789GHISN5678901234Model STesla2021
GH012JKLSN3456789012MustangFord2018
IJ345MNOSN2345678901CorollaToyota2017
IJ345MNOSN2546464789CorollaToyota2021
KL678PQRSN6789012345CamryToyota2022
MN901STUSN8901234567ExplorerFord2019
OP234VWXSN9012345678AltimaNissan2020
QR567YZASN0123456789RogueNissan2021
ST890BCDSN3450123456A4Audi2018

If remove the year attribute, the records are still unique thus it still a superkey. There are other potential superkeys, each capable of uniquely identifying records. Among the various superkeys, there are four minimal superkeys:

  • license_no
  • serial_no
  • model
  • Combination of make and year

These minimal superkeys, also known as candidate keys, are sets of attributes that can uniquely identify each record in the table. None of these attributes can be removed without losing their uniqueness. Eventually one of these candidate keys will be selected as the primary key for the table.

Identifying Keys

There's a very basic way of finding out what qualifies for a key in an existing, populated table:

  1. Count the distinct records for all possible combinations of columns. If the resulting number x equals the number of all rows in the table for a combination, you have discovered a superkey.

  2. Then remove one column after another until you can no longer remove columns without seeing the number x decrease. If that is the case, you have discovered a (candidate) key.

Example: Identifying Keys

We'll use the languages table. The dataset can be downloaded from my Github repository.

A preview of the table:

SELECT * 
FROM languages
LIMIT 10;
lang_idcodenamepercentofficial
1AFGDari50True
2AFGPashto35True
3AFGTurkic11False
4AFGOther4False
5ALBAlbanian98.8True
6ALBGreek0.5False
7ALBOther0.6False
8ALBunspecified0.1False
9DZAArabicNULLTrue
10DZAFrenchNULLFalse

The steps are below:

  1. Find the number of records. We will use this value as the reference. Any combinations that return "995" is considered a superkey.

    SELECT COUNT(*)
    FROM languages;

    Output:

    count
    955
  2. Start by checking distinct combinations of all columns.

    SELECT COUNT(DISTINCT(lang_id, code, name, percent, official))
    FROM languages;

    Output:

    count
    955

    This combination is a superkey

  3. Check distinct combinations of 4 columns.

    SELECT 
    COUNT(DISTINCT(lang_id, code, name, percent)),
    COUNT(DISTINCT(lang_id, code, name, official)),
    COUNT(DISTINCT(lang_id, code, percent, official)),
    COUNT(DISTINCT(lang_id, name, percent, official)),
    COUNT(DISTINCT(code, name, percent, official))
    FROM languages;

    Output:

    countcountcountcountcount
    955955955955955

    These combinations are superkeys.

  4. Check distinct combinations of 3 columns.

    SELECT 
    COUNT(DISTINCT(lang_id, code, name)),
    COUNT(DISTINCT(lang_id, code, percent)),
    COUNT(DISTINCT(lang_id, code, official)),
    COUNT(DISTINCT(lang_id, name, percent)),
    COUNT(DISTINCT(lang_id, name, official)),
    COUNT(DISTINCT(lang_id, percent, official)),
    COUNT(DISTINCT(code, name, percent)),
    COUNT(DISTINCT(code, name, official)),
    COUNT(DISTINCT(code, percent, official)),
    COUNT(DISTINCT(name, percent, official))
    FROM languages;

    Output:

    countcountcountcountcountcountcountcountcountcount
    695695695695695695695695695695

    These combinations are NOT superkeys

  5. Check distinct combinations of 2 columns.

    SELECT 
    COUNT(DISTINCT(lang_id, code)),
    COUNT(DISTINCT(lang_id, name)),
    COUNT(DISTINCT(lang_id, percent)),
    COUNT(DISTINCT(lang_id, official)),
    COUNT(DISTINCT(code, name)),
    COUNT(DISTINCT(code, percent)),
    COUNT(DISTINCT(code, official)),
    COUNT(DISTINCT(name, percent)),
    COUNT(DISTINCT(name, official)),
    COUNT(DISTINCT(percent, official))
    FROM world.languages;

    Output:

    countcountcountcountcountcountcountcountcountcount
    273273273273273273273273273273

    These combinations are NOT superkeys

  6. Check distinct combinations of single columns.

    SELECT 
    COUNT(DISTINCT(lang_id)),
    COUNT(DISTINCT(code)),
    COUNT(DISTINCT(name)),
    COUNT(DISTINCT(percent)),
    COUNT(DISTINCT(official))
    FROM world.languages;

    Output:

    countcountcountcountcount
    22222

    Since these combinations returned the smallest, they're not superkeys and also not considered as candidate keys.

Based on the outputs, the following combinations are superkeys:

  • lang_id, code, name, percent, official
  • lang_id, code, name, percent
  • lang_id, code, name, official
  • lang_id, code, percent, official
  • lang_id, name, percent, official
  • code, name, percent, official

To find the candidate keys, we need to minimize the columns while still maintaining the unique identification of each row. The smallest combination from the superkeys list would be the candidate key. Since the distinct count dropped for three columns combinations, we can conclude that removing any one column from the four-column superkeys results in losing the uniqueness.

Thus, the candidate keys are the combinations of any four columns out of the five columns listed in the superkeys:

  • lang_id, code, name, percent
  • lang_id, code, name, official
  • lang_id, code, percent, official
  • lang_id, name, percent, official
  • code, name, percent, official