Skip to main content

Row vs. Column Store

Updated Nov 30, 2021 ·

Overview

The type of data store used affects the speed and efficiency of database queries, particularly for transactional or analytical tasks. Column stores are best suited for these analytical queries, while row stores are better for transactional workloads.

Basics of Computer Storage

Computers store data in blocks on the hard drive.

  • Data can span multiple blocks.
  • Accessing data stored in fewer blocks is faster.

For analytical workloads, storing data in fewer blocks improves query speed.

Row Store

In a row store, data for each row is stored together.

  • Good for transactional workloads.
  • New data is added quickly by writing to a new block.
  • Not ideal for analytical querie.

Example: Health Table containing flu infection data.

YearAge GroupHospitalization Percentage
201918-4913.9
201950-6422.5
201965+63.57
...
202018-4918.1
...
202118-4915.6

If we want to calculate the average hospitalization percentage for a specific year (2019, 2020, or 2021), the system has to read multiple blocks to retrieve the data.

If we need the average hospitalization percentage for 2019, the system reads all blocks related to 2019, which takes longer.

Column Store

In a column store, data for each column is stored together.

  • Best for analytical workloads.
  • Only the necessary columns need to be read, makes queries faster.
  • All data in a column is of the same type, it's stored more efficiently.

If we want the average hospitalization percentage for 2019, the system only reads the "Hospitalization %" column, which is quicker.

Comparison

Row Store

  • Data is stored together by row
  • Ideal for transactional workloads
  • Slower for analytical queries

Column Store

  • Data is stored by column
  • Best for analytical workloads
  • Faster for querying large amounts of data
  • Better for compression