Amazon Athena

Updated Jul 26, 2020

NOTES

This is not exhaustive documentation of all existing AWS services. These are summarized notes that I used while preparing for the AWS certifications.

To see the complete documentation, please go to: AWS documentation

Overview

Athena is a serverless, interactive query service that makes it easy to search and analyze data in Amazon S3 using SQL.

The source data is stored in S3, and Athena reads it from there. In Athena you define how to interpret the original data (the schema) and how it should be presented for the queries you want to run.

Serverless service to perform analytics directly against S3 files
Uses schema-on-read: the original data is never changed and remains in its original form.
The schema, which you define in advance, is applied to the data in flight when it is read.
Provides JDBC and ODBC drivers.
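
To make the schema-on-read idea concrete, here is a minimal sketch in plain Python. It does not call Athena; it only mimics the behaviour described above. The sample data, the column names, and the helper `read_with_schema` are all hypothetical.

```python
import csv
import io

# Hypothetical raw object contents, standing in for a CSV file stored in S3.
RAW_CSV = "2020-07-26,GET,200,512\n2020-07-26,POST,500,128\n"

# Hypothetical schema: column names plus the Python type used to interpret each field.
SCHEMA = [
    ("request_day", str),
    ("http_method", str),
    ("status_code", int),
    ("bytes_sent", int),
]


def read_with_schema(raw_text, schema):
    """Apply the schema to each row as it is read; the raw text is left untouched."""
    rows = []
    for fields in csv.reader(io.StringIO(raw_text)):
        row = {name: cast(value) for (name, cast), value in zip(schema, fields)}
        rows.append(row)
    return rows


rows = read_with_schema(RAW_CSV, SCHEMA)

# Rough equivalent of: SELECT * FROM logs WHERE status_code >= 500
server_errors = [row for row in rows if row["status_code"] >= 500]
print(server_errors)
```

The stored text stays exactly as it was written. Only the in-memory view of it changes, so a different schema could be applied to the same bytes later.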

How Athena works

Tables are defined in advance in a data catalog and data is projected when read.
It allows SQL-like queries on data without transforming the data itself.
Query results can be saved in the console or fed to other visualization tools.
Optimizing the original data set (for example, by compressing it) reduces both the amount of storage used and the cost of querying that data.

Pricing

We are charged per query, based on the amount of data scanned, so we pay only for what we use.
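
As a rough illustration of pay-per-scan pricing, the sketch below estimates the cost of a single query. The figures are assumptions that do not come from these notes: 5 USD per TB scanned, a 10 MB minimum per query, and 1 TB counted as 1024^4 bytes. Check the current Athena pricing page for your region before relying on them. The function name `estimate_query_cost` is hypothetical, and rounding to the nearest megabyte is left out for simplicity.

```python
# Assumed values (not from the notes); verify against the current pricing page.
PRICE_PER_TB_USD = 5.0                 # assumed price per TB of data scanned
MIN_BYTES_PER_QUERY = 10 * 1024 ** 2   # assumed 10 MB minimum charge per query
BYTES_PER_TB = 1024 ** 4               # assumed definition of 1 TB


def estimate_query_cost(bytes_scanned, price_per_tb=PRICE_PER_TB_USD):
    """Return an estimated cost in USD for one query, given the bytes it scans."""
    billable_bytes = max(bytes_scanned, MIN_BYTES_PER_QUERY)
    return billable_bytes / BYTES_PER_TB * price_per_tb


# Scanning a full 1 TB data set versus a version one tenth the size
# (for example, after compression).
full_scan_cost = estimate_query_cost(1024 ** 4)
reduced_scan_cost = estimate_query_cost(1024 ** 4 // 10)
print(full_scan_cost, reduced_scan_cost)
```

Under these assumptions, cost scales linearly with bytes scanned, which is why reducing the size of the data being read lowers the bill.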

Supported file formats

CSV, JSON, ORC, Avro, Parquet.
On the back end, it uses the Presto query engine.

Use cases

Business intelligence
Analytics
Reporting
Log analysis

How to get started

Create an S3 bucket with data in a supported format.
Create an Athena database.
Create an Athena external table pointing to the S3 bucket.
Query the data in Athena using standard SQL (SELECT).
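
The steps above can be sketched as a small Python script that builds the SQL statements for the database, the external table, and a first query. This is only a sketch under stated assumptions: the bucket `example-athena-notes-bucket`, the database `notes_db`, the table `access_logs`, and its columns are all hypothetical, and the data is assumed to be comma-delimited CSV.

```python
def create_database_sql(database):
    """Build the statement that creates the Athena database."""
    return f"CREATE DATABASE IF NOT EXISTS {database}"


def create_external_table_sql(database, table, columns, s3_location):
    """Build the DDL for an external table over comma-delimited files in S3."""
    column_lines = ",\n".join(f"  {name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {database}.{table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT DELIMITED FIELDS TERMINATED BY ','\n"
        f"LOCATION '{s3_location}'"
    )


def select_sql(database, table, limit=10):
    """Build a simple query against the new table."""
    return f"SELECT * FROM {database}.{table} LIMIT {limit}"


# Hypothetical names, used for illustration only.
DATABASE = "notes_db"
TABLE = "access_logs"
S3_LOCATION = "s3://example-athena-notes-bucket/logs/"
COLUMNS = [
    ("request_day", "string"),
    ("http_method", "string"),
    ("status_code", "int"),
    ("bytes_sent", "bigint"),
]

statements = [
    create_database_sql(DATABASE),
    create_external_table_sql(DATABASE, TABLE, COLUMNS, S3_LOCATION),
    select_sql(DATABASE, TABLE),
]

for statement in statements:
    print(statement)
    print()
```

The printed statements can be pasted into the Athena query editor one at a time. They could also be submitted programmatically, for example with the `start_query_execution` call on the boto3 Athena client, but that requires AWS credentials and a query result location, so it is left out here.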
