AWS Databases
This is not exhaustive documentation of all the existing AWS services. These are summarized notes that I used for the AWS certifications.
To see the complete documentation, please go to: AWS documentation
Choosing the Right Database
The following are some guiding questions to ask when choosing which database to use:
- Is the workload read-heavy, write-heavy or balanced?
- What are the throughput needs?
- Will the throughput change, fluctuate or will we have to scale the DB over time?
- How much data do we store and for how long?
- Will the size of the data grow?
- What is the average size of an object in the DB? How frequently are these objects accessed?
- Should the data be stored for a week or forever?
- Is the database going to be a source of truth?
- What are the latency concerns?
- What is the data model?
- How will the data be queried?
- Will the data be structured, semi-structured or unstructured?
- Do we need a strong schema or a flexible schema?
- Do we need reporting? Do we need advanced search?
- What are the license costs?
- Can we switch to a cloud-native database such as Aurora, DynamoDB, etc.?
Database Types on AWS
Database Types | AWS Service | Description |
---|---|---|
RDBMS (SQL/OLTP) | RDS, Aurora | Great for joins and normalized data |
NoSQL | DynamoDB, ElastiCache (key/value pairs), Neptune (graphs), DocumentDB (JSON) | No joins, no SQL |
Object Store | S3 (for big objects), Glacier (for backups, archive) | For object storage and archival |
Data Warehouse | Redshift (OLAP), Athena | SQL analytics / BI use cases |
Search | ElasticSearch (JSON) | Free-text, unstructured searches |
Graphs | Neptune | Displays relationships between data |
Amazon Relational Database Service (RDS)
Amazon RDS is a managed database service for relational databases
- Provides cost-efficient, resizable capacity
- AWS manages common database administration tasks.
- AWS provisions an EC2 instance and an EBS volume behind the scenes
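As a rough illustration of how little provisioning is left to us, the sketch below creates a small RDS MySQL instance with boto3; the identifier, instance class and credentials are placeholders, not recommendations.

```python
import boto3

rds = boto3.client("rds")

# Provision a small MySQL instance; RDS creates the underlying
# EC2 instance and EBS volume for us.
rds.create_db_instance(
    DBInstanceIdentifier="my-demo-db",        # placeholder name
    DBInstanceClass="db.t3.micro",
    Engine="mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",    # placeholder credentials
    AllocatedStorage=20,                      # GiB of EBS storage
    MultiAZ=False,
)

# Wait until the instance is available before connecting to it.
rds.get_waiter("db_instance_available").wait(DBInstanceIdentifier="my-demo-db")
```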
Aurora
Amazon Aurora (Aurora) is a fully managed relational database engine that's compatible with MySQL and PostgreSQL.
- Data is held in 6 replicas across 3 AZs
- Up to 15 read replicas (only 5 for RDS)
- It has some auto healing capabilities
- Auto scaling read replicas which can be global
- Can be global, for disaster recovery purposes or latency purposes
- Auto scaling of storage from 10 GB to 64 TB
- We define the EC2 instance type for the Aurora instances
- Same security, monitoring and maintenance features as RDS
- Aurora has a serverless option
Use cases:
- Similar to RDS, but with less maintenance, more flexibility and better performance, at a higher cost
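A minimal sketch of how an Aurora deployment differs from plain RDS: the cluster is created first (storage auto-scales, so no size is specified) and writer/reader instances are added to it separately. Identifiers, instance class and credentials below are placeholder assumptions.

```python
import boto3

rds = boto3.client("rds")

# Create the Aurora MySQL cluster; storage grows automatically,
# so no AllocatedStorage parameter is needed.
rds.create_db_cluster(
    DBClusterIdentifier="my-aurora-cluster",   # placeholder name
    Engine="aurora-mysql",
    MasterUsername="admin",
    MasterUserPassword="change-me-please",     # placeholder credentials
)

# Add a writer instance to the cluster; up to 15 readers can be
# added the same way (or via replica auto scaling).
rds.create_db_instance(
    DBInstanceIdentifier="my-aurora-writer",
    DBInstanceClass="db.r6g.large",
    Engine="aurora-mysql",
    DBClusterIdentifier="my-aurora-cluster",
)
```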
ElastiCache
Managed Redis or Memcached that provides a high-performance, resizable and cost-effective in-memory cache, while removing the complexity associated with deploying and managing a distributed cache environment.
- In-memory data store, sub-millisecond latency
- Must provision an EC2 instance type
- Offers support for clustering (Redis)
- Multi AZ
- Read replicas (sharding)
- Security through IAM, SG, KMS, Redis Auth
- Backups, snapshots and point in time restore (for Redis)
- It has managed and scheduled maintenance
- Monitoring happens through CloudWatch
Use cases:
- Key/value store
- Frequent reads, fewer writes
- Cache results for DB queries
- Store session data for websites
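The classic "cache results for DB queries" pattern looks roughly like the sketch below, using the redis-py client against the cluster endpoint; the hostname, key layout and the db.load_user call are hypothetical.

```python
import json
import redis

# Placeholder ElastiCache Redis endpoint.
cache = redis.Redis(host="my-cache.xxxxxx.cache.amazonaws.com", port=6379)

def get_user(user_id, db):
    """Cache-aside lookup: try Redis first, fall back to the database."""
    key = f"user:{user_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)

    user = db.load_user(user_id)              # hypothetical DB call
    cache.setex(key, 300, json.dumps(user))   # cache the result for 5 minutes
    return user
```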
DynamoDB
Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance.
- Managed NoSQL database, proprietary to AWS
- Serverless, provisioned capacity
- Supports auto scaling, provides on demand capacity as well
- Can replace ElastiCache as a key/value store
- Highly available, Multi AZ by default
- Reads and writes are decoupled, DAX for read cache
- Reads can be eventually consistent or strongly consistent
- Security, authentication and authorization is done through IAM
- Enable DynamoDB Streams to stream all the changes in DynamoDB and integrate with AWS Lambda
- Provides backup/restore features, Global Table feature
- Monitoring is done through CloudWatch
- Tables can only be queried on the primary key, sort key or indexes (see the sketch below)
Use cases:
- Serverless application development
- Distributed serverless cache
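To illustrate the key-based access pattern mentioned above, here is a minimal sketch with boto3; the Orders table, its customer_id partition key and order_date sort key are assumptions made for the example.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("Orders")   # assumed table: pk=customer_id, sk=order_date

# Items are schemaless apart from the key attributes.
table.put_item(Item={
    "customer_id": "c-123",
    "order_date": "2024-01-15",
    "total": 42,
})

# Query by partition key plus a sort key condition; filtering on any
# other attribute would need a secondary index or a full scan.
resp = table.query(
    KeyConditionExpression=Key("customer_id").eq("c-123")
    & Key("order_date").begins_with("2024"),
)
print(resp["Items"])
```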
Redshift
Amazon Redshift is a fast, fully managed, petabyte-scale data warehouse service that is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions.
- Redshift is based on PostgreSQL, but it's not used for OLTP
- It's OLAP - online analytical processing (analytics and data warehousing)
- 10x better performance than other data warehouses, scales to PBs of data
- Columnar database - data is stored in columns
- Massively parallel query execution (MPP), highly available
- Pay as you go based on the instances provisioned
- Has a SQL interface for performing the queries
- Great to use with BI tools such as AWS Quicksight or Tableau
- Data is loaded from S3, DynamoDB, Database Migration Service (DMS) or other DBs (see the sketch after this list)
- Can scale from 1 node up to 128 nodes, up to 160 GB of space per node
- Node types: compute nodes perform the queries and send results to the leader node
- Redshift Spectrum: perform queries directly against S3 (no need to load the data)
- Provides backup and restore
- Security is accomplished with VPC/IAM/KMS
- Redshift Enhanced VPC Routing: COPY/UNLOAD goes through VPC
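A hedged sketch of loading S3 data into an existing table with a COPY statement, submitted through the Redshift Data API; the cluster name, database, user, table, bucket and IAM role ARN are all placeholders.

```python
import boto3

client = boto3.client("redshift-data")

# COPY CSV files from S3 into an existing "sales" table.
copy_sql = """
    COPY sales
    FROM 's3://my-bucket/sales/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV;
"""

resp = client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",   # placeholder cluster
    Database="dev",
    DbUser="awsuser",
    Sql=copy_sql,
)
print(resp["Id"])   # statement id; progress can be checked with describe_statement
```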
Redshift Snapshots and Disaster Recovery
- Snapshots are point-in-time backups of a cluster, stored internally in S3
- Snapshots are incremental (only what has changed is saved)
- Snapshots can be restored into new Redshift clusters
- Snapshots can be automated: every 8 hours, every 5 GB, or on schedule
- Manual snapshots: snapshots are retained until we delete them
- We can configure Redshift to automatically copy snapshots to another region
Redshift Spectrum
- Data can be queried from S3 without the need to load it into Redshift
- We must have a Redshift cluster available to start the query
- The query is then submitted to thousands of Redshift Spectrum nodes
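Assuming an external schema (for example one called spectrum, backed by the Glue Data Catalog) has already been created, a Spectrum query is just regular SQL against the external table, as in this sketch:

```python
import boto3

client = boto3.client("redshift-data")

# Query data that still lives in S3 through the external (Spectrum) table.
# The "spectrum.events" external table is an assumption for this example.
client.execute_statement(
    ClusterIdentifier="my-redshift-cluster",
    Database="dev",
    DbUser="awsuser",
    Sql="SELECT event_type, COUNT(*) FROM spectrum.events GROUP BY event_type;",
)
```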
Neptune
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets.
- Highly available across 3 AZs, up to 15 read replicas
- Provides point-in-time recovery, continuous backup to S3
- Support for KMS encryption at rest + HTTPS
Use cases:
- Highly relational data
- Social networking
- Knowledge graphs
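A minimal social-graph sketch using the Gremlin Python client against the cluster endpoint; the endpoint, labels and property names are assumptions made for the example.

```python
from gremlin_python.process.anonymous_traversal import traversal
from gremlin_python.process.graph_traversal import __
from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection

# Placeholder Neptune Gremlin endpoint.
conn = DriverRemoteConnection(
    "wss://my-neptune.cluster-xxxx.us-east-1.neptune.amazonaws.com:8182/gremlin", "g"
)
g = traversal().withRemote(conn)

# Add two people and a "knows" relationship between them.
alice = g.addV("person").property("name", "alice").next()
bob = g.addV("person").property("name", "bob").next()
g.V(alice.id).addE("knows").to(__.V(bob.id)).next()

# Traverse the relationship: who does alice know?
print(g.V().has("person", "name", "alice").out("knows").values("name").toList())

conn.close()
```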
OpenSearch
OpenSearch is a fully open-source search and analytics engine for use cases such as log analytics, real-time application monitoring, and clickstream analysis.
- Can search any field, as well as partial matches
- Complement to another database
- We can provision a cluster of instances
- Built-in integration with:
  - Amazon Kinesis Data Firehose
  - AWS IoT
  - Amazon CloudWatch Logs for data ingestion
- Security through Cognito and IAM, KMS encryption, SSL and VPC
- Comes with Kibana and Logstash - the ELK stack
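As a sketch of the free-text search use case, the example below indexes a document and runs a fuzzy match query against the domain's REST API; the endpoint, index name and basic-auth credentials (fine-grained access control) are placeholders.

```python
import requests
from requests.auth import HTTPBasicAuth

# Placeholder OpenSearch domain endpoint and credentials.
endpoint = "https://search-my-domain.us-east-1.es.amazonaws.com"
auth = HTTPBasicAuth("admin", "change-me-please")

# Index a document; every field becomes searchable, including free text.
# refresh=true makes it visible to search immediately (for demo purposes).
doc = {"title": "Order 42 shipped", "message": "Package left the warehouse in Dublin"}
requests.put(f"{endpoint}/orders/_doc/42?refresh=true", json=doc, auth=auth)

# Full-text search on the "message" field, tolerating partial/misspelled input.
query = {"query": {"match": {"message": {"query": "warehose", "fuzziness": "AUTO"}}}}
resp = requests.get(f"{endpoint}/orders/_search", json=query, auth=auth)
print(resp.json()["hits"]["hits"])
```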