[AWS] DynamoDB

DynamoDB is a highly resilient (low-latency) NoSQL database service that replicates data across multiple AZs. DynamoDB can replace existing No SQL databases such as MongoDB, Cassandra DB, or Oracle NoSQL.

Features

Supports both key-value and document data models
Resilient in a Region (multi-AZs)
- spreads data at least across 3 geographically distinct data centers (AZs).
  - Automatic Synchronous Replication
- provides consistent responsiveness
Performance
- SSD Storage
- Single-digit millisecond latency
- Unlimited throughput and storage
A serverless and fully managed NoSQL database service
- stores data in JSON-like, name-value documents
- flexible Schema
- automatically scales horizontally
- supports event-driven programming
Security
- Encryption at rest using the AWS-owned CMK by default.

Components

TABLE
- A collection of items that share the same primary key
  - Partition Key (PK)
  - PK and SK (Sort Key)
- The table name is case-sensitive.
- There is no strict inter-table relationship.
- ARN: “arn:aws:dynamodb:{region}:{account}:table/{tableName}”
  - “arn:aws:dynamodb:us-east-1:925307459448:table/Music“
- Performance is controlled at the table level.
ITEM
- A collection of attributes (up to 400KB) inside the table
ATTRIBUTE
- A key and value pair
- Attributes can be nested

Partition

A partition is a block of storage to save data.
- One partition holds up to 10 GB of data.
- It supports 3,000 RCUs (Read Capacity Units) or 1,000 WCUs (Write Capacity Units).
A Partition Key is used for partition selection via internal hash function.

Primary Keys

Partition Key (Hash)
- Represents a unique attribute such as an ID or email address.
- Each item is stored separately.
  - The partition key is used as an input to the internal hash function, which returns the partition (physical location) on which data is stored.
Composite Key (Partition Key + Sort Key)
- Items in the table might have the same partition key, but they have the different sort key.
- All items with the same partition key are stored together and then sorted according to the sort key.

How do you use the keys?

The partition key determines the logical partitions in which a table’s data is stored and affects the underlying physical partitions.
- Provisioned I/O capacity of the table was divided evenly among these physical partitions.
The throughput of a table depends on the partition key design as well. To improve the performance, you need to provide more distinct partition keys. With more partition keys, the queries will be spread across the partitioned space.

Reading and Writing Data

DynamoDB writes and reads an item as a whole.

Basic Write Operations

PutItem: writes a new item to the specific primary key, or replace an existing item with the same key (last-write win)
- Creates a new item
- Replaces an old item with a new item
UpdateItem: changes attributes for an item with the specified primary key
- Edits an existing item (Conditional update)
- Adds a new item if it does not already exists
BatchWriteItem: writes bunch of items to the specified primary keys
DeleteItem: removes the item with the specified primary key

Basic Read Operations

GetItem: retrieves item with a specific primary key
BatchGetItem: retrieves items with the specified primary keys
Scan: retrieves all items in the table
Query: retrieves items matching the sort key expression for the specified partition key

Scan

Most flexible option.
You can apply a filter with any attribute.
- You can use a scan operation without a filter. This will get all items.
- When a filter is used, a scan operation reads all items and applies a filter. So, it consumes a lot of capacity units.

Query

With Query, you can only retrieve the data you want.
Only PK and SK can be used as query conditions.
- You can retrieve items of the partition key.
- Query is fully indexed and very efficient.
- And refine the query with the sort key with conditions
You can apply filters after querying items.

Eventually Consistent Reads (default)

Data can be read from any node. Consistency spreads all nodes within 1 second.
The data received may not reflect a recent write.
The best practice is to use eventually consistent read wherever possible.

Strongly Consistent Reads

Data is retrieved from a leader node.
The most up-to-date copy of data are returned.
Must be requested explicitly

DynamoDB Transactions

DynamoDB transactions allow you to insert, update, delete items as a single logical operation.

Supports transactions across multiple tables
Provides ACID (Atomicity, Consistency, Isolation, and Durability)

Capacity

DynamoDB reserves the necessary resources to handle your throughput requirements and divides the throughput evenly among partitions. It is also the major factor in DynamoDB pricing.

Read Capacity Units (RCUs)

One read capacity unit represents one strongly consistent read per second, or two eventually consistent reads per second, for an item up to 4 KB in size.
- 1 rcu = 4 KB read per second (strongly consistent)
- 1 rcu = 2*4 = 8 KB read per second (eventually consistent)
A single “transactional read” requires 2 RCUs per 4KB data.
(example 1) Item size: 5 KB
- 5/4 and round up => 2 consistent reads per item
- Eventually consistent reads => 1 RCU (2/2)
- Strongly consistent reads => 2 RCUs
(example 2) Item size: 9 KB
- 9/4 and round up => 3 consistent reads per item
- Eventually consistent reads => 2 RCUs (3/2, round up)
- Strongly consistent reads => 3 RCUs
(example 3) Item size: 3 KB, You need 50 eventually consistent reads per second. How many RCUs do you need?
- 3 / 4 -> round up to 1 => 1 consistent read
- 1 * 50 => 50 consistent reads
- Eventually consistent reads => 50 / 2 => 25 RCUs

Write Capacity Units (WCUs):

1 WCU = 1 KB write per second
Atomic transaction requires 2 * wcu to complete. (Prepare + Commit)
(example 1) Item size: 5KB
- Standard write => 5 WCU
- Transactional write => 10 WCU (5*2)
(example 2) Item size: 600 byte, You need write 100 items per second? How many WCUs do you need?
- 0.6/1 -> round up => 1 standard write
- 1 * 100 = 100 standard writes
- 100 WCU

Rad/Write Capacity Modes

Provisioned throughput (default)
- You can specify expected read and write throughput requirements.
  - Each table is configured with RCUs and WCUs.
- Cheaper per request than on-demand mode
- Subject to throttling due to under-provision.
  - You can setup auto-scaling even in the provisioned mode.
- Use Cases
  - With predictable and consistent traffic
On-demand
- The capacity automatically scales.
- You are charged by per-request.
- Use Cases
  - When the usage is not predictable. No minimum capacity.
You can switch the mode only once every 24 hours.

The following DynamoDB features are charged:

Read and Write Capacity
Storage of data
Cross-region data transfer (no charge for a single region data transfer)

DynamoDB Streams

https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.Lambda.Tutorial.html

When enabled, DynamoDB Streams captures a time-ordered sequence of item-level modifications (insert, update, and delete) in a DynamoDB table and stores the information for up to 24 hours.

You can create a trigger system (event-driven architecture) with DynamoDB Streams and Lambda functions.

DynamoDB Streams supports the following stream record views:

KEYS_ONLY: only the key attributes of the modified item
NEW_IMAGE: The entire item, as it appears after it was modified
OLD_IMAGE: The entire item, as it appears before it was modified
NEW_AND_OLD_IMAGES: Both the new and the old images of the item

Integration with Kinesis Data Stream

You can send DynamoDB Stream (item-level changes) to Kinesis Data Stream.
- Longer retention period (up to 365 days rather than 24 hours)
- You can store data into S3, RedShift, or OpenSearch via Kinesis Firehose.
- You can analyze data via Kinesis Analytics.

DynamoDB Global Tables

Global tables are managed multi-master, multi-region replication based on DynamoDB streams.

Active-Active replication
Requirements:
- Enable Streams
Multi-region redundancy for high availability.
A global table is used for globally distributed applications for multi-region redundancy.
Replication latency is under one second.

DynamoDB Indexes

Indexes provide an alternative representation of data for varying query demands. You can retrieve data using a Query from the index. A table can have multiple secondary indexes to many different query patterns.

Local Secondary Index (LSI)

LSI is an alternative view and must be created at the same time as creating a table.
LSI uses the same PK but an alternative SK. The SK consists of exactly one scalar attribute.
LSI shares the performance with the main table. (RCUs and WCUs)
You can create up to 5 LSIs per table.

Global Secondary Index (GSI)

GSI can be created at any point after the table is created.
GSI uses different PK and SK to allow efficient query operations.
GSI is based on asynchronously replicated data and has their own RCU and WCU.
- It cannot use the strongly consistent read. Only the eventually consistent read is supported.
You can create up to 20 GSIs per table.

Time To Live (TTL)

You can set the expiry time for the data in the DynamoDB table.

Expired items are marked for deletion.
- After being marked, the item will be deleted within 48 hours.
Good for removing old data such as sessions, event logs, or temporary data.
You need to save the TTL attribute in each item and enable TTL using the attribute.

The TTL value should be

a Number (N) data type
a timestamp in Unix epoc time format
Use the converter to get the value: https://www.epochconverter.com/

Backups

You can backup your tables automatically or manually. Note that the recovery from the backup will create a new table.

Point-in-time recovery (PITR)

PITR provides continuous backups of your data for 35 days.
- Incremental backups
It helps you protect against accidental write or delete operations.

ON-demand Backups

Full backups at any time
Zero impact on the table performance

Export to S3

You can export data from DynamoDB to S3.
- Format: JSON or Amazon ION
- The PITR needs to be enabled.
Athena can be used to query data in S3.
You can import data back from S3 to DynamoDB.
- Importing does not consume any write capacity.
- A new table is created.

Error Retries and Exponential Backoff

ProvisionedThroughputExceededException
- Your request rate is too high for your provisioned RCUs/WCUs.

When you get an server-side exception or ProvisionedThroughputExceededException occasionally, you can implement the retry logic.

In addition to simple retries, each AWS SDK implements an exponential backoff algorithm for better flow control.
- The exponential backoff uses progressively longer waits between retries for consecutive error responses.
- For example, up to 50 milliseconds before the first retry, up to 100 milliseconds before the second, up to 200 milliseconds before third, and so on.

If the problem is consistent, you need to consider increasing the read/write capacity.

BatchGetItem Error handling

The BatchGetItem operation returns the attributes of one or more items from one or more tables.

A single operation can retrieve up to 16 MB of data, which can contain as many as 100 items.

ValidationException

If you request more than 100 items, BatchGetItem returns a ValidationException with the message “Too many items requested for the BatchGetItem call.”

UnprocessedKeys

If a partial result is returned, the operation returns a value for UnprocessedKeys.
At least one of the items is successfully processed.
Retry the batch operation using the “exponential backoff algorithm”
- AWS SDK uses the “exponential backoff algorithm”

ProvisionedThroughputExceededException

None of the items can be processed due to insufficient provisioned throughput.

Fine-grained Access Control with IAM

IAM Condition parameter dynamodb:LeadingKeys allows users to access only the items where the partition key matches their user id.

{
  "Version": "2012-10-17",
  "Statement": [
  {
    "Sid": "AllowAccessToOnlyItemsMatchingUserID",
    "Effect": "Allow",
    "Action": [
      "dynamodb:GetItem",
      "dynamodb:PutItem",
      "dynamodb:UpdateItem"
    ],
    "Resource": [
      "arn:aws:dynamodb:us-west-1:123456789012:table/Orders"
    ],
    "Condition": {
      "ForAllValues:StringEquals": {
        "dynamodb:LeadingKeys": [
            "${www.amazon.com:user_id}"
        ],
        "dynamodb:Attributes": [
          "UserId",
          "Item",
          "Price"
        ]
      },
      "StringEqualsIfExists": {
        "dynamodb:Select": "SPECIFIC_ATTRIBUTES"
      }
    }
  }
  ]
}

Best Practices

Primary Keys
- Choose the partition key carefully to avoid hot spots.
- Distribute reads and writes across multiple partitions.
Hot and Cold Data
- Some data might be accessed more frequently than others.
  - ex) Order table: userId (partition key), timestamp (sort key) — recent orders are accessed more often than older ones.
- Move old data to other tables with lower provisioned capacity
- Consider moving rarely accessed data to S3
Managing Large Attributes
- Store large attribute values in Amazon S3
- Compress large values
- Break up large attributes across multiple items
Use Optimistic Locking with a Version Number
- Use a “Version” attribute to check whether the item you are working on has been updated between the last read and update
Use One-to-Many tables (with the same primary key) instead of a large number of attributes
- Overcome the maximum item size (400KB)
- Only read the data you need

[AWS] DynamoDB

Features

Components

Reading and Writing Data

DynamoDB Transactions

Capacity

DynamoDB Streams

DynamoDB Global Tables

DynamoDB Indexes

Local Secondary Index (LSI)

Global Secondary Index (GSI)

Time To Live (TTL)

Backups

Export to S3

Error Retries and Exponential Backoff

BatchGetItem Error handling

Fine-grained Access Control with IAM

Best Practices

Published by P. L.

Leave a Comment Cancel reply

Features

Components

Reading and Writing Data

DynamoDB Transactions

Capacity

DynamoDB Streams

DynamoDB Global Tables

DynamoDB Indexes

Local Secondary Index (LSI)

Global Secondary Index (GSI)

Time To Live (TTL)

Backups

Export to S3

Error Retries and Exponential Backoff

BatchGetItem Error handling

Fine-grained Access Control with IAM

Best Practices

Share this:

Related

Published by P. L.

Leave a Comment Cancel reply