[AWS] RDS – Deployment & Backups

With RDS, you delegate most maintenance tasks to AWS. To get the most out of the service, it helps to understand its deployment and backup features.

RDS reserved instances (RI)

Reserved instances (RI) let you reserve a DB instance for a specified period with a significant discount.

  • Each reserved instance is associated with a specific Region, which is fixed for the lifetime of the reservation and cannot be changed.
  • Each reservation can be used in any AZ within the associated Region.
  • Reserved instances are available for Multi-AZ deployments.
  • Reserved instances can be applied to read replicas, provided they are in the same Region.

RDS Backups and Snapshots

RDS is a managed service that takes backups for you automatically.

Automated backups

  • RDS automated backup enables point-in-time recovery (PITR). Automated backups take a full daily snapshot and store transaction logs throughout the day.
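  • Backups can be retained for 0 to 35 days (the retention period); setting it to 0 disables automated backups. The default is 7 days when the instance is created in the console and 1 day when it is created through the API or CLI.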
  • Automated backups are enabled by default.
  • Backup data is stored in S3. Backup storage up to the size of the provisioned database storage is free of charge.
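
The retention and free-storage rules above can be sketched as two small helpers. This is a simplified illustration only: the function names are hypothetical, and the real charge is computed by AWS and may aggregate storage differently than this model does.

```python
MIN_RETENTION_DAYS = 0   # 0 disables automated backups
MAX_RETENTION_DAYS = 35


def validate_retention_days(days: int) -> int:
    """Return `days` if it is a valid retention period, else raise ValueError."""
    if not MIN_RETENTION_DAYS <= days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention must be {MIN_RETENTION_DAYS}-{MAX_RETENTION_DAYS} days, got {days}"
        )
    return days


def billable_backup_gb(provisioned_gb: float, backup_gb: float) -> float:
    """Backup storage beyond the provisioned database size (simplified model)."""
    return max(0.0, backup_gb - provisioned_gb)


# A 100 GB instance holding 130 GB of backups: only the extra 30 GB counts.
extra = billable_backup_gb(provisioned_gb=100, backup_gb=130)
```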

Database snapshots

  • Users can create snapshots manually; they are stored in S3.
  • Manual snapshots are used for long-term backups. They persist even after the instance is deleted, whereas automated backups are deleted with the DB instance unless you choose to retain them.

Features of Backups and Snapshots

  • For a single AZ RDS instance, I/O may be briefly suspended while the backup process initializes (typically a few seconds), and you may experience a brief period of elevated latency.
  • There is no I/O suspension for Multi-AZ DB deployments, since the backup is taken from the standby.
  • If an instance is encrypted, the backup snapshots are also encrypted.

Restoring RDS Backups

A restored RDS instance is different from the original instance. A restored instance has a new DNS endpoint (CNAME).

Point-in-time recovery (PITR) is the process of restoring a database to the state it was in at a specified date and time.

  • During recovery, the most recent daily backup taken before the requested time is restored, and then the transaction logs are replayed up to that time. This allows point-in-time recovery with a granularity of seconds.
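
The restore-then-replay idea can be illustrated with a toy model. This is only a sketch of the concept, not how RDS implements it: the database is a plain dictionary, and `restore_to_point_in_time`, the snapshot list, and the log format are all made up for the example.

```python
from datetime import datetime


def restore_to_point_in_time(snapshots, log, target):
    """Rebuild state as of `target`.

    snapshots: list of (taken_at, state_dict) daily backups
    log:       list of (timestamp, key, value) logged writes
    """
    candidates = [s for s in snapshots if s[0] <= target]
    if not candidates:
        raise ValueError("no snapshot at or before the target time")
    # Step 1: restore the most recent snapshot taken before the target.
    taken_at, base = max(candidates, key=lambda s: s[0])
    state = dict(base)
    # Step 2: replay logged writes made after the snapshot, up to the target.
    for ts, key, value in sorted(log, key=lambda entry: entry[0]):
        if taken_at < ts <= target:
            state[key] = value
    return state


snapshots = [
    (datetime(2024, 1, 1, 3, 0), {"balance": 100}),
    (datetime(2024, 1, 2, 3, 0), {"balance": 150}),
]
log = [
    (datetime(2024, 1, 1, 9, 0), "balance", 150),
    (datetime(2024, 1, 2, 10, 0, 5), "balance", 170),
    (datetime(2024, 1, 2, 10, 0, 30), "balance", 0),  # the write we want to undo
]
restored = restore_to_point_in_time(snapshots, log, datetime(2024, 1, 2, 10, 0, 10))
print(restored)  # {'balance': 170}
```

Restoring to 10:00:10 keeps the write from 10:00:05 and leaves out the one from 10:00:30, which is the seconds-level granularity described above.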

Snapshots can be copied to the same region or to a different region.

  • If you copy an encrypted snapshot to a different region, you must specify a KMS key in the destination region, because KMS keys are regional.
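
As a rough sketch, the rule about encrypted cross-region copies can be enforced before a copy request is ever sent. The helper below is hypothetical; its dictionary keys are meant to mirror the parameter names of boto3's `copy_db_snapshot` as I understand them, so check them against the boto3 documentation. The ARN and key alias are placeholders.

```python
def build_snapshot_copy_request(source_arn, target_id, source_region,
                                target_region, encrypted, kms_key_id=None):
    """Build keyword arguments for a snapshot copy (illustrative only)."""
    cross_region = source_region != target_region
    # KMS keys are regional, so an encrypted cross-region copy needs a key
    # that lives in the destination region.
    if encrypted and cross_region and not kms_key_id:
        raise ValueError("encrypted cross-region copy needs a destination KMS key")
    request = {
        "SourceDBSnapshotIdentifier": source_arn,
        "TargetDBSnapshotIdentifier": target_id,
    }
    if cross_region:
        request["SourceRegion"] = source_region
    if kms_key_id:
        request["KmsKeyId"] = kms_key_id
    return request


request = build_snapshot_copy_request(
    source_arn="arn:aws:rds:us-east-1:123456789012:snapshot:mydb-snap",  # placeholder
    target_id="mydb-snap-copy",
    source_region="us-east-1",
    target_region="eu-west-1",
    encrypted=True,
    kms_key_id="alias/my-destination-key",  # hypothetical key in eu-west-1
)
```

With boto3 installed and credentials configured, the dictionary would presumably be passed to an RDS client created in the destination region, for example `client.copy_db_snapshot(**request)`.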

Multi-AZ Deployment

RDS can be deployed in a single AZ or multiple AZs for resilience and high availability. In the event of failure of the primary instance, RDS will recover automatically using the standby instance.

  • Multi-AZ provisions a primary instance and a standby instance in a different AZ of the same region.
    • Non-Aurora engines keep 2 copies of the data (1 per AZ).
    • Aurora keeps 6 copies (2 copies in each of 3 AZs).
  • Only the primary instance can be accessed using the domain name or CNAME. Standby instances cannot be used for read/write operations.
  • Multi-AZ offers no performance benefit, but it provides a better RTO than restoring from a snapshot. It is intended for DR (Disaster Recovery).
  • Replication to the standby is synchronous, which can slightly increase write latency.
  • Backups are taken using the standby, ensuring no performance impact.
  • In the case of maintenance (patching or DB instance class scaling), actions occur first on the standby, prior to automatic failover. As a result, your availability impact is limited to the time required for automatic failover to complete.

Converting single-AZ to multi-AZ

  • A snapshot of the primary instance is taken.
  • A standby instance is created from the snapshot in a different AZ.
  • Synchronous replication is configured between the primary and the standby.

RDS Multi-AZ Failover Process

  • The domain name remains the same. Therefore, the endpoint to the RDS remains the same. RDS simply flips the canonical name record (CNAME) for your DB instance to point at the standby.
  • A standby is promoted to the new primary.
  • During the recovery, there is a short service disruption when RDS changes routes from the failed primary to the promoted standby.
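
The failover steps above can be pictured with a toy model in which the endpoint name stays fixed and only the host behind it changes. The class, host names, and endpoint are hypothetical, and treating the old primary as the new standby is a simplification.

```python
class MultiAZDeployment:
    """Toy model of a Multi-AZ instance: one stable endpoint, two hosts."""

    def __init__(self, endpoint, primary_host, standby_host):
        self.endpoint = endpoint            # the name applications connect to
        self.primary = primary_host
        self.standby = standby_host
        self.dns = {endpoint: primary_host}  # stands in for the CNAME record

    def resolve(self):
        """Return the host the endpoint currently points at."""
        return self.dns[self.endpoint]

    def failover(self):
        """Promote the standby and flip the CNAME to it."""
        # Simplification: the failed primary becomes the standby once recovered.
        self.primary, self.standby = self.standby, self.primary
        self.dns[self.endpoint] = self.primary


db = MultiAZDeployment("mydb.example.internal", "host-az-a", "host-az-b")
before = db.resolve()
db.failover()
after = db.resolve()
print(db.endpoint, before, "->", after)
```

The application keeps using the same endpoint string throughout; only the target behind it changes, which is why clients need to reconnect but not be reconfigured.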

Read Replicas

Read Replicas are read-only copies of an RDS instance (up to 5 per source instance at the time of writing; the limit varies by engine).

  • Read Replicas scale reads, not writes, which makes them the best fit for read-heavy databases.
  • Reads from the replicas are eventually consistent (Asynchronous replication).
  • Automated backups must be enabled on the source instance.
  • You can have a read replica in a different region.
  • You can promote a replica to a standalone primary DB instance.
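
On the application side, using read replicas usually means sending writes to the primary endpoint and spreading reads across the replica endpoints. A minimal sketch of such routing, with a hypothetical `ReadWriteRouter` class and made-up endpoint names:

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and rotate reads across the replicas."""

    def __init__(self, primary, replicas):
        self.primary = primary
        self._replicas = itertools.cycle(replicas) if replicas else None

    def endpoint_for(self, sql, require_fresh=False):
        """Pick an endpoint for a statement.

        Replication is asynchronous, so a read that must see the latest
        write should pass require_fresh=True to go to the primary.
        """
        is_read = sql.lstrip().lower().startswith("select")
        if is_read and not require_fresh and self._replicas is not None:
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter(
    "primary.example.internal",
    ["replica-1.example.internal", "replica-2.example.internal"],
)
first_read = router.endpoint_for("SELECT * FROM orders")
second_read = router.endpoint_for("SELECT * FROM orders")
write = router.endpoint_for("INSERT INTO orders VALUES (1)")
fresh_read = router.endpoint_for("SELECT * FROM orders", require_fresh=True)
```

Classifying statements by checking for a leading `SELECT` is deliberately naive; real applications typically decide per transaction or per repository method instead.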

All RDS engines support read replicas.

  • For SQL Server, replicas are only available for Enterprise Edition (EE).
  • Amazon RDS creates a second DB instance using a snapshot of the source DB instance.
  • It uses the engines’ native asynchronous replication to update the read replica.

Aurora Replicas

  • Aurora uses an SSD-backed virtualized storage layer purpose-built for database workloads.
  • Amazon Aurora replicas share the same underlying storage as the source instance, lowering costs and avoiding the need to copy data to the replica nodes.

Amazon RDS Proxy

RDS Proxy is a fully managed, highly available database proxy for Amazon RDS. At the time of writing, it is available for Aurora and RDS for MySQL.

Benefits

  • Improve scalability by pooling and sharing database connections
  • Improve availability by reducing database failover times by up to 66% and preserving application connections during failovers
  • Improve security by optionally enforcing AWS IAM authentication to a database and securely storing credentials in AWS Secrets Manager
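
To make the pooling benefit concrete, here is a small application-side analogy. It is not how RDS Proxy is implemented; it only shows the idea that a few long-lived connections can serve many short requests. The `ConnectionPool` class is hypothetical, and an in-memory SQLite database stands in for a real RDS instance.

```python
import queue
import sqlite3
from contextlib import contextmanager


class ConnectionPool:
    """Open a fixed number of connections once and share them."""

    def __init__(self, connect, size):
        self._idle = queue.Queue(maxsize=size)
        self.opened = 0
        for _ in range(size):
            self._idle.put(connect())
            self.opened += 1

    @contextmanager
    def connection(self, timeout=5.0):
        # Borrow an idle connection, waiting up to `timeout` seconds for one.
        conn = self._idle.get(timeout=timeout)
        try:
            yield conn
        finally:
            # Return the connection to the pool instead of closing it.
            self._idle.put(conn)


pool = ConnectionPool(lambda: sqlite3.connect(":memory:"), size=2)
results = []
for _ in range(10):
    with pool.connection() as conn:
        results.append(conn.execute("SELECT 1").fetchone()[0])
print(pool.opened, sum(results))  # 2 10
```

Ten queries run here while only two connections are ever opened, which is the kind of saving that matters for applications that open and close connections frequently.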

Limitations

  • RDS Proxy can add an average of 5 milliseconds of network latency to query or transaction response time.

Use cases of RDS Proxy

  • Applications with unpredictable workloads
  • Applications that frequently open and close database connections
  • Applications that keep connections open but idle
  • Applications requiring availability through transient failures
  • Improved security and centralized credentials management
