Deep dive on Amazon Aurora and Amazon RDS for PostgreSQL architecture and features

May 2024: This post was reviewed and updated for accuracy.

If you’re considering migrating your self-hosted PostgreSQL database or transitioning your commercial databases to PostgreSQL on AWS, you’ll need to choose the database service that best aligns with your requirements. AWS offers two managed PostgreSQL database options: Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

In this post, we delve into the architecture and features of Aurora PostgreSQL and RDS PostgreSQL. We’ll analyze their performance, scalability, failover capabilities, storage options, high availability, and disaster recovery mechanisms. By understanding the strengths and limitations of each service, you’ll be better equipped to make an informed decision for your PostgreSQL deployment on AWS.

Overview

Aurora PostgreSQL and RDS for PostgreSQL are fully managed PostgreSQL database services. Both let you provision various classes of DB instances and multiple PostgreSQL-compatible versions, and both provide managed backups, point-in-time recovery (PITR), replication, monitoring, Multi-AZ support, and storage auto scaling.

Aurora PostgreSQL uses a high-performance storage subsystem customized to take advantage of fast distributed storage. The underlying storage grows automatically in segments of 10 GiB, up to 128 TiB. Aurora improves upon PostgreSQL for massive throughput and highly concurrent workloads. The combination of PostgreSQL compatibility with Aurora enterprise database capabilities provides an ideal target for commercial database migrations.

RDS for PostgreSQL supports up to 64 TiB of storage and recent PostgreSQL versions. DB instances for Amazon RDS for PostgreSQL use Amazon Elastic Block Store (Amazon EBS) volumes for database and log storage. RDS for PostgreSQL manages PostgreSQL installation, upgrades, storage management, replication for high availability, and backups for disaster recovery.

The following diagram illustrates the architecture of Aurora PostgreSQL:

Figure 1: Aurora PostgreSQL architecture

In addition to the classic Multi-AZ configuration with a single standby instance, RDS for PostgreSQL also supports Multi-AZ DB clusters. A Multi-AZ DB cluster is a semi-synchronous, highly available configuration with two readable standby DB instances. It consists of a writer DB instance and two reader DB instances in three separate Availability Zones. This setup provides increased capacity for read workloads and lower write latency.

The following diagram illustrates the architecture of Multi-AZ RDS PostgreSQL:

Figure 2: RDS PostgreSQL Multi-AZ architecture with one standby and two standbys

The following sections discuss some of the key dimensions of Aurora PostgreSQL and Amazon RDS for PostgreSQL.

Storage

Aurora PostgreSQL uses a single, virtual cluster volume that is supported by storage nodes using locally attached SSDs. A cluster volume consists of copies of the data across multiple Availability Zones in a single AWS Region. Aurora storage automatically increases the size of the database volume as the database storage grows. The storage volume grows in increments of 10 GiB up to a maximum of 128 TiB. Storage space used by Aurora dynamically decreases when data is deleted from the cluster. Because the data is automatically replicated across three Availability Zones, data is highly available and durable. Although there is no IOPS limitation based on the storage size, you may need to scale up your DB instance to support workloads requiring higher IOPS. In Aurora, I/O charges are independent of storage and are billed according to usage. When using the I/O-Optimized configuration for database clusters, Aurora provides up to 40% cost savings when I/O spend exceeds 25% of Aurora database spend. The following image shows the Aurora PostgreSQL storage architecture with a shared storage volume and storage nodes.

Figure 3: Aurora PostgreSQL storage architecture
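
The 25% rule of thumb mentioned for the I/O-Optimized configuration can be expressed as a quick check. This is a minimal sketch, not a pricing calculator: the function name and the monthly cost figures are hypothetical, and the only rule it applies is the threshold stated in this post.

```python
def io_optimized_worth_evaluating(io_spend: float, total_aurora_spend: float) -> bool:
    """Return True when I/O charges exceed 25% of total Aurora database spend.

    Both arguments are hypothetical billing figures in the same currency
    and for the same period (for example, one month).
    """
    if total_aurora_spend <= 0:
        raise ValueError("total_aurora_spend must be positive")
    return io_spend / total_aurora_spend > 0.25


# Hypothetical monthly bill: 1,200 of I/O charges out of 4,000 total Aurora spend.
print(io_optimized_worth_evaluating(1200.0, 4000.0))
```

A result of True only suggests that the I/O-Optimized configuration deserves a closer look. The actual savings depend on your workload and current pricing.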

Amazon RDS for PostgreSQL supports Amazon EBS solid state drive (SSD) based storage types: General Purpose SSD (gp2/gp3) and Provisioned IOPS (io1, io2). General Purpose SSD gp2 storage delivers a consistent baseline of 3 IOPS per provisioned GiB and can burst up to 3,000 IOPS. gp3 storage volumes provide customized storage performance independent of storage size. Storage performance is the combination of IOPS and storage throughput. On gp3 volumes, RDS provides a baseline storage performance of 3,000 IOPS and 125 MiB/s throughput for volumes smaller than 400 GiB. Provisioned IOPS SSD delivers IOPS in the 1,000–256,000 range. Amazon RDS for PostgreSQL supports storage auto scaling. This feature automatically increases DB instance storage size in chunks of 10 GiB, or 10% of the currently allocated storage, whichever is greater.
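
To make the storage numbers above concrete, the following sketch computes the gp2 baseline and the size of one auto scaling step. It uses only the figures quoted in this post (3 IOPS per GiB; the greater of 10 GiB or 10% of allocated storage). The function names are hypothetical, and the sketch deliberately ignores any minimums, maximums, or additional conditions the service may apply.

```python
def gp2_baseline_iops(allocated_gib: int) -> int:
    """Baseline IOPS for gp2 at 3 IOPS per provisioned GiB (figure from this post)."""
    return 3 * allocated_gib


def autoscaling_increment_gib(allocated_gib: int) -> float:
    """Size of one storage auto scaling step.

    Applies the rule quoted above: 10 GiB or 10% of the currently
    allocated storage, whichever is greater.
    """
    return max(10.0, allocated_gib / 10)


# Hypothetical volume sizes, in GiB.
for size in (50, 500):
    print(size, gp2_baseline_iops(size), autoscaling_increment_gib(size))
```

For a 50 GiB volume the 10 GiB floor applies. For a 500 GiB volume the 10% rule yields a 50 GiB step.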

Backup

Aurora PostgreSQL backs up the DB cluster volume automatically and retains backups for the length of the defined retention period. Aurora automated backups are continuous and incremental. Restore time depends on the volume size and the number of logged transactions that need to be restored. There is no performance impact or interruption of database service during backups. For PITR, a new copy of the DB cluster is created from the backup of the database at any point in time within the backup retention period. The continuous and incremental nature of the backups improves the PITR restore time.

Amazon RDS automatically backs up PostgreSQL DB instances once a day during a backup window. There is a slight performance impact when the backup initiates for single Availability Zone deployments. In addition, it continuously archives transaction logs (WALs). For PITR, the full backup is restored first, followed by replaying WALs until the desired time. For write-intensive RDS for PostgreSQL DB instances, replaying transaction logs may take a long time. Frequently taking manual snapshots can reduce PITR duration.

Scalability

Amazon Aurora readers and Amazon RDS read replicas help reduce the load on the primary DB instance by offloading read workloads to the readers or replicas. This makes it easy to scale out read-heavy database workloads beyond the capacity constraints of a single DB instance. In both Aurora PostgreSQL and RDS for PostgreSQL, write capacity is limited by a single writer DB instance.

Aurora PostgreSQL supports up to 15 readers for scaling out read workloads and high availability within a single AWS Region. Aurora provides this by scaling storage across three Availability Zones in the AWS Region. It writes the log records to six copies in three Availability Zones. Because Aurora uses shared storage for the writer and readers, the impact of high write workloads on replication is negligible. All Aurora readers are synced with the writer DB instance with minimal replica lag. Long-running queries may cause different replica lag for different readers. Usually, this replica lag is a few hundred milliseconds. In some cases, the lag can grow to 60 seconds, at which point the reader is automatically restarted to catch up with the latest writes.

With RDS for PostgreSQL, you can create up to three levels of cascaded read replicas with 5 replicas per instance, for a total of up to 155 read replicas per source instance. Cascading read replicas help scale reads without adding overhead to the source PostgreSQL DB instance. These replicas serve high-volume application read traffic from multiple copies of data, thereby increasing aggregate read throughput. With more intermediaries in the cascade chain, the replication lag may get progressively higher. You can also promote read replicas when needed to become standalone DB instances. RDS for PostgreSQL also supports five cross-Region read replicas. Replicas are synced with the source DB instance using PostgreSQL streaming replication. PostgreSQL records any modifications to data in write-ahead log (WAL) files. Streaming replication continuously ships and applies the WAL records to replicas to keep them current. Because each replica must process the same transaction logs, it carries a write load equivalent to that of the primary. High write activity at the source DB instance, storage type mismatch, and DB instance class mismatch can cause high replication lag. This lag can be up to several minutes. With optimal configurations and workload, the replica lag in Amazon RDS for PostgreSQL is typically a few seconds. The two standbys in a three-AZ Multi-AZ DB cluster deployment of RDS for PostgreSQL act as failover targets and also serve read traffic, offering high availability and scalability at the same time.
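
The figure of 155 read replicas follows from the cascade limits quoted above: each instance can have 5 replicas, and the chain can be three levels deep. The short sketch below reproduces that arithmetic. The function name is hypothetical.

```python
def max_cascaded_replicas(replicas_per_instance: int = 5, levels: int = 3) -> int:
    """Total replicas in a fully populated cascade.

    Level 1 holds 5 replicas of the source, level 2 holds 5 replicas of each
    level-1 replica (25), and level 3 holds 5 of each level-2 replica (125).
    """
    return sum(replicas_per_instance ** level for level in range(1, levels + 1))


print(max_cascaded_replicas())  # 5 + 25 + 125 = 155
```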

Crash recovery

When the database crashes and needs to perform crash recovery, the transaction logs since last checkpoint are replayed so that the database is brought up to date. A checkpoint flushes the current in-memory modified database pages and WAL information from memory to disk, which provides durability when the database crashes.

Aurora PostgreSQL doesn’t perform checkpoints because the storage system takes the log records it receives from the database node and applies them to the database pages in the storage nodes. Since Aurora storage is organized in many small segments, each segment has its own redo log. As part of a disk read after a crash, the underlying storage replays redo records on demand in parallel and asynchronously. This results in database availability immediately after the crash.

During crash recovery, Amazon RDS for PostgreSQL replays the transaction logs since the last checkpoint. By default, checkpoints are 5 minutes apart. During a checkpoint, the database writes all dirty pages from memory to storage. If checkpoints occur frequently, crash recovery time is reduced. The downside is that frequent checkpoints can slow down database performance because they are I/O-intensive operations.
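
The trade-off between checkpoint frequency and crash recovery time can be illustrated with a deliberately rough model. It assumes a constant WAL generation rate, and the function name and the rate of 2 MiB/s are hypothetical. Real recovery time also depends on factors this sketch ignores, such as how long the checkpoint itself takes and how fast storage can apply the log.

```python
def approx_wal_to_replay_mib(wal_rate_mib_per_s: float, checkpoint_interval_s: float) -> float:
    """Rough estimate of WAL accumulated between two checkpoints.

    Assumes a constant write rate. A crash just before the next checkpoint
    would require replaying roughly this much log.
    """
    return wal_rate_mib_per_s * checkpoint_interval_s


# Hypothetical workload generating 2 MiB/s of WAL.
print(approx_wal_to_replay_mib(2.0, 300))  # checkpoints 5 minutes apart
print(approx_wal_to_replay_mib(2.0, 60))   # checkpoints 1 minute apart
```

Under these assumptions, shortening the interval from 5 minutes to 1 minute cuts the log to replay by a factor of five, at the cost of five times as many I/O-intensive checkpoints.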

Failover

Amazon RDS automatically detects a problem with the primary database instance and triggers a failover. When a failover occurs, read/write connections are automatically redirected to the promoted primary instance.

In Multi-AZ Aurora PostgreSQL, the failover time is typically within 30 seconds, which consists of DNS propagation and recovery. DNS propagation and recovery happen in parallel. DNS propagation takes around 10–15 seconds, whereas recovery is fast, typically 3–10 seconds.

In Multi-AZ Amazon RDS for PostgreSQL, the failover time is typically around 1–2 minutes. This consists of DNS propagation and crash recovery. The failover time depends on how long crash recovery and DNS propagation take, and on the TTL settings at the application.

Aurora PostgreSQL and Amazon RDS for PostgreSQL support Amazon RDS Proxy. RDS Proxy is a fully managed, highly available database feature of Amazon RDS that allows applications to improve scalability by pooling and sharing database connections. When a failover occurs, the client must detect the connection failure, discover the new primary, and reconnect to it as quickly as possible. With RDS Proxy, applications can avoid the complexity associated with failovers and experience faster recovery. During a failover, DNS propagation delay is the largest contributor to overall failover time. RDS Proxy actively monitors database instances and automatically connects clients to the right target. It also maintains idle client connections through database failover without dropping them. Idle connections in this context are connections that don’t have outstanding requests.
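
Without a proxy, the detect-and-reconnect logic described above falls to the application. The following is a minimal sketch of one common approach, retrying with exponential backoff. The helper name, the retry parameters, and the simulated connect function are all hypothetical. A real database driver raises its own exception types, so the except clause would need to be adapted.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def connect_with_retry(
    connect: Callable[[], T],
    attempts: int = 5,
    base_delay_s: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait after each failure."""
    last_error: Optional[Exception] = None
    for attempt in range(attempts):
        try:
            return connect()
        except ConnectionError as exc:  # assumption: the driver error is mapped to this
            last_error = exc
            if attempt < attempts - 1:
                sleep(base_delay_s * 2 ** attempt)
    raise ConnectionError(f"gave up after {attempts} attempts") from last_error


# Simulated endpoint that refuses the first two attempts, as during a failover.
state = {"calls": 0}


def fake_connect() -> str:
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("primary not available yet")
    return "connected"


# The sleep is stubbed out so the demonstration runs instantly.
print(connect_with_retry(fake_connect, sleep=lambda _: None))
```

Injecting the sleep function keeps the helper easy to test. In production you would leave the default and likely cap the maximum delay.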

High availability and disaster recovery

The Aurora PostgreSQL architecture separates storage from compute. When data is written to the writer DB instance, the instance sends the data to six storage nodes associated with the cluster volume across multiple Availability Zones in a single AWS Region. Aurora stores these copies regardless of whether there are reader instances in other Availability Zones. All Aurora readers return the same data for query results with minimal replica lag, typically much less than 100 milliseconds after the primary instance has written an update. Replica lag varies depending on the rate of database change. That is, during periods with a large number of write operations, you might see an increase in replica lag. All readers in an Aurora cluster are accessible through instance endpoints or the reader endpoint of the cluster. Aurora promotes one of the readers when a problem is detected on the primary DB instance. If a failure occurs and no Aurora replica has been provisioned, Aurora attempts to create a new DB instance automatically.

Amazon Aurora Global Database offers cross-Region replication. The typical cross-Region replication latency is below 1 second. You can also copy and share Aurora DB cluster snapshots across AWS accounts and AWS Regions for DR purposes. Aurora backs up the cluster volume automatically and retains restore data for the length of the backup retention period. Using AWS Backup, you can copy these backups to another AWS Region to strengthen the availability of your data.

Amazon RDS for PostgreSQL provides high availability (HA) and disaster recovery (DR) features through Multi-AZ deployment options and snapshot sharing. An RDS snapshot is an automatically created storage volume snapshot of the DB instance, backing up the entire DB instance. For Multi-AZ configuration in RDS for PostgreSQL, there are two options available: Multi-AZ with one standby and Multi-AZ with two readable standbys. In the Multi-AZ with one standby option, RDS automatically creates a standby DB instance in a different AZ and synchronously replicates the data from the source database. In the Multi-AZ with two readable standbys option (a Multi-AZ DB cluster), the instances use local storage provided by the instances to store the transaction logs (WAL). Write operations are first written to the transaction logs on local storage, then flushed to permanent storage on the database storage volumes. This helps reduce automatic failover time to typically under 35 seconds and provides up to 2x faster transaction commit latency compared to Multi-AZ with one standby.

In classic Multi-AZ RDS PostgreSQL, automated backups are taken from the standby DB instance. Incidents such as primary DB instance failure, storage failure, DB instance scale-up, and network failure trigger a failover that makes the standby DB instance the new primary DB instance. Amazon RDS for PostgreSQL also supports replicas in the same AWS Region as well as across Regions. This replication is based on database transaction logs, and replication lag can increase depending on the workload at the source DB instance. You can copy and share Amazon RDS snapshots across AWS accounts and AWS Regions for DR purposes.

Additional features

Aurora also provides several value-add features designed to make operations easier and improve performance. You should evaluate whether these features can benefit your workloads. Some of the additional features of Aurora PostgreSQL are as follows:

Amazon Aurora Fast Database Cloning lets you quickly create clones of all databases in the DB cluster. This is faster than restoring Amazon RDS for PostgreSQL from a snapshot. If you need to test schema changes or parameter changes, or perform analytics on near-production data, Aurora database clones are a better choice than restoring RDS PostgreSQL databases.

With query plan management (QPM) for Aurora PostgreSQL, you can control how and when query plans change. When table or index statistics change, PostgreSQL may generate suboptimal query plans. Aurora QPM lets you keep using optimal query plans to avoid performance degradation.

The cluster cache management feature for Aurora PostgreSQL improves the performance of the new writer DB instance after failover. You can designate a specific replica as the preferred failover target. With cluster cache management, the cache of the designated replica is kept synchronized with the cache of the writer DB instance, so the replica holds what is known as a warm cache. If a failover occurs, the designated reader uses values in its warm cache immediately when it’s promoted to the new writer DB instance. The availability of warm data results in sustained performance after failover.

Aurora Serverless is an on-demand, autoscaling configuration for Amazon Aurora, empowering business-critical workloads with the full breadth of Aurora features, including cloning, global database, Multi-AZ, and multiple readers. It automatically starts up, shuts down, and scales capacity up or down based on an application’s needs. Amazon Aurora Serverless v2 scales from hundreds of transactions to thousands in a fraction of a second. As it scales, it adjusts capacity in fine-grained increments to provide the right amount of database resources that the application needs.

Amazon Aurora machine learning (ML) adds ML-based predictions to applications through the familiar SQL programming language. It provides simple, optimized, and secure integration between Aurora and AWS ML services without having to build custom integrations or move data around. When an ML query runs, Aurora calls Amazon SageMaker or Amazon Bedrock for a wide variety of ML algorithms, including generative AI, or Amazon Comprehend for sentiment analysis, so the application doesn’t need to call these services directly.

Conclusion

In this comprehensive post, we’ve explored the architectural details and feature sets of Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. Through a detailed analysis of performance, scalability, failover mechanisms, storage options, high availability, and disaster recovery capabilities, we’ve gained valuable insights into the strengths and limitations of each service.

For further guidance on migrating to Aurora PostgreSQL-Compatible Edition or Amazon RDS for PostgreSQL, refer to the following resources:

We hope this deep dive has provided you with valuable insights into the architecture and features of these managed PostgreSQL services on AWS. If you have any questions or suggestions, please leave a comment below.

About the authors

Vivek Singh is a Principal Database Specialist Technical Account Manager with Amazon Web Services focusing on Amazon Aurora PostgreSQL and Amazon RDS for PostgreSQL engines. He works with enterprise customers providing technical assistance on PostgreSQL operational performance and sharing database best practices. He has over 18 years of experience in open-source database solutions, and enjoys working with customers to help design, deploy, and optimize relational database workloads on AWS.

Sagar Patel is a Principal Database Specialty Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist to provide technical guidance and help Amazon customers to migrate their on-premises databases to AWS.

AWS Database Blog