Shrink storage volumes for your RDS databases and optimize your infrastructure costs

Scaling instances up and down has been relatively straightforward with Amazon Relational Database Service (Amazon RDS) open source engines. However, shrinking storage volumes after an increase in storage size has not been as easy. For example, a company has a Black Friday event coming up and needs to scale up its infrastructure temporarily to meet peak demand. After the event, the company wants to scale down infrastructure such as compute and storage for cost savings. Black Friday is one use case, but you may have a variety of others that require you to scale your storage volumes up and down, to adjust to your demand signals and better manage your infrastructure costs.

Previously, reducing Amazon RDS storage involved manually migrating data to a new database instance with a smaller storage configuration. Some of the common approaches to this migration included:

Logical backup and restore: Create a logical backup (for example, by using pg_dump or mysqldump) and restore it to a new instance with a smaller storage size.
AWS Database Migration Service (AWS DMS): Use AWS DMS to replicate data to a new instance with a smaller storage size, and then manually cut over to the new instance to minimize downtime.
Native database replication: Use database-native replication methods (for example, PostgreSQL logical replication or MySQL binlog replication) to transfer data with minimal impact on operations.

While effective, these methods required manual setup and management, including coordination of the cutover procedures, which were complex and error-prone. Further, the process often resulted in extended downtime, causing unplanned business disruption.

Recently, Amazon RDS launched the ability to shrink storage volumes using Amazon RDS Blue/Green Deployments, adding to the list of use cases that Blue/Green Deployments supports. Blue/Green Deployments create a fully managed staging environment (the Green databases) with your specified storage size, and keep the Blue and Green databases in sync. When you are ready, you can promote the Green databases to be the new production system in as fast as a minute, with no data loss and no application changes to switch database endpoints. This simplified approach has more predictable downtime and lets you increase and decrease your storage volume size based on anticipated application demands. It also eases the concern that a temporary storage increase will lead to permanently higher storage costs.

Blue/Green Deployments storage volume shrink is available for Amazon RDS for PostgreSQL major versions 12 and later, Amazon RDS for MySQL major versions 5.7 and later, and Amazon RDS for MariaDB major versions 10.4 and later.

As a slight tangent, another recently launched Blue/Green Deployments feature worth highlighting is that the Green database storage is now fully hydrated before you promote the Green databases as the new production system. Previously, you had to manually initialize the storage volumes of the Green databases to work around what is often referred to as ‘lazy loading’ from Amazon S3.

However, in this post, we focus on the shrink storage capability and guide you through the steps for reducing your RDS instance storage using Amazon RDS Blue/Green Deployments.

Reduce RDS storage volume size using RDS Blue/Green Deployments

When creating Amazon RDS Blue/Green Deployments, you can now specify a storage size that is larger or smaller than the current size, change the storage volume type (such as io2 or gp3), and adjust storage performance settings. The Blue/Green create API accepts these changes and creates the Green database with your specified storage settings.

At a high level, the Blue/Green Deployments storage volume shrink includes the following steps:

Green environment creation: New RDS instances (the Green environment) are created that are a topological copy of your Blue environment, or current production setup.
Storage configuration change: Resize storage and perform storage configuration changes on the green environment while keeping the production blue environment online.
Blue/Green sync: RDS Blue/Green Deployments keep the current (blue) and new (green) environments in sync, ensuring data consistency.
Managed switchover: After the green environment is ready, you can issue a switchover command. Blue/Green Deployments perform a switchover, promoting your Green environment as the new production system, in as fast as a minute.

Solution overview

For the purpose of this post, we focus on the RDS for PostgreSQL engine. However, as stated previously, the capability is available for RDS for MySQL and RDS for MariaDB as well. In this section, we show how to reduce Amazon RDS for PostgreSQL database instance storage with minimum downtime using RDS Blue/Green Deployments.

RDS for PostgreSQL primarily uses PostgreSQL physical replication to synchronize data between the blue and the green environments. However, if you request a major version upgrade when you create the blue/green deployment, it uses PostgreSQL logical replication.

For a high level reference, PostgreSQL physical streaming replication uses the exact disk block address and byte-for-byte replication. The entire cluster is replicated at the same time. PostgreSQL logical replication replicates data objects and their changes based on their replication identity.

See best practices for tuning replication methods for RDS Blue/Green Deployments.

Prerequisites

Before you get started, make sure that you have the following prerequisites:

An AWS account.
An existing RDS for PostgreSQL instance running a PostgreSQL version that supports Amazon RDS Blue/Green Deployments. The sample in this post uses RDS for PostgreSQL 16.5.
The backup retention period is configured with a minimum of 1 day on the database instance.

Create an Amazon RDS Blue/Green Deployment

To create an Amazon RDS Blue/Green Deployment, we use the Amazon RDS console. You can achieve the same result using the AWS CLI.

In the AWS Management Console for Amazon RDS, choose Databases in the navigation pane.
Select the database instance and choose Create Blue/Green Deployment from the Actions menu.
Enter rpg-db1-bg as the Blue/Green Deployment identifier, and choose Next.
On the Configure Blue/Green Deployments page, select the same engine version for the green database as the blue database.

If you specify a higher version for the green instance and request a major version upgrade along with storage volume shrink, RDS deploys the green instance using PostgreSQL logical replication instead of the default physical streaming replication. See PostgreSQL replication methods for RDS Blue/Green Deployments for more information. While storage shrink works with both logical and physical replication, we recommend that you use physical replication so that you avoid the limitations of PostgreSQL community logical replication. For more details, see the Tuning PostgreSQL replication methods section of this post.
Scroll down to the Storage section and set the Allocated storage for the green database. For this post, we have a (blue) database with 400 GB of allocated storage, and we’re creating a new (green) database with 200 GB (a 50% reduction in storage size). See Best Practices for rightsizing storage for your RDS database instance. Choose Next.
Review the configuration and choose Create to create the blue/green deployment.
It might take a few hours or days to create the blue/green deployment depending on various factors, such as the size of the database, storage type, and managed input/output operations per second (IOPS). See Monitoring to track the progress of the blue/green deployment.
After the green environment is fully deployed and shows as available, you can follow best practices for switchover to orchestrate a managed switchover.
After the switchover, you can delete the blue/green deployment, and delete the old blue instance to save on costs.
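
The console steps above can also be scripted. The following is a minimal sketch, assuming the AWS SDK for Python (boto3) and its `create_blue_green_deployment` call with the `TargetAllocatedStorage` and `TargetStorageType` parameters. The helper names, the deployment name, and the source ARN (with a placeholder account ID) are illustrative assumptions, not values taken from the environment in this post.

```python
def build_shrink_request(name, source_arn, target_allocated_storage_gib,
                         target_storage_type=None):
    """Build keyword arguments for the RDS CreateBlueGreenDeployment call.

    Hypothetical helper: it only assembles the request so it can be
    reviewed (or logged) before anything is created.
    """
    if target_allocated_storage_gib <= 0:
        raise ValueError("target storage must be a positive number of GiB")
    request = {
        "BlueGreenDeploymentName": name,
        "Source": source_arn,
        "TargetAllocatedStorage": target_allocated_storage_gib,
    }
    # Only send a storage type when the caller wants to change it.
    if target_storage_type is not None:
        request["TargetStorageType"] = target_storage_type
    return request


def create_shrink_deployment(request):
    """Send the request to Amazon RDS (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    rds = boto3.client("rds")
    return rds.create_blue_green_deployment(**request)


request = build_shrink_request(
    name="rpg-demo-bg",
    # Placeholder ARN; replace with the ARN of your blue instance.
    source_arn="arn:aws:rds:us-east-1:123456789012:db:rpg-demo",
    target_allocated_storage_gib=200,
)
print(request)
```

Calling `create_shrink_deployment(request)` would start the deployment. Keeping the request builder separate makes it easy to inspect the target storage settings before creating any resources.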

Monitoring

To reduce the storage size, the green instance is first created with the same storage size as the blue instance, and then the automation runs a storage configuration upgrade to reduce the storage size. The storage configuration upgrade migrates the green DB instance from the old file system layout to the preferred file system layout. During this time, the engine is not running on the DB instance. The storage volume shrink is a time-consuming operation that can take hours or days. After the storage configuration upgrade is complete, the engine starts running and the remaining steps of the blue/green deployment (such as read replica creation, major version upgrade, and backup configuration) proceed. The following sections show how to monitor the status and track progress of the operation.

Monitor RDS Blue/Green Deployments status

You can monitor the status of the deployment under the Status tab of the Blue/Green Deployments using the Amazon RDS console or AWS Command Line Interface (AWS CLI).

To monitor status using the Amazon RDS console:

In the Amazon RDS console, choose Databases from the navigation pane and select the Blue/Green Deployment identifier.
Follow the progress of the environment in the Status section.

To monitor status using the AWS CLI:

Run the following command:

aws rds describe-blue-green-deployments --filters Name=blue-green-deployment-name,Values=rpg-demo-bg
{
    "BlueGreenDeployments": [
        {
            "Status": "AVAILABLE", 
            "Tasks": [
                {
                    "Status": "IN_PROGRESS", 
                    "Name": "CREATING_READ_REPLICA_OF_SOURCE"
                }, 
                {
                    "Status": "IN_PROGRESS", 
                    "Name": "SCALE_STORAGE"
                }, 
                {
                    "Status": "IN_PROGRESS", 
                    "Name": "STORAGE_CONFIG_UPGRADE"
                }, 
                {
                    "Status": "PENDING", 
                    "Name": "CONFIGURE_BACKUPS"
                }
            ], 
            "Target": "arn:aws:rds:us-east-1:00000000000:db:rpg-demo-green-zsnsrs", 
            "BlueGreenDeploymentName": "rpg-demo-bg", 
            "CreateTime": "2024-12-08T18:58:44.700Z", 
            "Source": "arn:aws:rds:us-east-1:00000000000:db:rpg-demo",
            "BlueGreenDeploymentIdentifier": "bgd-kdcvaqxsursoo0pb", 
            "SwitchoverDetails": [
                {
                    "Status": "PROVISIONING", 
                    "TargetMember": "arn:aws:rds:us-east-1:00000000000:db:rpg-demo-green-zsnsrs",
                    "SourceMember": "arn:aws:rds:us-east-1:00000000000:db:rpg-demo"
                }
            ], 
            "TagList": []
        }
    ]
}
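
If you poll this output from a script, it helps to reduce it to the tasks that are still running. The following is a minimal sketch that works on the parsed JSON response. The function name is a hypothetical helper, and the sample dictionary is abridged from the output above.

```python
from collections import Counter


def summarize_deployments(response):
    """Summarize task progress for each deployment in a parsed response."""
    summaries = []
    for deployment in response.get("BlueGreenDeployments", []):
        tasks = deployment.get("Tasks", [])
        summaries.append({
            "name": deployment.get("BlueGreenDeploymentName"),
            "status": deployment.get("Status"),
            # Number of tasks in each state, for example IN_PROGRESS or PENDING.
            "task_counts": dict(Counter(task["Status"] for task in tasks)),
            # Names of the tasks that are currently running.
            "in_progress": [
                task["Name"] for task in tasks
                if task["Status"] == "IN_PROGRESS"
            ],
        })
    return summaries


# Abridged copy of the sample output shown above.
sample_response = {
    "BlueGreenDeployments": [
        {
            "Status": "AVAILABLE",
            "BlueGreenDeploymentName": "rpg-demo-bg",
            "Tasks": [
                {"Status": "IN_PROGRESS", "Name": "CREATING_READ_REPLICA_OF_SOURCE"},
                {"Status": "IN_PROGRESS", "Name": "SCALE_STORAGE"},
                {"Status": "IN_PROGRESS", "Name": "STORAGE_CONFIG_UPGRADE"},
                {"Status": "PENDING", "Name": "CONFIGURE_BACKUPS"},
            ],
        }
    ]
}

summaries = summarize_deployments(sample_response)
for summary in summaries:
    print(summary["name"], summary["task_counts"], summary["in_progress"])
```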

Monitor storage configuration upgrade progress

You can track the progress of the storage configuration upgrade from the RDS console or using AWS CLI.

To monitor upgrade progress using the Amazon RDS console:

In the Amazon RDS console, choose Databases from the navigation pane.
Follow the progress of the environment in the Status section.

To monitor upgrade progress using the AWS CLI:

With the AWS CLI, you can monitor the storage configuration upgrade with the describe-db-instances command. The PercentProgress field in the response shows what percentage of storage has been upgraded.

aws rds describe-db-instances --db-instance-identifier rpg-demo-green-pxvobj
{
    "DBInstances": [
        {
            "DBInstanceIdentifier": "rpg-demo-green-pxvobj",
            "DBInstanceClass": "db.m7g.large",
            "Engine": "postgres",
            "DBInstanceStatus": "storage-config-upgrade",
            ...
            "PercentProgress": "5"
        }
    ]
}
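
To track this value from a script, you can extract PercentProgress from the parsed response. The following is a minimal sketch. The function name is a hypothetical helper, and the sample dictionary is abridged from the output above, where PercentProgress is returned as a string.

```python
def storage_upgrade_progress(response):
    """Return {instance identifier: percent complete} from a parsed response.

    Instances that do not report PercentProgress are skipped.
    """
    progress = {}
    for instance in response.get("DBInstances", []):
        raw_value = instance.get("PercentProgress")
        if raw_value is None:
            continue
        # The field is a string in the sample output, so convert it.
        progress[instance["DBInstanceIdentifier"]] = int(raw_value)
    return progress


# Abridged copy of the sample output shown above.
sample_response = {
    "DBInstances": [
        {
            "DBInstanceIdentifier": "rpg-demo-green-pxvobj",
            "DBInstanceStatus": "storage-config-upgrade",
            "PercentProgress": "5",
        }
    ]
}

progress = storage_upgrade_progress(sample_response)
print(progress)
```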

Monitor RDS Blue/Green Deployments events

During a Blue/Green deployment, various events are sent, as detailed in the AWS documentation on Blue/Green Deployments.

A notable event is the notification sent by your RDS Blue/Green Deployments, detailing the storage parameters used to create the green primary and replicas.
For example, if your blue instance is on gp3 with 500 GB and 12,000 IOPS, and you switch it to io2 with 200 GB without specifying IOPS, the green instance is created on io2 with 200 GB and 12,000 IOPS. Note that, independent of the blue replica size, the green replicas inherit their storage size from the green primary.
In the event of a failed storage shrink because of an invalid value provided for the target allocated-storage, RDS Blue/Green Deployments sends a notification to alert you to the issue. However, Blue/Green Deployments continue with the creation of the green instance that matches with the original blue primary storage size.

Best practices

In this section, we discuss the best practices of rightsizing to determine the RDS storage volume size, optimizing the RDS instance to reduce storage usage, and optimizing performance of the RDS storage volume shrink operation.

Rightsizing RDS database instance storage

RDS database instances use storage for various purposes, including:

Data files: On-disk files that store the actual database content.
Transaction logs: Logs such as PostgreSQL Write-Ahead Logs (WAL) or MySQL binary logs.
Temporary files: Files created for operations that don’t fit into memory.
Replication disk usage: Space used for database replication, including PostgreSQL replication slots.
Database logs: Logs for database errors, slow queries, and other diagnostic information.

Before reducing your RDS instance’s storage size, assess the potential impact on performance. When rightsizing Amazon RDS database instance storage, consider both storage space and performance requirements, including IOPS and throughput (MB per second). Proper sizing involves assessing current storage usage, IOPS, and throughput needs, and allowing for near-term growth. To verify that the smaller volume can sustain your workload, review your database workload’s metrics over time, especially during periods of high activity (for example, during maintenance or special events). You can monitor these metrics using Amazon CloudWatch.

Storage size

FreeStorageSpace: This CloudWatch metric measures the available storage in bytes. To calculate your database’s storage usage, convert both values to the same unit and use the following formula:
```
Database instance storage used = Allocated storage size – FreeStorageSpace
```
RDS instances use storage for various tasks, including regular database operations such as inserts, updates, and index creation. It’s important to have enough free storage to accommodate these tasks without impacting performance.

Best practice: Maintain at least 20% free storage to avoid frequent scaling, especially for high-transaction databases. Additionally, ensure that free space is sufficient to handle maintenance tasks, such as rebuilding large tables. If the target allocated storage is less than 1.2 times the used storage (that is, less than 20% more than the used storage), the RDS Blue/Green Deployment fails to shrink the storage. However, Blue/Green Deployments continue with the creation of the green instance, which then matches the original storage size of the blue primary.
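
As a worked example of the formula and the 1.2 times rule above, the following sketch computes the used storage and the smallest target size that satisfies the rule. It assumes that allocated storage is expressed in GiB and that FreeStorageSpace is reported in bytes. The function names and the sample numbers are illustrative.

```python
import math

GIB = 1024 ** 3  # bytes per GiB


def used_storage_gib(allocated_gib, free_storage_bytes):
    """Database instance storage used = allocated storage - FreeStorageSpace."""
    return allocated_gib - free_storage_bytes / GIB


def minimum_target_gib(used_gib, headroom_percent=20):
    """Smallest whole-GiB target that is at least 1.2 times the used storage."""
    return math.ceil(used_gib * (100 + headroom_percent) / 100)


def can_shrink_to(target_gib, used_gib):
    """True if the target meets the 20% headroom rule described above."""
    return target_gib >= minimum_target_gib(used_gib)


# Illustrative numbers: 400 GiB allocated, 250 GiB reported free.
allocated = 400
free_bytes = 250 * GIB

used = used_storage_gib(allocated, free_bytes)
minimum = minimum_target_gib(used)

print(f"used: {used} GiB, minimum target: {minimum} GiB")
print("200 GiB target allowed:", can_shrink_to(200, used))
print("170 GiB target allowed:", can_shrink_to(170, used))
```

With these sample numbers, the instance uses 150 GiB, so the smallest valid target is 180 GiB. A 200 GiB target passes the check and a 170 GiB target does not. Use the lowest FreeStorageSpace value you observed over a representative period rather than a single reading.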

Storage IOPS performance

WriteIOPS: Measured at a rate of counts per second, this CloudWatch metric determines the average number of disk write I/O operations per second.
ReadIOPS: Measured at a rate of counts per second, this CloudWatch metric determines the average number of disk read I/O operations per second.

Storage throughput performance

WriteThroughput: This metric represents the average number of bytes written to disk per second.
ReadThroughput: This metric represents the average number of bytes read from the disk per second.

Amazon RDS supports various types of storage volumes including General Purpose SSD (gp2, gp3) and Provisioned IOPS SSD (io1, io2). For gp2 and gp3 volumes, the allocated storage size directly determines baseline IOPS and throughput performance. For provisioned IOPS (io1, io2), the performance is explicitly set.

Best practice: Ensure that the baseline IOPS and throughput for your storage volume type and allocated storage are greater than the peak ReadIOPS and WriteIOPS, and peak ReadThroughput and WriteThroughput values, respectively.
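
The comparison in this best practice can be sketched as a simple check. The following example assumes that you already looked up the baseline IOPS and throughput for your target storage type and size, and that you collected peak values from CloudWatch. Adding read and write peaks together is a conservative assumption made for this sketch, and all names and numbers are illustrative.

```python
MIB = 1024 ** 2  # bytes per MiB


def baseline_covers_peak(baseline_iops, baseline_mib_per_s,
                         peak_read_iops, peak_write_iops,
                         peak_read_bytes_per_s, peak_write_bytes_per_s):
    """Compare the target volume baseline with observed peak demand.

    Assumption: read and write peaks are summed, which treats them as
    if they occurred at the same time.
    """
    total_iops = peak_read_iops + peak_write_iops
    # CloudWatch reports throughput in bytes per second; convert to MiB/s.
    total_mib_per_s = (peak_read_bytes_per_s + peak_write_bytes_per_s) / MIB
    return {
        "iops_ok": baseline_iops > total_iops,
        "throughput_ok": baseline_mib_per_s > total_mib_per_s,
    }


# Illustrative inputs, not measurements from this post.
check = baseline_covers_peak(
    baseline_iops=3000,
    baseline_mib_per_s=125,
    peak_read_iops=1800,
    peak_write_iops=900,
    peak_read_bytes_per_s=60 * MIB,
    peak_write_bytes_per_s=80 * MIB,
)
print(check)
```

In this example, the IOPS check passes (2,700 is below 3,000) but the throughput check fails (140 MiB per second exceeds 125), which would suggest choosing a larger target size or provisioning additional throughput before shrinking.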

Optimizing RDS instances to reduce storage usage

Amazon RDS for PostgreSQL, MySQL, and MariaDB use multi-version concurrency control (MVCC) to enhance concurrent access for both readers and writers. MVCC allows multiple versions of data to be maintained, enabling readers and writers to access their own version of the data without blocking each other. When a piece of data is updated, it does not overwrite the original data item with new data, but instead creates a newer version of the data item.

MVCC can lead to data bloat over time because old versions of data become obsolete and aren’t cleaned up. Excessive data bloat leads to increased storage space usage, inefficient data access, and increased cost. PostgreSQL implements a background process called vacuum and MySQL uses a background process called purge to remove data that’s no longer needed. See the following posts for best practices for tuning RDS for PostgreSQL vacuum and RDS for MySQL purge processes.

Routine vacuum or purge cleans up space occupied by obsolete data and makes the space available for subsequent operations, minimizing data bloat.

You can further optimize storage usage by archiving data to Amazon Simple Storage Service (Amazon S3), deleting unnecessary data, and improving operational efficiency. For example, you can reduce database instance storage usage by:

Implementing data lifecycle management by archiving historical data to S3
Enabling WAL compression
Removing unused or duplicate indexes
Dropping unnecessary temporary tables
Reducing logging levels during non-debug periods

PostgreSQL vacuum or MySQL purge doesn’t release space to the storage until the table or index is rebuilt.

Best practice: After large update or delete operations (for example, after archiving data to S3), release space back to storage by running PostgreSQL pg_repack or MySQL OPTIMIZE TABLE to rebuild the associated tables and indexes.

Optimizing performance of RDS storage volume shrink

When performing storage volume shrink operations, several factors can impact the performance and success of the operation. This section outlines key optimization strategies and best practices for managing storage volume shrink operations effectively. We explore how to leverage io2 Block Express storage for enhanced performance, monitor replication storage consumption, and understand the implications of different PostgreSQL replication methods during the process.

The following sections provide detailed guidance on implementing these optimizations while maintaining database availability and performance during the storage reduction process.

Improve performance with io2 Block Express

RDS storage volume shrink is an I/O-intensive operation. The operation performs physical data block copies to reconfigure the storage on the green instance. The underlying storage performance of the green instance plays a critical role in the completion time of the operation. To improve the performance of the storage volume shrink operation, consider using io2 Block Express storage when creating the green instance. io2 Block Express delivers up to 4,000 MB per second throughput and 256,000 provisioned IOPS, while maintaining sub-millisecond latency at the same cost as io1. If you’re currently using provisioned IOPS (io1), upgrading to io2 can improve both performance and cost-effectiveness, particularly during storage reduction operations.

Best practice: Consider upgrading to io2 Block Express for better performance. You can combine the io2 upgrade with the storage size reduction in the same Amazon RDS Blue/Green Deployment. However, we recommend that you test io2 volumes in a pre-production environment before switching your production volumes.

Monitor replication storage consumption

The storage volume shrink operation creates a green staging environment and resizes the storage volume on the staging environment while keeping the production blue environment online. In the staging environment, the reduction of allocated storage occurs alongside a storage configuration upgrade, which might take several days depending on various factors, such as the size of the database, storage type, and managed IOPS. During this process, database transaction log and replication storage usage increases on your production blue database. You can monitor the storage used by replication with the following CloudWatch metrics.

BinLogDiskUsage (MySQL, MariaDB): The amount of disk space occupied by binary logs. If automatic backups are enabled for MySQL and MariaDB instances, including read replicas, binary logs are created.
TransactionLogsDiskUsage (PostgreSQL): The disk space used by transaction logs.
ReplicationSlotDiskUsage (PostgreSQL): The disk space used by replication slot files.

Best practice: Regularly monitor the available storage and any events related to storage size on the blue instance. If you anticipate that storage is nearing its capacity, consider scaling the storage on the blue instance. This operation will not affect the storage size of your Green database.
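
To check these metrics from a script, you can query CloudWatch for the blue instance. The following is a minimal sketch for a PostgreSQL blue instance, assuming boto3 and the CloudWatch `get_metric_statistics` call with the `AWS/RDS` namespace and the `DBInstanceIdentifier` dimension. The helper names, the instance identifier, and the 24-hour window are illustrative choices.

```python
from datetime import datetime, timedelta, timezone

# Metrics from the list above that apply to PostgreSQL, plus free space.
POSTGRES_STORAGE_METRICS = (
    "TransactionLogsDiskUsage",
    "ReplicationSlotDiskUsage",
    "FreeStorageSpace",
)


def build_metric_requests(instance_id, end_time, hours=24, period_seconds=3600):
    """Build one get_metric_statistics request per metric (hypothetical helper)."""
    start_time = end_time - timedelta(hours=hours)
    return [
        {
            "Namespace": "AWS/RDS",
            "MetricName": metric_name,
            "Dimensions": [
                {"Name": "DBInstanceIdentifier", "Value": instance_id}
            ],
            "StartTime": start_time,
            "EndTime": end_time,
            "Period": period_seconds,
            # Maximum highlights the worst case within each period.
            "Statistics": ["Maximum"],
        }
        for metric_name in POSTGRES_STORAGE_METRICS
    ]


def fetch_datapoints(request):
    """Run one request against CloudWatch (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builder above works without boto3

    cloudwatch = boto3.client("cloudwatch")
    return cloudwatch.get_metric_statistics(**request)["Datapoints"]


now = datetime(2024, 12, 9, 0, 0, tzinfo=timezone.utc)  # fixed time for the example
requests = build_metric_requests("rpg-demo", end_time=now)
for request in requests:
    print(request["MetricName"], request["StartTime"], request["EndTime"])
```

For FreeStorageSpace, the lowest value matters more than the highest, so you might request the Minimum statistic for that metric instead.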

Tuning PostgreSQL replication methods

RDS Blue/Green Deployments use PostgreSQL physical streaming or logical replication technology depending on specifications of the green environment. PostgreSQL physical streaming replication uses exact disk block address and byte-for-byte replication. The target is a replica of the source. Both replication source and target are required to run the same major version. PostgreSQL logical replication replicates data objects and their changes based on their replication identity (for example, the primary key). The replication source and target can run different major versions. However, PostgreSQL logical replication has restrictions for replication of Data Definition Language (DDL), large objects, and so on. For more information on the limitations of logical replication, see 31.6. Restrictions in the PostgreSQL documentation.

To verify the replication method used by your RDS Blue/Green Deployment, sign in to the blue database and run select * from pg_replication_slots;. The slot_type column shows physical for physical streaming replication and logical for logical replication.

postgres=> select * from pg_replication_slots;
                  slot_name                  | plugin | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn | confirmed_flush_lsn | wal_status | safe_wal_size | two_phase | conflicting
---------------------------------------------+--------+-----------+--------+----------+-----------+--------+------------+------+--------------+-------------+---------------------+------------+---------------+-----------+-------------
 rds_us_west_2_db_44b2pirowg4dnccrokitg6vuuq |        | physical  |        |          | f         | t      |            |      |              | 0/440008A8  |                     | reserved   |               | f         | 
(1 row)

If the green DB instance is created as a physical replica, you won’t be able to perform a major version upgrade manually after the green environment is created.

Best practice: To avoid restrictions of PostgreSQL logical replication, consider performing storage volume shrink and DB engine major version upgrade as separate operations.

Conclusion

In this post, we covered how to use the new storage volume shrink feature in Amazon RDS Blue/Green Deployments to minimize the downtime required to reduce storage size. We also reviewed various mechanisms to monitor the progress of the storage shrink and best practices for arriving at the optimal storage size for your shrink task. With the holiday shopping season coming up, scale up your infrastructure to meet your demands without any apprehension, because you can now scale your storage down after the holidays, if needed. Ho, Ho, Ho!

If you have any feedback or questions, leave them in the comments section.

About the authors

Nitesh Gupta is a Software Development Engineer with the Amazon RDS Open Source team at AWS, located in Dublin, Ireland. He specializes in building features for Blue/Green Deployments and Multi-AZ DB clusters, enabling customers to achieve seamless migration experiences. In addition to his technical expertise, he also enjoys indulging in movies, embarking on international travel adventures, exploring diverse culinary delights, and cherishing quality time with his loved ones.

Keyur Diwan is a Principal Product Manager with Amazon Aurora/RDS, based out of the Sunless Seattle, United States. He supports products such as Blue/Green Deployments and RDS for PostgreSQL engine. Keyur enjoys camping in the woods, hiking, and meditation, but seldom does any of that. Additionally, he enjoys everything that Nitesh mentions above.

AWS Database Blog