AWS Storage Blog
Enhance PostgreSQL resiliency and recoverability using Amazon FSx for OpenZFS
Organizations today often depend on mission-critical database workloads. When these databases become unavailable, it can significantly impact revenue and erode user trust. To mitigate this risk, these critical environments need the fastest possible recovery capabilities if there is a failure. This means having the lowest recovery time objective (RTO) and recovery point objective (RPO). Achieving these aggressive RTO and RPO requirements often forces organizations to maintain multiple copies of the data, which increases management complexity and storage (performance and capacity) costs. The IT teams responsible for making sure of low RTO/RPO end up overseeing an ever-growing storage infrastructure, while maintaining the highest levels of availability, resiliency, and performance. One solution used by users to meet aggressive RTO/RPO goals is storage-based snapshots, which eliminate the need for redundant data copies and more storage.
Amazon FSx for OpenZFS allows you to use the native snapshot capability to create point-in-time copies of large databases in seconds. These snapshots initially consume no more storage space and only consume incremental space as data changes. Snapshots can also be promoted to clones. Clones provide read and write access to data, as compared to snapshots, which are read-only. Similar to snapshots, clones consume zero space at creation. These space efficient clones can be presented to your test, development, and quality assurance environments. This allows you to decrease your overall storage requirements and spend for your database environments. For high availability (HA), FSx for OpenZFS can be deployed as either a Single-Availability Zone (AZ) or Multi-AZ file system. Both HA deployments provide a primary and a standby file server to failover to during maintenance or during a failover event. The file system delivers consistent microsecond latencies, which provide predictable performance to the most demanding database workloads.
In this post, we show you how to create a snapshot and rollback to that point-in-time, using FSx for OpenZFS with a PostgreSQL database running on Amazon Elastic Compute Cloud (Amazon EC2). This same process can be used for other database environments, such as Oracle, MySQL, MariaDB, and others. We present OpenZFS datasets, known as volumes in FSx for OpenZFS, through network file system (NFS) to the Amazon EC2 database client. In our scenario we have chosen to use a Multi-AZ FSx for OpenZFS file system that provides the highest availability to the database storage.
Solution overview
This solution provides an application-consistent snapshot of a PostgreSQL database on an FSx for OpenZFS volume (Figure 1) by using the FSx API and PostgreSQL hot backup mode. After changes are made to the database, it is reverted back to the point-in-time of the snapshot recovering the database in seconds.
Figure 1: PostgreSQL database mounted on an FSx for OpenZFS volume
Taking storage snapshots is generally a more efficient and reliable method of creating a restore point of a PostgreSQL database as compared to the traditional pg_dump utility. The pg_dump utility works by exporting the logical structure and data of the database, which is a slower and more resource-intensive process, especially for large databases. For example, a 20 TB database using pg_dump could end up consuming over 200 TB of space with multiple (or 10x in this example) backup copies, whereas storage snapshots would only incur the incremental change of data consumed by the database. However, these snapshots don’t remove the requirement for a complete backup strategy. Snapshots reside on the file system, and if the file system were lost, then the snapshots would be, too. A well-architected design would use FSx for OpenZFS snapshots to compliment a backup strategy.
Prerequisites
The following prerequisites are needed to implement this solution:
- This post assumes the reader understands PostgreSQL databases, Linux configuration, and mounting of NFS exports.
- An FSx for OpenZFS Multi-AZ file system has been deployed in your virtual private cloud (VPC).
- An EC2 instance running Linux and a test PostgreSQL database in the same VPC as the FSx for OpenZFS primary file server.
- The AWS Command Line Interface (AWS CLI) installed and configured on the EC2 instance running Linux and the PostgreSQL database.
- The PostgreSQL database installed on a volume from the FSx for OpenZFS file system.
- Access to both the PostgreSQL database EC2 instance and the FSx for OpenZFS file system.
- Intermediate Linux skills as an administrator.
Walkthrough
The following is a detailed walkthrough of the steps needed to perform storage-based snapshots of a PostgreSQL database on an EC2 instance. At a high level, the process is as follows:
- Connect to the Amazon EC2 database instance.
- Confirm PostgreSQL is running on the FSx for OpenZFS volume and view the current database tables.
- Place the PostgreSQL database in hot backup mode.
- Take a snapshot of the FSx for OpenZFS volume.
- Take the PostgreSQL database out of hot backup mode.
- Make some changes to the PostgreSQL database.
- Stop the PostgreSQL database.
- Roll the FSx for OpenZFS volume back to the point-in-time when the snapshot was taken.
- Start the PostgreSQL database back up and confirm that the changes made have been removed and it is back to the point-in-time when the snapshot was created.
Follow these steps to implement the preceding process.
Gather database and file system information
1. Open two connections to the Amazon EC2 PostgreSQL server console through EC2 Session Manager or SSH, as shown in the following Figure 2 and Figure 3.
Figure 2: EC2 instance connect
One connection is connected as the ssm-user or ec2-user and the second as the postgres user.
Figure 3: Session Manager connect
2, Switch to the postgres user by issuing the following commands:
sudo su – postgres
whoami
3. As the postgres user, connect to the database, set the path to the schema, and list the tables. Replace ‘<DATABASE>
’ with your database and ‘<SCHEMA>
’ with your values.
psql
\c ‘<DATABASE>’
SET search_path TO ‘<SCHEMA>’;
\dt
In our example, we use the mydb
database and myschema
schema as follows:
psql
\c mydb
SET search_path TO myschema;
\dt
You should observe output similar to Figure 4 listing the tables of your database:
Figure 4: PostgreSQL listing of tables
4. Validate the data directory on which your PostgreSQL database resides with the following command:
SHOW data_directory;
You should observe output similar to Figure 5:
Figure 5: PostgreSQL listing of databases directory
5. Using the PostgreSQL data directory path, find the associated NFS mount point to identify the FSx for OpenZFS file system ID. Replace <DATADIRECTORY> with the path to your PostgreSQL database directory.
mount | grep <DATADIRECTORY>
In our example, the PostgreSQL database is mounted at the /fsxzfs
path, as shown in Figure 6.
mount | grep /fsxzfs
Figure 6: PostgreSQL mount directory
6. From the NFS mount point, copy the file system ID to a text editor for later use in finding the volume ID, as shown in the Figure 7.
Figure 7: File system ID from mount directory
In our example, the file system ID is fs-04dff1b918181e45f
.
If you used the IP address to mount the FSx for OpenZFS file system, then observe the AWS Management Console File System ID section.
7. With the file system ID, find the volume ID for the volume containing the PostgreSQL database replacing <FILESYSTEM-ID>
with your value in the following command:
aws fsx describe-volumes --filter Name=”file-system-id”,Values=”<FILESYSTEM-ID>”
In our example we use the file system ID fs-04dff1b918181e45f
as follows:
aws fsx describe-volumes --filter Name=”file-system-id”,Values=”fs-04dff1b918181e45f”
Figure 8: Volume ID for PostgreSQL volume
Validate the VolumePath with the output from the mount command to make sure that you have the correct volume on your file system. Copy the VolumeId to a text editor for use in taking the snapshot.
FSx for OpenZFS snapshot creation
1. Put the database in hot backup mode prior to taking a snapshot. This makes sure that the snapshot is application consistent and recoverable to the point-in-time when the snapshot is taken. Enter the following command to start the backup mode:
SELECT pg_backup_start (label => ‘FSxSnapshot1’, fast => true);
If your specific PostgreSQL version is 14 and earlier, then use pg_start_backup
instead of pg_backup_start
.
You should observe output similar Figure 9:
Figure 9: PostgreSQL entering backup mode
2. Take a snapshot of the FSx for OpenZFS volume with the volume ID copied earlier using the following command replacing <VOLUME-ID>
with your volume ID.
aws fsx create-snapshot --name “rto_rpo_db_snapshot” --volume-id “<VOLUME-ID>”
In our example (Figure 10) we use the volume ID fsvol-024fd91f542196f52
as follows:
aws fsx create-snapshot --name “rto_rpo_db_snapshot” --volume-id “fsvol-024fd91f542196f52”
Figure 10: FSx for OpenZFS snapshot creation
You can also create snapshots through the AWS Management Console. Observe the AWS Management Console Snapshot section.
3. Validate that the snapshot lifecycle has transitioned from “PENDING
” or “CREATING” to “AVAILABLE
” by replacing <VOLUME-ID>
with the ID of your volume. From the output, copy the SnapshotID to a text editor.
aws fsx describe-snapshots --filters Name=”volume-id”,Values=”<VOLUME-ID>”
In our example (Figure 11) we use the volume ID fsvol-024fd91f542196f52
as follows:
aws fsx describe-snapshots --filters Name=”volume-id”,Values=”fsvol-024fd91f542196f52”
Figure 11: Snapshot ID and lifecycle status
4. When the snapshot status has changed to “AVAILABLE
”, take the database out of hot backup mode (Figure 12). In the EC2 PostgreSQL server console, enter the following command:
SELECT * FROM pg_backup_stop(wait_for_archive => true);
Figure 12: Exit PostgreSQL backup mode
If your specific PostgreSQL version is 14 and earlier, then use pg_stop_backup
instead of pg_backup_stop
.
Amazon FSx for OpenZFS snapshot restore
The preceding steps performed an application-consistent snapshot of the PostgreSQL database, from which you now have a recovery point.
1. With the snapshot created, make changes to the PostgreSQL database that you can rollback. For example, you can add a table called remove_me to your PostgreSQL database with the following commands:
CREATE TABLE remove_me (id INTEGER PRIMARY KEY, name VARCHAR);
\dt
You should observe output similar to Figure 13:
Figure 13: Create table output in PostgreSQL
2. Observe the changes you’ve made to the database. You confirm these changes have been removed after rolling the database back to the snapshot.
3. After your changes to the database are complete, shutdown the PostgreSQL database to perform a restore in the following steps.
sudo systemctl stop postgresql
4. Use the FSx for OpenZFS volume ID and snapshot ID gathered while taking the snapshot, and restore the volume to the point-in-time when the snapshot was created (Figure 14). The snapshot restore transitions from a status of “PENDING
” to “COMPLETED
” when the restore is finished. Replace <VOLUME-ID>
and <SNAPSHOT-ID>
with your values in the following command:
aws fsx restore-volume-from-snapshot --volume-id <VOLUME-ID> --snapshot-id <SNAPSHOT-ID>
In our example we used the volume ID fsvol-024fd91f542196f52
and snapshot ID fsvolsnap-078afb9c520533b06
as follows:
aws fsx restore-volume-from-snapshot -–volume-id fsvol-024fd91f542196f52 --snapshot-id fsvolsnap-078afb9c520533b06
Figure 14: Snapshot restore
In our example we have a single snapshot created on the volume. In a production environment you would likely have several point-in-time snapshots to roll back to on a volume. If the decision is made to rollback, then you need to add the flag “--options DELETE_INTERMEDIATE_SNAPSHOTS
” to the preceding restore-volume-from-snapshot command. This results in the snapshots between the current time and the snapshot being rolled back to being deleted from the file system.
5. Start and connect to the database. Confirm that your changes have been rolled back and the database has been restored to its state at the time when you took the snapshot.
sudo systemctl start postgresql
sudo su – postgres
psql
\c mydb
SET search_path TO myschema;
\dt
6. Validate that the changes you made in Step 1 aren’t present (as the snapshot to which it was restored was taken before the changes). If you’ve been following our example, then observe in the following screenshot how the remove_me table
is now gone because you rolled the database back to the restore point of the snapshot (Figure 15).
Figure 15: List tables after restoration
AWS Management Console file system ID
To find the file system ID in the Console, follow these steps.
1. Open the Console and search for FSx in the search bar, choosing the Amazon FSx
2. On the Amazon FSx service page, choose File systems.
3. Choose the copy to clipboard link next to your FSx for OpenZFS file system on which the PostgreSQL database is, as shown in Figure 16.
Figure 16: Amazon FSx file systems
AWS Management Console snapshot
To create a snapshot of an FSx for OpenZFS volume in the Console, follow these steps.
1. Open the Console and search for FSx in the search bar, choosing the Amazon FSx
2. On the Amazon FSx service page, choose File systems.
3. Choose the link for the FSx for OpenZFS file system ID on which your PostgreSQL database is, as shown in Figure 17.
Figure 17: Amazon FSx file systems
4. On the file system details page, select the Volumes tab, the check box next to the volume on which your PostgreSQL database is running, the actions drop down, and then choose Create snapshot, as shown in Figure 18.
Figure 18: FSx for OpenZFS Volumes tab
5. In the Create snapshot dialog box, enter the name of the snapshot, such as “rto_rpo_db_snapshot”, and choose Confirm to create the snapshot, as shown in Figure 19.
Figure 19: Create snapshot dialog
6. Go back to the Snapshots tab for your volume and wait until your snapshot is ready for use when the status has transitioned from Creating or Pending to Available as in Figure 20.
Figure 20: Volume Snapshots tab
Cleaning up
There are costs associated with running EC2 instances and FSx file systems. Remember to delete and terminate unused file systems and EC2 instances if they are no longer needed.
Conclusion
In this post, we demonstrated how to create an application-consistent snapshot for your PostgreSQL database on Amazon FSx for OpenZFS. Then, the database was rolled back to the point-in-time of the snapshot in seconds, which provided an extremely low RTO. These snapshots can not only be used as a quick restore point, but also to create space-efficient clones of your databases. Then, these cloned databases can be presented to your test, development, and quality assurance environments.
By using FSx for OpenZFS, you can quickly take application-aware snapshots and restore your data back to a specific point-in-time instantaneously, and regardless of the size of the database. This lowers your RPO and RTO for your critical database workloads. Using our documentation, these snapshots can be scheduled and combined in a script or an AWS Lambda to automate their creation.
For further information about FSx for OpenZFS, visit the product page.
Thank you for reading this post. We encourage you to leave comments in the comments section.