AWS Storage Blog

Backing up Oracle databases to Amazon S3 at scale

In today’s data-driven world, safeguarding critical information stored in Oracle databases is crucial for enterprises. Companies struggle to efficiently back up vast amounts of data from hundreds of databases powering enterprise resource planning (ERP) systems and critical applications. These backups must be secure, durable, and easily restorable to ensure business continuity, guard against ransomware, and stay compliant. Businesses need a simple, cost-effective solution that ensures data immutability and scalability over time.

Amazon S3 File Gateway enables efficient, scalable Oracle database backups to the cloud, using Amazon S3 for durable, cost-effective storage while maintaining fast access through local caching. It supports Network File System (NFS) and Server Message Block (SMB) protocols, enabling easy integration with existing backup workflows. S3 File Gateway works for on-premises and cloud databases, allowing unified backup across hybrid environments. Using native AWS and Oracle tools helps companies avoid extra licensing costs and simplify backups.

In this post, we explore how to scale Oracle database backups to Amazon S3 using multiple S3 File Gateways. Building on our previous post about using a single S3 File Gateway for Oracle database backups to S3, we focus on implementing large-scale backup strategies, finding the optimal S3 File Gateway configuration, and enhancing performance monitoring. This approach provides businesses with a simple, cost-effective solution for data immutability and scalability, suitable for both on-premises and cloud databases.

Solution overview

In this solution, we demonstrate scalability using multiple S3 File Gateways for large Oracle database backups. Four gateways serve as an example to handle hundreds of databases totaling hundreds of terabytes. We will guide you in determining the optimal gateway configuration for your specific workload.

Figure 1 depicts the architectural workflow of the proposed solution, showcasing four S3 File Gateways writing backup data to Amazon S3 from multiple Oracle databases running on Amazon Elastic Compute Cloud (Amazon EC2) instances. Amazon RDS Custom for Oracle stores the RMAN catalog and the Oracle Management Service (OMS) repository. The architecture references the Oracle Enterprise Manager (OEM) console and OEM agent for centralized backup management, with OMS as the central orchestrator. However, users have the flexibility to implement their preferred orchestration tools for managing database backups.

Figure 1: Four S3 File Gateways writing backup data to Amazon S3 from multiple Oracle databases running on Amazon EC2 instances


Finding the optimal S3 File Gateway configuration

The following step-by-step instructions demonstrate how to find the optimal S3 File Gateway configuration for your environment.

1. Calculate the size of the database full backups

Run the SQL from GitHub against the RMAN catalog to find the size of the last five full backups, and use the average OUT_SIZE as the database backup size. Gather this data for all databases to be backed up to determine the total backup size required.

For most production databases, full backups take longer than incremental backups. Sizing S3 File Gateway to handle full backups makes sure it can also handle incremental backups. If incremental backups are larger, then adjust the configuration based on the incremental backup size for optimal performance.
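To illustrate the averaging step with made-up numbers (the real values come from the RMAN catalog query referenced above), a short shell sketch:

```shell
# Hypothetical OUT_SIZE values (GB) for the last five full backups of one
# database; average them to get the backup size used for sizing.
AVG_OUT_SIZE=$(printf '%s\n' 152 158 161 155 160 |
  awk '{ sum += $1; n++ } END { printf "%.1f", sum / n }')
echo "Average OUT_SIZE: ${AVG_OUT_SIZE} GB"
```

Repeat this per database and sum the averages to get the total backup size.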

2. Time allocated for full database backup

To identify sizing requirements for an S3 File Gateway, determine the acceptable time window for completing the backups of all enterprise databases. We assume a general use case where full backups are done on weekends, commonly from Friday 10 PM to Monday 5 AM, giving a 55-hour window to complete the full backup of all databases.

3. Identify the S3 File Gateway configuration

a. Add a buffer: We recommend adding a 20% buffer to the OUT_SIZE for extra load handling.

b. Bandwidth: AWS Storage Gateway receives database backups over NFS, and uploads them to Amazon S3, so you must make sure there is enough bandwidth available.

c. Cache disks: Storage Gateway Amazon EC2 instances use Amazon Elastic Block Store (Amazon EBS) as cache disks to store data before uploading to Amazon S3. This consumes IOPS and throughput for both writing and reading. We recommend using multiple cache disks for each gateway to spread the workload.

d. To scale the configuration, you can deploy additional file gateways.

      • RMAN can stream backups to multiple NFS mount points for better backup read/write throughput.

In our example, we have ~19 TB of data to back up. A single S3 File Gateway can write ~20 TB per day to Amazon S3. Since we want our backup job to complete within 5 hours, we start with four S3 File Gateways, each with four cache disks to spread the load.

To design the environment appropriately, we start by obtaining the average OUT_SIZE from a SQL statement and add 20% for buffer space. Then, this value is divided by the backup completion time, which is assumed to be 55 hours. The result is further divided by the number of S3 File Gateways, which is four in this example. Finally, we multiply the throughput requirement by two to account for the S3 File Gateway handling both receiving and uploading the backup files simultaneously.

Amazon S3 File Gateway network requirement per hour = (((Average OUT_SIZE of all databases + 20%) / 55 hours) / Number of Storage Gateways) * 2

Amazon S3 File Gateway bandwidth requirement for EBS = (((Average OUT_SIZE of all databases + 20%) / 55 hours) / Number of Storage Gateways) * 2

In the following formula, we further divide by the number of cache disks attached to each S3 File Gateway.

Cache volumes EBS bandwidth requirements = ((((Average OUT_SIZE of all databases + 20%) / 55 hours) / Number of Storage Gateways) * 2) / Number of EBS volumes

Cache volume EBS IOPS requirements vary with the I/O size of the source workload; adjust them to the optimal number for your environment.
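To make these formulas concrete, here is an illustrative shell sketch using the example figures from this post (~19 TB of backups, a 20% buffer, a 5-hour window, four gateways, four cache disks each); all numbers are examples only:

```shell
# Illustrative sizing calculation for the example workload in this post.
TOTAL_GB=19000                               # total full-backup size
BUFFERED_GB=$(( TOTAL_GB * 120 / 100 ))      # add the 20% buffer
HOURS=5                                      # backup completion window
GATEWAYS=4                                   # number of S3 File Gateways
DISKS=4                                      # cache disks per gateway

# Per-gateway requirement in GB/hour, doubled because each gateway receives
# backups over NFS and uploads to Amazon S3 at the same time.
PER_GW_GB_HR=$(( BUFFERED_GB / HOURS / GATEWAYS * 2 ))

# Convert GB/hour to MB/s (1 GB = 1024 MB, 3600 seconds per hour).
PER_GW_MBS=$(awk -v g="$PER_GW_GB_HR" 'BEGIN { printf "%d", g * 1024 / 3600 }')

# Spread the EBS cache bandwidth across the cache disks.
PER_DISK_MBS=$(( PER_GW_MBS / DISKS ))

echo "Per-gateway requirement: ${PER_GW_GB_HR} GB/hour (~${PER_GW_MBS} MB/s)"
echo "Per-cache-disk EBS throughput: ~${PER_DISK_MBS} MB/s"
```

With these inputs, each gateway needs roughly 2,280 GB/hour (~648 MB/s) of combined receive and upload capacity, or ~162 MB/s per cache disk.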

Prerequisites

The following prerequisites are necessary to complete this solution.

  • AWS account(s)
  • S3 bucket with folders named after each database’s db_name
  • Oracle Databases in Amazon EC2 with OEM Agent, OEM, and OMS managing the databases

Using S3 File Gateway to back up an Oracle Database from an EC2 instance

This walkthrough covers using S3 File Gateway to back up an Oracle Database from an EC2 instance. The same solution applies to on-premises setups for both Storage Gateway and databases.

  1. Provision S3 File Gateway
  2. Security Group port requirements for S3 File Gateway activation
  3. Create NFS share for the S3 bucket using AWS S3 File Gateway
  4. Mount NFS share in the Oracle database EC2 instance
  5. Configure RMAN channels and backup database

1. Provision S3 File Gateway

Follow the Storage Gateway documentation to set up S3 File Gateway in your AWS account. We perform the following changes to our setup to suit our large-scale backup use case consisting of hundreds of Oracle databases.

a. Create the S3 File Gateway EC2 instance using the latest Amazon Machine Image (AMI) name, aws-storage-gateway-FILE_S3-n.nn.n, and an 8xlarge instance type (we compare several instance families later in this post).

b. Add four EBS volumes to the instance with the following configuration:

      • EBS volume type -> gp3
      • EBS volume size -> 150 GB
      • EBS volume IOPS -> 4000
      • EBS volume throughput -> 800 MB/s
      • Delete on termination -> Yes

c. These figures are provided as examples for illustrative purposes only. To determine your actual IOPS and throughput requirements, use the formulas outlined in the “Identify the S3 File Gateway configuration” section.

d. We deploy four S3 File Gateways, each running on a separate EC2 instance with identical configuration.

e. Add all four instances to one security group.

f. S3 File Gateway defaults to eight Amazon S3 uploader threads. We increased this to 40 uploader threads in our test scenarios. Contact AWS Support to request additional Amazon S3 uploader threads for better performance and resource efficiency.

2. Security Group port requirements for S3 File Gateway activation

a. Follow the documentation for S3 File Gateway port requirements and add them to the Security Group.

b. Follow the documentation to activate Storage Gateway.

3. Create NFS share for the S3 bucket using Amazon S3 File Gateway

Follow the documentation to create an NFS share for the S3 bucket. Use one S3 bucket as the NFS share for all four Storage Gateways to enable simplified management, increased performance through parallel processing, improved scalability, efficient resource utilization, and a consistent data view across gateways. Create the NFS share with the default configuration and note the IP address. Use this IP to mount the S3 bucket as an NFS mount on Oracle Database EC2 instances. There are four IP addresses because we’re using four Storage Gateways.

 Figure 2: NFS Share for the S3 bucket


Figure 2 shows the S3 File Gateway NFS share interface, which displays details such as the storage class (S3 Standard) and encryption (S3-Managed Keys). Settings include client access for all NFS clients, a read-write access type, and directory permissions. Example commands are provided for connecting clients to the file share.

4. Mount NFS share in the Oracle database EC2 instance

Enter the following bash commands in the database EC2 instance and mount the S3 bucket as NFS mount points. The NFS parameters used are specific to the Oracle database and RMAN.

sudo su -
mkdir -p /mnt/s3-sgw-001
mkdir -p /mnt/s3-sgw-002
mkdir -p /mnt/s3-sgw-003
mkdir -p /mnt/s3-sgw-004
sudo mount -t nfs -o rw,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,nolock 10.1.0.234:/s3-mount-point /mnt/s3-sgw-001
sudo mount -t nfs -o rw,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,nolock 10.1.0.164:/s3-mount-point /mnt/s3-sgw-002
sudo mount -t nfs -o rw,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,nolock 10.1.0.221:/s3-mount-point /mnt/s3-sgw-003
sudo mount -t nfs -o rw,rsize=1048576,wsize=1048576,hard,proto=tcp,timeo=600,nolock 10.1.0.188:/s3-mount-point /mnt/s3-sgw-004
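Before configuring RMAN, it can be worth confirming that all four mount points are actually NFS-mounted. One possible check (using the example paths above) reads /proc/mounts:

```shell
# Check each expected mount point against /proc/mounts and report status.
MISSING=0
for mp in /mnt/s3-sgw-001 /mnt/s3-sgw-002 /mnt/s3-sgw-003 /mnt/s3-sgw-004; do
  if grep -qs " $mp nfs" /proc/mounts; then
    echo "OK: $mp is NFS-mounted"
  else
    echo "MISSING: $mp is not mounted"
    MISSING=$(( MISSING + 1 ))
  fi
done
echo "Unmounted paths: ${MISSING}"
```

Run backups only when all four paths report OK; a missing mount would silently redirect RMAN backup pieces to the local root volume.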

5. Configure RMAN channels and backup database

Use the script from GitHub to configure eight RMAN channels, with each NFS mount point receiving two RMAN channels streaming backup pieces. All four NFS mount points ultimately point to a single S3 bucket. While the RMAN setup addresses parallelism and channel allocation, it is crucial to also set an appropriate retention policy that aligns with your specific needs. Additionally, select a compression algorithm that meets your Recovery Time Objective (RTO) requirements.

Back up the database using the following RMAN command. The SECTION SIZE parameter is crucial for parallelizing data file backups; find the value that works best for your database:

BACKUP DATABASE SECTION SIZE 5G PLUS ARCHIVELOG;

For the purposes of this test, we used the command to take a full database backup. However, in an actual production environment, you may run Level 0 and Level 1 incremental backups instead, as shown below:

BACKUP INCREMENTAL LEVEL 0 SECTION SIZE 5G DATABASE PLUS ARCHIVELOG;
BACKUP INCREMENTAL LEVEL 1 SECTION SIZE 5G DATABASE PLUS ARCHIVELOG;
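As a rough, illustrative calculation of how SECTION SIZE interacts with the channel count (assuming the 160 GB database and eight channels used in our tests):

```shell
# With SECTION SIZE 5G, RMAN splits large datafiles into ~5 GB sections,
# so a 160 GB database yields roughly 32 sections for 8 channels to share.
DB_GB=160
SECTION_GB=5
CHANNELS=8
SECTIONS=$(( (DB_GB + SECTION_GB - 1) / SECTION_GB ))  # ceiling division
echo "Approximate backup sections: ${SECTIONS}"
echo "Sections per channel: $(( SECTIONS / CHANNELS ))"
```

A section size that yields at least a few sections per channel keeps all channels busy; too small a section size adds per-piece overhead, so test a few values against your own database.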

Amazon S3 File Gateway performance results

We performed load and performance testing of S3 File Gateways configured on six different EC2 instance types. The following are the test bed details:

  • Four Storage Gateways were used to copy backup data to a single S3 bucket
    • Four cache disks per Storage Gateway provisioned with 4000 IOPS and 800 MB/s throughput per volume
  • 12 Oracle database backups were run in parallel
  • S3 File Gateway uploader threads increased from 8 to 40 (to increase uploader threads, contact AWS Support)
  • Every database was backed up 10 times in a loop, equating to 120 full database backups
  • Each database backup was 160 GB, amounting to a total data backup size of 19.2 TB
  • A single S3 bucket was exposed as four NFS shares and mounted to an Oracle EC2 instance
  • Eight RMAN channels were used, with two channels streaming to each NFS mount point

In summary, we backed up 19.2 TB to a single S3 bucket using four Storage Gateways. The detailed metrics and the time taken to copy 19.2 TB are in the following table. All costs are based on a three-year reserved instance discount.

| SGW EC2 type | r6in.4xlarge | r6in.8xlarge | m7i-flex.4xlarge | m7i-flex.8xlarge | c7i.4xlarge | c7i.8xlarge |
|---|---|---|---|---|---|---|
| Number of Storage Gateways used | 4 | 4 | 4 | 4 | 4 | 4 |
| Amazon S3 uploader threads per Storage Gateway (increased from 8 to 40 by AWS Support) | 40 | 40 | 40 | 40 | 40 | 40 |
| Number of cache disks | 4 | 4 | 4 | 4 | 4 | 4 |
| Cache disk IOPS/volume | 3000 | 4000 | 3000 | 3000 | 3000 | 3000 |
| Cache disk throughput/volume (MB/s) | 500 | 800 | 125 | 150 | 150 | 300 |
| Cache disk volume size (GB) | 150 | 150 | 500 | 500 | 500 | 150 |
| Oracle database size (GB) | 160 | 160 | 160 | 160 | 160 | 160 |
| Amazon S3 upload achieved per Storage Gateway (MB/s) | 900 | 1450 | 150 | 340 | 300 | 630 |
| Time taken for backups to complete (minutes) | 100 | 62 | 516 | 260 | 280 | 148 |
| Achieved throughput per Storage Gateway (TiB/hour) | 2.9 | 4.6 | 0.6 | 1.1 | 1 | 1.9 |
| Cost for four SGW EC2 instances/month (USD) | $2813.37 | $4734.15 | $1939.20 | $2969.95 | $1858.30 | $2872.15 |

Table 1: Configuration, performance, and cost comparison of different Storage Gateway instance types
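As a quick sanity check on the fastest configuration in Table 1 (r6in.8xlarge: 19.2 TB across four gateways in 62 minutes), the per-gateway throughput can be recomputed as follows:

```shell
# Recompute per-gateway throughput from the r6in.8xlarge test figures:
# 19.2 TB total, shared across 4 gateways, completed in 62 minutes.
PER_GW_TB_HR=$(awk 'BEGIN { printf "%.1f", (19.2 / 4) / (62 / 60) }')
echo "Per-gateway throughput: ${PER_GW_TB_HR} TB/hour"
```

This reproduces the ~4.6 figure reported in the table for that instance type.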

Monitor S3 File Gateway for optimal resource usage

Download the dashboard metrics file from GitHub. The README.md file contains instructions for modifying parameters, creating and configuring the CloudWatch dashboard, as well as monitoring and tuning it.

Figure 3 shows the CloudWatch dashboard with optimal resource consumption. Using four r6in.8xlarge instances for AWS Storage Gateway, 19.2 TiB was backed up in 62 minutes, averaging 4.645 TiB per Storage Gateway per hour.

Figure 3: CloudWatch dashboard for r6in.8xlarge resource usage


Figure 3 shows a dashboard with multiple performance graphs for server metrics, including IOPS and throughput. Each graph shows data trends over time for different storage operations, with lines representing various metrics like read and write speeds.

Other considerations

  • We recommend turning dNFS OFF. RMAN creates backup pieces as hidden files and renames them later if Oracle dNFS is ON (Doc ID 2215563.1). This can negatively impact performance.
  • We recommend turning compression on. If you aren’t already using compression, consider BASIC compression, which is available at no extra cost. Test the CPU overhead of compression. Compression can save significant storage capacity and network bandwidth.
  • RMAN restore is usually slower than RMAN backup.
  • For additional data protection options for your backups in Amazon S3, refer to this blog showcasing how one customer uses features such as S3 Versioning, S3 Replication, and tiering to lower-cost S3 storage classes to store and lock down a second copy in a different bucket.

Cleaning up

The services involved in this solution incur costs. Clean up the example resources created while running this reference deployment to avoid incurring additional costs. For additional information on how to clean up resources associated with AWS Storage Gateway, refer to this documentation.

Conclusion

In this post, we demonstrated using Amazon S3 File Gateway to back up hundreds of Oracle databases with terabytes of data to Amazon S3 for cost-effective, long-term storage. We provided prescriptive guidance on identifying the optimal S3 File Gateway configuration, such as the number of File Gateways and the cache disk configuration, based on total backup size and desired completion time. We also outlined how to create and configure a CloudWatch dashboard to monitor S3 File Gateway performance and fine-tune configurations for optimal backup efficiency.

Overall, this post presents a cost-effective, scalable approach to long-term backup storage that can help meet performance, data protection, and compliance needs.

Thank you for reading this post. Leave any comments or questions in the comments section.

Srini Ramaswamy


Srini Ramaswamy is a Consultant with the AWS ProServe team. He has been supporting and enabling customers to migrate their databases from on-premises data centers to the AWS Cloud, and to migrate from commercial database engines to open-source databases on AWS.

Ed Laura


Ed Laura is a Senior Product Solutions Architect covering AWS Edge Data Services, including AWS Storage Gateway and AWS DataSync. He is excited about helping customers leverage AWS to overcome their hybrid storage challenges. He has over 15 years of experience in Infrastructure Solution Architecture across various industries such as Healthcare, Life Sciences, Financial Services, Media and Entertainment, and Manufacturing. In his free time, he enjoys hockey, adventuring outdoors, and biking and hiking with his two dogs.

Sridhar Mahadevan


Sridhar Mahadevan is a Specialist Solutions Architect at AWS, specializing in serverless technologies and ERP migration. He excels in architecting and automating ERP workloads on AWS, using his expertise in ERP use cases. His strong background in serverless solutions, databases and middleware enhances his ability to deliver efficient cloud solutions.