AWS Database Blog
Build high availability for Amazon RDS Custom for Oracle using read replicas
A high availability solution for the database stack is an important aspect to consider when migrating or deploying Oracle databases in the AWS Cloud to help ensure that the architecture can meet the service level agreement (SLA) of the application. Unavailability of a critical database can lead to application outage, interruption to business operations, inaccessibility to critical data, revenue loss, and loss of customer trust.
In this post, we discuss how to enable high availability for an Amazon Relational Database Service (Amazon RDS) Custom for Oracle instance using read replicas and best practices to follow during role transition and failover scenarios.
You can achieve high availability (HA) and disaster recovery (DR) for Oracle databases using a physical standby database by creating and maintaining a physical copy of the primary database. The standby instance can be hosted in a location different and far enough from the primary instance, such as a different Region or Availability Zone, to maintain the availability of the database when the primary instance is impacted by unplanned incidents. Oracle Data Guard provides a comprehensive set of services to create, maintain, and monitor one or more standby databases. Data Guard maintains a copy of the data in a standby database that is continuously updated with changes from the primary database. Data Guard validates the changes before they’re applied to the standby database, to help prevent physical corruptions that occur in the storage layer from causing data loss and downtime. Data Guard also helps repair corrupted blocks in the primary database in real time by fetching the “good” version of the corrupted block from an Active Data Guard physical standby database. Additionally, Data Guard can also help improve availability of the database during planned maintenance activities such as database patching using Standby First Patching strategy.
Amazon Relational Database Service (Amazon RDS) Custom is a managed database service for legacy, custom, and packaged applications and any application that requires access to the underlying OS and DB environment. RDS Custom for Oracle provides the flexibility to customize your database, underlying server, and operating system configurations to support applications that require such customizations. RDS Custom for Oracle supports read replicas that are built on Data Guard technology, and you can use these replicas to offload reads from the primary and for disaster recovery. As of this writing, RDS Custom for Oracle doesn’t support the Multi-AZ option, which is available for Amazon RDS for Oracle as an HA solution using storage replication.
Solution overview
With RDS Custom for Oracle, you can create up to five read replicas, which are managed physical standby databases using Data Guard configuration. These read replicas created in mounted mode can be opened in read-only mode and used for reporting purposes. Read replicas in RDS Custom use asynchronous log shipping mode with MaxPerformance Data Guard configuration, and are restricted to the same Region as the primary instance (as of this writing). Because these read replicas are managed by AWS, most of the heavy lifting activities in maintaining a typical Data Guard configuration are automated, including the following tasks:
- Creating and maintaining a Data Guard broker configuration
- Shipping redo changes from the primary instance to the standby instance (read replica)
- Resolving archivelog gaps
- Purging archivelogs from the primary and standby when they’re no longer needed
The following diagram illustrates a read replica in RDS Custom for Oracle.
However, this default configuration of read replicas doesn’t provide a Recovery Point Objective (RPO) of zero because the redo is shipped asynchronously from the primary instance to the replica. Furthermore, the failover operation is manual because there’s no automated failure detection or failover mechanism, and rebuilding the former primary instance after a failover operation needs to be done manually by creating a new read replica.
With shell access to the underlying Amazon Elastic Compute Cloud (Amazon EC2) instance of the RDS Custom for Oracle instance, the Data Guard configuration of the read replica can be customized to meet various customer-defined HA and DR requirements. The following are a few customizations you can make to the default read replica configuration to help you meet your Recovery Time Objective (RTO) and RPO requirements:
- Modify log shipping mode from asynchronous to synchronous to help you achieve an RPO of zero
- Enable Data Guard protection mode to maximum availability
- Configure a Data Guard observer process to enable Fast-Start Failover (FSFO) to facilitate automated failure detection and failover
- Enable the flashback database feature on both the primary and standby instances for automated and fast reinstatement of the former primary database after a failover activity
The following diagram illustrates a customized read replica configuration to help you meet your HA requirements.
In addition to these managed replicas, you can also create self-managed standby databases in various locations, such as a different AWS Region, on-premises, or a different cloud environment, to help you meet your disaster recovery requirements. You can manage those self-managed standby databases and managed read replicas with a single Data Guard configuration.
In this post, we discuss the following topics:
- Create a read replica for an existing RDS Custom for Oracle instance
- Change Data Guard protection mode
- Convert a replica from mounted state to open read-only
- Role transition (switchover)
- Data Guard failover
- Promote the read replica
- Configure FSFO
- Seamless Oracle Client connectivity
- Cross-Region Data Guard configuration
In this post, we refer to different resources such as Database Identifier in AWS Console, Database Unique Name (db_unique_name
instance parameter), Database name in Data Guard broker configuration and Net service name in tnsnames.ora
to illustrate a sample configuration. The following table serves as a reference to the names of different resources used in this post.
. | Primary | Read Replica |
DB Identifier | orcl1 | orcl2 |
DB Unique Name | ORCL_A | ORCL_B |
DB Name in Data Guard Broker | orcl_a | orcl_b |
Oracle Net Service Name | RDS_CUSTOM_ORCL_A | RDS_CUSTOM_ORCL_B |
Create a read replica for an existing RDS Custom for Oracle instance
You can create a read replica for an existing RDS Custom for Oracle instance either from the AWS Management Console or using the AWS Command Line Interface (AWS CLI).
To create a read replica from the Amazon RDS console, complete the following steps:
- On the Amazon RDS console, choose Databases in the navigation pane.
- Select the database you want to add a read replica for.
- On the Actions menu, choose Create read replica. Read replica is a copy of your source database, and automatically receives and applies changes received from source database.
- For Replica source, enter your source DB instance.
In this example, we use the existing RDS Custom for Oracle instanceorcl1
in the Singapore Region as the source to create our read replica. - For DB instance identifier, enter your DB instance (for this post,
orcl2
).
- Choose the database instance class necessary for the performance requirements of the replica instance.
- For Storage type, choose your storage type.
We recommend that you use the same storage type as the source DB instance for the read replica used as a failover target. - Accept the suggested values under Connectivity, and choose a different Availability Zone from the source database to protect from Availability Zone failure.
- For IAM instance profile, choose the instance profile for your RDS Custom for Oracle DB instance.
- Choose Create read replica to create the read replica
orcl2
.
Alternatively, you can create a read replica from the AWS CLI, as shown in the following example:
Choose your database protection mode
Oracle Data Guard provides three protection modes to configure your Data Guard environment based on your availability, protection, and performance requirements. The following table summarizes these three modes.
Protection Mode | Redo Transport Setting | Description |
MAXIMUM PERFORMANCE | ASYNC | For transactions happening on the primary database, redo data is asynchronously transmitted and written to the standby database redo log, therefore the performance impact is minimal.
|
MAXIMUM PROTECTION | SYNC+AFFRIM | For transactions on the primary database, redo data is synchronously transmitted and written to the standby database redo log on disk before the transaction is acknowledged. If the standby database becomes unavailable, the primary database shuts itself down to ensure transactions are protected. |
MAXIMUM AVAILABILITY | SYNC+AFFRIM | Similar to MaxProtection mode, except when no acknowledgement from standby database, it operates as if it were in MaxPerformance mode to preserve the primary database availability until it’s able to write its redo stream to a synchronized standby database again. |
SYNC+NOAFFRIM | For transactions on the primary database, redo is synchronously transmitted to the standby database, and the primary waits only for acknowledgement that the redo has been received on the standby, not that it has been written to standby disk. This mode, which is also known as FastSync , can provide a performance benefit at the expense of potential exposure to data loss in a special case of multiple simultaneous failures. |
Read replicas in RDS Custom for Oracle are created with maximum performance protection mode, which is also the default protection mode for Oracle Data Guard. It provides the lowest performance impact on the primary database, which can help you meet the RPO requirement measured in seconds. With FSFO FastStartFailoverLagLimit configuration, you can control the failover to the standby database based on the lag between primary and standby to further help you meet your RPO requirements.
To work to achieve a zero data loss (RPO=0) objective, you can customize the Data Guard protection mode to MaxAvailability
with the SYNC+NOAFFIRM
setting for redo transport for better performance. Because commits on the primary database are acknowledged only after the corresponding redo vectors are successfully transmitted to the standby database, the network latency between the primary instance and replica can be crucial for commit-sensitive workloads. Deploying the read replica in the same Availability Zone as the primary database provides lower network latency compared to a read replica in a different Availability Zone. However, the same Availability Zone read replicas may not meet your availability requirements because in the unlikely event of Availability Zone unavailability, both the primary instance and read replica instance are impacted. It’s a recommended practice to perform load testing for your workload to assess the performance impact when the read replica is customized to run in MaxAvailability
mode.
Follow these steps to change Data Guard protection mode to MaxAvailability
.
- Access the Data Guard broker configuration on Primary (
orcl1
) via the Data Guard Command Line Interface (DGMGRL).
You can find the net service name for your database in thetnsnames.ora
file located in the$ORACLE_HOME/network/admin
directory. RDS Custom for Oracle automatically populates these entries for your primary database and your read replicas. The password for theRDS_DATAGUARD
user is stored in AWS Secrets Manager, with secret namedo-not-delete-rds-custom-+<<RDS Resource ID>>+-dg
. Refer to Connecting to your RDS Custom DB instance using SSH for more information on how to connect to a RDS Custom EC2 instance using the SSH key retrieved from Secrets Manager. - Verify the current protection mode and log transport setting:
- Change log transport mode to
FastSync
, corresponding to the redo transport settingSYNC+NOAFFIRM
. Change it for both the primary database (orcl_a
) and standby database (orcl_b
) to check that you have valid settings after the role switch. - Change the protection mode to
MaxAvailability
:
Change replica status from mounted to read-only
Using a replica in read-only mode requires the Active Data Guard option which comes at an extra cost because it is a separately licensed feature of the Oracle database Enterprise Edition.
Having your read replica in read-only mode provides additional benefits:
- Offloads read-only workloads from the primary database
- Enables automatic repair of corrupted blocks by retrieving healthy blocks from the standby database using the Active Data Guard feature
- Uses the far sync capability to keep the remote standby database in sync without the performance overhead associated with long-distance redo log transmission
The read replica is created in MOUNT mode by default. Follow these steps to open it in read-only mode.
- Access the Data Guard broker configuration on Read Replica (
orcl2
) viaDGMGRL
. - Stop redo apply for the standby database
orcl_b
. - Connect to the standby database
orcl_b
using the TNS entryRDS_CUSTOM_ORCL_B
and open it in read-only mode. - Connect to Data Guard broker configuration on read replica using DGMGRL and enable redo apply.
Role transition (Data Guard switchover)
The primary database and one of the read replica databases can perform role transition. During this process, the databases in the Data Guard configuration continue to function in their new roles. Typical use cases for Data Guard switchover are disaster recovery drills, scheduled maintenance activities on databases, and Standby-First rolling patches.
Follow these steps to perform Data Guard switchover with a read replica.
- Pause RDS Custom database automation for both
orcl1
andorcl2
.
There might be database restarts during the switchover; this depends on the existing Data Guard role and database open mode. Refer to Pausing and resuming RDS Custom automation for steps on how to pause and resume RDS Custom database automation. Although the RDS Custom automation framework doesn’t interfere with the role transition process, it’s a good practice to pause automation during Data Guard switchover. - Use
DGMGRL
to log in to databaseorcl1
with userRDS_DATAGUARD
, and check the Data Guard status. - On the Amazon RDS console, check the instance role (primary and replica) in the database’s Replication section on the Connectivity & security tab.
The primary role should match the Data Guard primary database, and the replica role should match the Data Guard physical standby database. Note that the instance role shown on the Amazon RDS console is only updated correctly for read replicas managed by RDS Custom.
- Perform the switchover:
- Verify Data Guard status.
You might see
WARNING
instead ofSUCCESS
for the configuration status if you run the show configuration command right after switchover. In an ideal scenario, the configuration returns to a healthy state within a minute.The Amazon RDS console reflects the new roles after you perform the role switch. It may take a few minutes for the Replication state to update from empty to Replicating.
- Resume RDS Custom database automation for
orcl1
andorcl2
. - Change your application configuration to use the new primary database endpoint for the read/write workloads, or refer to the best practices discussed later in this post to avoid such changes.
Data Guard failover
During the Data Guard failover process, the standby database (read replica) is converted to a primary database when the original primary database fails. This is a manual process when the Fast-Start Failover (FSFO) is not configured and there may or may not be data loss depending upon on the Data Guard configuration. The failover process doesn’t remove the primary database from the Data Guard broker configuration and it allows the former primary database to be reinstated when the Flashback database feature is enabled.
If you want to test failover without having to restore from backup or re-create the replica after failover, enable the flashback database as explained in the following steps.
- Pause RDS Custom for Oracle database automation for databases
orcl1
andorcl2
, and disable FSFO if it has been enabled previously. - Log in to the Amazon RDS for Custom underlying EC2 instances running
orcl1
andorcl2
and create a directory using userrdsdb
. - Enable the flashback database on both databases while the database is in MOUNT state:
The code has the following parameters:
-
- dg_broker_start=false – Disables Data Guard broker automatic start to prevent the broker from automatically opening the database for the primary database and starting the managed recovery process for the standby database.
- db_recovery_file_dest – The Oracle database stores database flashback logs in the fast recovery area (FRA) defined by this parameter.
- db_flashback_retention_target – How far (in minutes) we need the database to rewind. The default value is 1440. To support our FSFO use case, 60 minutes usually is sufficient with maximum availability protection mode.
- db_recovery_file_dest_size – The size of the FRA. Use the column
ESTIMATED_FLASHBACK_SIZE
in the viewV$FLASHBACK_DATABASE_LOG
to estimate the flashback database log space requirement during the peak workload.
-
- Enable Data Guard broker automatic start to allow the broker process to automatically open the database for the primary database and start the managed recovery process for the standby database:
- When the process is complete, resume RDS Custom database automation for
orcl1
andorcl2
, and enable FSFO if it was disabled previously.
To test failover, or in the event of primary database failure, you can fail over to the read replica manually using the following steps:
- Pause RDS Custom database automation for
orcl2
, the read replica instance that acts as the target for failover. - Use DGMGRL to log in to database
orcl_b
with userRDS_DATAGUARD
, and check the Data Guard status. Make sure you log in to the target database, which is the failover target. - Start the failover:
With the IMMEDIATE option, no attempt is made to apply any unapplied redo that has been received. This helps in the event that the failover command fails with
Error: ORA-16472: failover failed due to data loss
. - Check the Data Guard status:
The Data Guard configuration is in a
WARNING
state because there’s no working physical standby for the new primary database. There are a number of ways to reinstate the standby database, including the following:- Use RMAN backup and restore or RMAN DUPLICATE from the active database.
- Use the Oracle flashback database option if enabled.
- On the Amazon RDS console, drop the old primary database (
orcl1
) and recreate a replica fororcl2
. Note that in this case, the unique DB name takes the next running sequence name if possible. It might becomeORCL_C
instead of reusing the existingORCL_A
.
- Resume RDS Custom database automation for
orcl2
. - Change your application configuration to use the new primary database endpoint for the read/write workloads, or refer to the best practices discussed later in this post to avoid such changes.
Promote the read replica
The read replica can be promoted to a standalone database that can support read/write workloads by choosing the Promote option on the Amazon RDS Console, the AWS CLI command promote-read-replica, or the PromoteReadReplica Amazon RDS API operation.
As part of this read replica promotion, Amazon RDS Custom for Oracle removes Data Guard configuration for orcl2
in both the primary (orcl1
) and the Read Replica (orcl2
). The promotion process completes even if the primary (orcl1
) is not in “Available” status. Amazon RDS Custom for Oracle removes Dataguard the configuration for orcl2 in the former primary (orcl1
) once it comes back to “Available” status.
To promote an RDS Custom for Oracle replica (orcl2
) using the Amazon RDS Console:
- On the Amazon RDS console, choose Databases in the navigation pane.
- Choose the RDS Custom for Oracle replica (
orcl2
) that you want to promote. - For Actions, choose Promote.
- On the Promote Oracle Replica page, enter the backup retention period and the backup window for the newly promoted DB instance, then choose Promote Read Replica.
After a few minutes, the role for orcl2
will change from Replica (RDS Custom) to Instance (RDS Custom). If orcl2
was the only replica for the Primary (orcl1
), the role for orcl1
will change from Primary (RDS Custom) to Instance (RDS Custom) too.
Alternatively, you can promote a read replica from the AWS CLI, as shown in the following example:
Configure Fast-Start Failover
FSFO monitors the health of instances in a Data Guard configuration using an observer process and automatically initiates failover to a standby database in the event of a primary instance failure. When there are multiple standby databases for the primary database, you can specify the list of databases as failover candidates via FastStartFailoverTarget configuration.
An observer process is a DGMGRL
client component, typically running in a different machine from the primary and standby databases. The ORACLE HOME installation for the observer can be an Oracle Client Administrator installation, or you can install either an Oracle Database Enterprise Edition or Personal Edition. Refer to Installing and Starting the Observer for observer installation for your database release. To provide high availability for the observer process, you might want to do the following:
- Enable EC2 instance auto-recovery for the EC2 instance running your observer. You need to automate the observer startup process as part of the OS startup.
- Additionally, deploy an observer in the EC2 instance and configure an Amazon EC2 Auto Scaling group with size one (1). In the event of EC2 instance failure, the Auto Scaling group automatically spins up another EC2 instance. You need to automate the observer startup process as part of the OS bootstrap and startup.
- For Oracle 12c Release 2 and above, you can deploy multiple (up to three) observers. One of them is the primary observer, and the rest are backup observers. When the primary observer fails, one of the backup observers resumes the primary role.
Follow these steps for configure FSFO for an RDS Custom for Oracle instance with read replicas:
- Use
DGMGRL
from the observer host, which is configured withtnsnames.ora
entries for the primary and standby databases:Although it’s a supported configuration to enable FSFO with maximum performance protection mode and to allow automatic failover as long as data loss is within the
FastStartFailoverLagLimit
configuration (value in seconds), you need to use maximum availability protection mode to work to achieve zero data loss (RPO=0). - Specify one or more standby databases as the failover target. In a configuration with only one standby database (read replica), this step is optional.
- Specify the FastStartFailoverThreshold value in seconds, which indicates the duration the observer process waits when the primary database becomes unavailable before initiating the failover.
The default value is 30 seconds in Oracle 19c, and the minimum value is 6 seconds. A lower value can potentially shorten the RTO during the failover, whereas a higher value helps reduce the chance of unnecessary failover transient errors on the primary database. The RDS Custom for Oracle automation framework monitors database health and performs corrective actions every few seconds. Therefore, we recommend setting FastStartFailoverThreshold to a value higher than 10 seconds.In the following example, configure the threshold value to 45 seconds: - Enable FSFO:
If your database doesn’t have flashback database enabled, the warning message ORA-16827 appears. Although the flashback database feature is optional for FSFO, it helps automatically reinstate failed primary databases using a flashback database to a point in time before failover if the FastStartFailoverAutoReinstate configuration property is set to
TRUE
(which is the default). - Check the configuration to confirm FSFO is enabled. The warning message is expected because you haven’t started the observer process.
- Start the observer.
You can start the observer before or after you enable FSFO. If FSFO is already enabled, the observer immediately begins monitoring the status and connections to the primary and target standby databases. If FSFO is not enabled, the observer only starts monitoring after FSFO is enabled.To avoid the observer stopping when the terminal running the command is closed, we recommend wrapping the command in a script and starting with the
nohup
command. From Oracle 12c Release 2 onwards, it’s possible to start an observer in the background. This feature uses Oracle Wallet to store the credentials to log in to target databases. In the following example,rds_custom_orcl_a
andrds_custom_orcl_b
are the keys for Oracle Wallet to retrieve the credentials for the observer. Because it’s going to run in background mode, you can press Enter to get theDGMGRL
command prompt after the start observer command is complete.The following are the quick steps to configure Oracle Wallet on the machine running your Oracle Data Guard observer. For complete steps, refer to Configuring a Client to Use the External Password Store.
The first step is to create Oracle Wallet and credentials. For this post,
rds_custom_orcl_a
is your connection identifier, andrds_dataguad
is the user name. Enter the password in the command when prompted.Add the following lines to
$ORACLE_HOME/network/admin/sqlnet.ora
: - Check the observer status:
- Check the Data Guard and FSFO status, and use
VERBOSE
to get detailed information about Data Guard configuration and status: - Optionally, perform a failover test by manually stopping the EC2 instance for
orcl1
. Before stopping the EC2 instance, use the tail command to monitor the observer log file.
Use DGMGRL
to log in to database orcl2
(with DB unique name orcl_b
) with user RDS_DATAGUARD
, and check the Data Guard status. It shows that orcl2
is our new primary database.
If the flashback database has been enabled on orcl1
prior to failover, after the former primary database returns online and starts in MOUNT
state, the observer reinstates it into a new standby database. The reinstated database acts as the FSFO target for the new primary database. The observer log is as follows:
Seamless Oracle Client connectivity
You can avoid application configuration changes during a role transition by using a TNS entry that contains both the primary and standby database endpoints in the configuration. You can define two role-based database services to support both read/write and read-only workloads. In the following example, orcl_rw
is the read/write service that’s active on the primary database. orcl_ro
is the read-only service, and is active on the standby database that has been opened in read-only mode.
Run the following SQL statements in the primary database to create these services:
Run the following SQL statement in the primary database to start the orcl_rw
service:
Run the following SQL statement in standby databases running in read-only mode to start the orcl_ro
service:
To automatically start these services while the database is restarted, create the following trigger in the primary database:
The following is a sample configuration for a read/write workload:
The following is a sample configuration for a read-only workload:
Refer to Oracle Data Guard Specific FAN and FCF Configuration Requirements for other Oracle client considerations to improve failover time during failover and switchover operations.
Cross-Region Data Guard configuration
To help you meet your disaster recovery requirements, you may need to deploy a standby database in a different Region or in an environment outside of AWS, such as on-premises or other cloud services. For such configurations, you can manually set up a self-managed physical standby database (managed replicas can’t be created outside your primary Region as of this writing).
However, managed replicas and self-managed standby databases can be managed by a single Data Guard broker configuration. The following architecture shows one such configuration for a primary RDS Custom for Oracle instance with a managed replica in the same Region and a cross-Region self-managed physical standby database.
Refer to Enabling High Availability with Data Guard on Amazon RDS Custom for Oracle for more details on creating an RDS Custom for Oracle standby database in a remote Region and maintaining it as a self-managed physical standby database for a primary RDS Custom for Oracle instance.
Refer to the Oracle Data Guard Broker guide and Data Guard Concepts and Administration for details on how to configure and manage a Data Guard environment with multiple standby databases.
Conclusion
Amazon RDS Custom for Oracle allows you to create read replicas that are built on Oracle Data Guard technology to offload read-only workloads from your primary instance. In this post, we discussed options to customize the default configuration of Oracle Data Guard deployed by a read replica to help you meet your specific RTO and RPO requirements for your critical applications hosted on RDS Custom for Oracle. We also reviewed best practices and step-by-step instructions to follow during role transition and failover scenarios.
If you have any comments or questions, leave them in the comment section.
About the author
Jobin Joseph is a Senior Database Specialist Solution Architect based in Dubai. With a focus on relational database engines, he assists customers in migrating and modernizing their database workloads to AWS. He is an Oracle Certified Master with 20 years of experience with Oracle databases.
Donghua Luo is a Senior RDS Specialist Solutions Architect. He works with AWS customers to migrate and achieve higher flexibility, scale, and resiliency with database services in AWS cloud.