AWS Database Blog

Achieve one second or less downtime with the Advanced JDBC Wrapper Driver when upgrading Amazon RDS Multi-AZ DB Clusters

When upgrading minor versions of RDS Multi-AZ clusters the connections are switched from the current writer to a newly upgraded reader.

Clients have the option to connect to either the cluster writer endpoint or the cluster reader endpoint. Normally, they would connect to the writer endpoint. This endpoint is directed to the current writer instance. When a restart of the cluster occurs due to either a change in the parameters of the cluster or a failover, one of the readers is promoted to be the new writer. When this occurs, the cluster writer endpoint is updated to point to the new writer by updating the instance endpoint in the DNS. Due to inherent delays in DNS propagation, this can take up to 30 seconds to resolve. Normally, the application would require logic to reconnect after the temporary lack of availability of the database.

The Advanced JDBC Wrapper Driver can automatically detect and switch the connection to the new writer. By utilizing topology information inside the RDS Multi-AZ database, the driver can switch over the connection in around one second or less when there is no replica lag.

In the post Introducing the Advanced JDBC Wrapper Driver for Amazon Aurora, we showed you how to use the driver to handle the failover of an Amazon Aurora cluster. In this post, we show how to use the AWS Advanced JDBC Wrapper Driver to do a reduce downtime to one second or less when doing a minor version upgrade on either Multi-AZ Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Multi-AZ Amazon RDS for MySQL with two readable standbys. We also discuss the new API introduced in Amazon RDS that enables this.

Prerequisites

In order to achieve one second or less downtime you need to provision an RDS for PostgreSQL Multi-AZ DB cluster running at least PostgreSQL 15.4 with rds_tools extension version 1.4 or above. If you have an existing cluster on this version, upgrade to at least the R3 release. Additionally, you need to install the rds_tools extension using the following DDL:

Using the psql command line tool.

CREATE EXTENSION rds_tools;

RDS for PostgreSQL cluster metadata

To enable rapid switchovers, we’ve implemented a topology function within Amazon RDS for PostgreSQL. To access this function, make sure you have installed the latest rds_tools extension, with a version of 1.4 or higher, and meet the supported engine version requirements as outlined in the prerequisites. After successfully installing the rds_tools extension, you can inspect topology metadata with the following command:

postgres=> SELECT * FROM RDS_TOOLS.SHOW_TOPOLOGY(‘<client_identifier>’);
              id               |                                     endpoint                            | port
-------------------------------+-------------------------------------------------------------------------+------
 db-IDENTIFIER-3 | somedb-instance-3.aaaaaaaaaaaa.us-west-2.rds.amazonaws.com | 5432
 db-IDENTIFIER-2 | somedb-instance-2.aaaaaaaaaaaa.us-west-2.rds.amazonaws.com | 5432
 db-IDENTIFIER-1 | somedb-instance-1.aaaaaaaaaaaa.us-west-2.rds.amazonaws.com | 5432
(3 rows)

Here’s a breakdown of each column:

  • id – Displays the DBI_RESOURCE_ID of each instance within an RDS Multi-AZ cluster
  • endpoint – This lists the Amazon Route 53 CNAME for each node in the cluster.
  • port – Displays the port numbers linked to each instance in the RDS Multi-AZ cluster

The parameter client_identifier in the show_topology function is optional but recommended. Passing a client identifier allows Amazon RDS to track which client libraries or proxies are actively querying the topology. This provides useful telemetry for Amazon RDS developers and helps guide support for additional clients in the future.

Additionally, the rds_tools extension provides the function rds_tools.multi_az_db_cluster_source_dbi_resource_id() to indicate the resource ID of the writer.

The Advanced JDBC Wrapper Driver uses this topology information to connect directly to all three instances within an RDS Multi-AZ cluster.

The new topology information is available in Amazon RDS for PostgreSQL versions 15.4, 14.9, 13.12, or higher. If you have an existing cluster on these versions, upgrade to at least the R3 release.

RDS for MySQL cluster metadata

Similar to Amazon RDS for PostgreSQL, there is a table in the mysql database called rds_topology with the same columns: id, endpoint and port.

The following SQL can be used to inspect the topology:

SELECT * FROM MYSQL.RDS_TOPOLOGY;

Using the Advanced JDBC Wrapper Driver in your application

The maven coordinates are as follows:

<dependency>
  <groupId>software.amazon.jdbc</groupId>
  <artifactId>aws-advanced-jdbc-wrapper</artifactId>
  <version>LATEST</version>
</dependency>

Include them as a dependency to your project. You will also need the MySQL JDBC driver or the PostgreSQL JDBC driver.

Using the driver is straightforward. The connection schema for Amazon RDS for PostgreSQL is as follows:

jdbc:aws-wrapper:postgresql://host/database
Connection conn = DriverManager.getConnection("jdbc:aws-wrapper:postgresql://host/database", props);

The connection schema for Amazon RDS for MySQL is as follows:

jdbc:aws-wrapper:mysql://host/database
Connection conn = DriverManager.getConnection("jdbc:aws-wrapper:mysql://host/database", props);

Some changes in your code are required to handle three conditions:

  • FailoverFailed – The connection to the new writer could not be established and the application will have to open a new connection
  • FailoverSuccess – The connection was reestablished and the application will have to set any session state that was added after the connection was opened
  • TransactionStateUnknown – The application was in the middle of a transaction when the failover occurred and the connection was reestablished but the transaction will have to be retried

There are many examples in our github repository on how to use the driver. This example shows how to connect and handle the exceptions.

It is important to note that when setting up the connection you modify the failoverClusterTopologyRefreshRate. This setting determines the rate at which the driver reads the topology, by default it is 2000ms. Setting it to 100ms is required to ensure discovery of the new writer as soon as possible

props.setProperty("failoverClusterTopologyRefreshRateMs", 100));

You can use the following code to test the fast switchover capabilities of the driver:

package software.amazon.demo;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.Properties;
import java.util.logging.Logger;

import software.amazon.jdbc.plugin.failover.FailoverSuccessSQLException;

public class Main {
    private static final Logger LOGGER = Logger.getLogger(Main.class.getName());
    final Properties properties = new Properties();
    properties.setProperty("user", "test_user");
    properties.setProperty("password", "test_password");
    // Refresh the topology every 100ms during failover so it finds new writer faster
    properties.setProperty("failoverClusterTopologyRefreshRateMs", "100");
    properties.setProperty("loginTimeout", "1");
    // Enable failover plugin
    properties.setProperty("wrapperPlugins", "failover"); 
  
    // Use "jdbc:aws-wrapper:postgresql://" for RDS Multi-AZ Postgres Cluster
    String connString = "jdbc:aws-wrapper:mysql://<cluster-endpoint>:3306/mysql";
    Connection conn = DriverManager.getConnection(connString, properties);
    } catch (FailoverSuccessSQLException e) {

    while (true) {
      try (Statement stmt = conn.createStatement()) {
      	  stmt.executeQuery("SELECT 1");
  LOGGER.info("[OK] Successfully executed query.");
} catch (FailoverSuccessSQLException e) {
   /* 
    Query execution failed, but the JDBC wrapper successfully failed over to a new     elected writer node
    */
    try (Statement stmt = conn.createStatement()) {
        stmt.executeQuery("SELECT 1");
    }
    LOGGER.info("[Failover Complete] Connected to the new writer node.");
    break;
} 
    } catch (Exception e) {
       LOGGER.warning(e.toString());
    }
  }

Let’s see how this code works. Below is an example of how to run the test and the output of the logs. The downtime is around 300ms in this case.

  1. Create a test cluster using the following AWS CLI command:
     aws rds create-db-cluster --db-cluster-identifier my-app-db --engine mysql --engine-version 8.0.28 --db-cluster-instance-class db.m5d.large --storage-type io1 --master-username test_user --master-user-password test_password --iops 1000 --allocated-storage 100`
  2. Configure the properties with actual database user and password.
  3. Replace connString with actual cluster endpoint.
    Run the demo application above in an EC2 instance within the same VPC as RDS MultiAZ DB cluster
  4. While the demo application is running, perform DB engine minor version upgrade via AWS console or AWS CLI
  5. The demo application automatically reconnects to the new writer with downtime less than one second. For this test, the switchover took 331ms which you can see in the following logs:
    Nov 08, 2023 11:32:27.348 PM software.amazon.demo.Main main 
    INFO: [OK] Successfully executed query. 
    Nov 08, 2023 11:45:41.706 PM software.amazon.demo.Main main 
    INFO: [OK] Successfully executed query. 
    Nov 08, 2023 11:45:41.708 PM software.amazon.demo.Main main 
    INFO: [OK] Successfully executed query. 
    Nov 08, 2023 11:45:41.855 PM software.amazon.jdbc.hostlistprovider.RdsMultiAzDbClusterListProvider processTopologyQueryResults 
    SEVERE: The topology query returned an invalid topology - no writer instance detected. 
    Nov 08, 2023 11:45:42.036 PM software.amazon.jdbc.plugin.failover.FailoverConnectionPlugin failover 
    SEVERE: The active SQL connection has changed due to a connection failure. Please re-configure session state if required. 
    Nov 08, 2023 11:45:42.039 PM software.amazon.demo.Main main 
    INFO: [Failover Complete] Connected to the new writer node.

Clean Up

Make sure you clean up your Multi-AZ DB Cluster you created if you do not plan on using them in the future.

Conclusion

Using the improved logic in the Advanced JDBC Wrapper Driver, you can improve the availability of writes from your application while performing a minor version upgrade on an RDS Multi-AZ cluster without having to worry about DNS propagation delays

Add this to your project using the Maven coordinates above and integrate the code into your project(s). Let us know how it works in the comments below.


About the Author

Dave Cramer is a Senior Software Engineer for Amazon Web Services. He is also a major contributor to PostgreSQL as the maintainer of the PostgreSQL JDBC driver. His passion is client interfaces and working with clients