AWS Database Blog

Querying and writing to MySQL and MariaDB from Amazon Aurora and Amazon RDS for PostgreSQL using the mysql_fdw extension, Part 2: Handling foreign objects

In Part 1 of this series we discussed dealing with multiple database engines or schemas that share data with each other, you’re faced with the difficulty of establishing cross-engine database connectivity. To overcome this challenge, PostgreSQL offers foreign data wrappers (FDW), which are libraries that facilitate communication with external data sources by abstracting the complexities of data source connection and data retrieval.

The PostgreSQL mysql_fdw extension is a FDW implementation for MySQL, facilitating direct connectivity between a PostgreSQL database and a MySQL or a MariaDB database. The mysql_fdw extension is supported for Amazon Aurora PostgreSQL-Compatible Edition version 15.4, 14.9, 13.12, 12.16 and higher releases and Amazon RDS for PostgreSQL version 14.2, 13.6 and higher releases.

In this post, we focus on working with the features of mysql_fdw PostgreSQL extension on Amazon RDS for PostgreSQL to help manage a large set of data that on an external database scenarios. It enables you to interact with your MySQL database for importing individual/large/selectively number of objects at the schema level and simplifying how we get information about the MySQL/MariaDB schema, to make it easier to ultimately read/write data. We will also provide an introduction to understand query performance on foreign tables.

The examples in this post also work for Amazon Aurora PostgreSQL-compatible edition.

Prerequisites

To work with features discussed in this post, you must create the mysql_fdw extension in a PostgreSQL database by following the instructions provided in Part 1 of this series.

You also need the following resources:

Bulk loading foreign tables

In Part 1, we learned a simple way of creating a foreign table by running the following DDL template:

CREATE FOREIGN TABLE 
	<Local Name for MySQL Table> (<Field1> <DataType>,<Field2> <Datatype >...) 
SERVER  <Name of the MySQL server>
OPTIONS (dbname '<MySQLDB Name>', table_name '<TableName in MySQL>'); 

The following example creates a foreign table named ft_mysql_offices in PostgreSQL using the MySQL source table mysql_offices:

CREATE FOREIGN TABLE 
	ft_mysql_offices (officecode varchar(10)
	, city varchar(10)
	, phone varchar(10)
	, addressLine1 varchar(50)
	, addressLine2 varchar(50)
	, state varchar(50)
	, country varchar(50)
	, postalcode varchar(15)
	, territory varchar(10) ) 
SERVER mysql_server 
OPTIONS (dbname 'mysql_classicmodels', table_name 'mysql_offices');

CREATE FOREIGN TABLE 

Creating foreign tables individually can be challenging due to the need to map data types and column collation matching between the target. Failure to ensure this compatibility may lead to semantic anomalies, as the remote server might interpret query conditions differently from the local server, resulting in unexpected or incorrect results. mysql_fdw provides a mechanism to import all or a subset of tables of a schema into PostgreSQL with the IMPORT FOREIGN SCHEMA command. The following examples discuss bulk import of objects:

  • Import objects of an entire schema – To import an entire schema from the remote server with hundreds of objects, you use the mysql_fdw feature IMPORT FOREIGN SCHEMA. The following example imports all the tables in the hrdb schema of MySQL into the public schema in PostgreSQL:
IMPORT FOREIGN SCHEMA hrdb 
	FROM SERVER mysql_server 
	INTO public;
  • Selectively import tables in a schema – To import a subset of tables of a schema from the remote server to the local database, mysql_fdw supports the selective import of specific tables into the local database. The following example imports the tables emp and dept in the hrdb schema of MySQL into the public schema of the PostgreSQL database:
IMPORT FOREIGN SCHEMA hrdb 
	LIMIT TO  ("EMP","DEPT") 
	FROM SERVER mysql_server 
	INTO public;
  • Import a table into multiple target schemas – You can import objects into multiple target schemas on the PostgreSQL database end. Importing tables into multiple schemas may help restrict user access to different PostgreSQL schemas. The following example creates the foreign tables emp and dept in two different schemas, payroll and finance, in a PostgreSQL database by using the IMPORT FOREIGN SCHEMA command:
CREATE SCHEMA payroll;
CREATE SCHEMA
 
CREATE SCHEMA finance;
CREATE SCHEMA
  
IMPORT FOREIGN SCHEMA  
		hrdb 
	LIMIT TO ("EMP","DEPT") 
	FROM SERVER mysql_server 
	INTO payroll;
IMPORT FOREIGN SCHEMA
 
IMPORT FOREIGN SCHEMA  hrdb 
	LIMIT TO ("EMP","DEPT") 
	FROM SERVER mysql_server 
	INTO finance;
IMPORT FOREIGN SCHEMA

PostgreSQL extensions are database specific. If you have more than one database inside a PostgreSQL instance or cluster, you must create a separate mysql_fdw extension, foreign server definition, and user mapping.

Create foreign tables with a subset of columns

A common requirement is to create a foreign table with a subset of columns of a table with a large number of columns from a remote database, as this allows you to access specific column data from remote MySQL/MariaDB sources without importing the entire table. This approach is useful for focusing on relevant information and potentially improving query performance by having only required column data. In such cases, you can create foreign tables with only the required columns. For example, the following table has eight columns on the MySQL end:

CREATE TABLE films (
	film_id 		char(5) NOT NULL
	,title       		varchar(40) NOT NULL
	,did         		integer NOT NULL
	,date_prod   	date
	,kind        		varchar(10)
	,description 	varchar(100)
	,release_year	smallint
	,len         		interval hour to minute
);

INSERT INTO films (`film_id`, `title`, `did`, `date_prod`, `kind`, `description`, `release_year`, `len`) VALUES
(1001, 'The Lord of the Rings', 101, '2003-09-05', 'Fantasy', 'Destroy ring to defeat the dark lord', 100),
(1002, 'Spirited Away', 102, '2001-03-11', 'Fantasy', 'A young girl is transported to the spirit world navigates challenges', 110),
(1003, 'Pulp Fiction', 103, '1994-10-15', 'Thriller', 'Interconnected stories told in non-linear fashion', 108);

Let’s assume that you don’t need all of these fields in your PostgreSQL database and its application. In such cases, you can create a foreign table with only the necessary columns on the PostgreSQL side:

CREATE FOREIGN TABLE films (
	 film_id char(5) NOT NULL
	,date_prod   date
	,kind   varchar(10),
)
SERVER mysql_server;

The following query on a PostgreSQL foreign table retrieves records from the target MySql table, selecting only the three columns that exist locally within the PostgreSQL database.

select * from films;
 film_id | date_prod  |   kind
---------+------------+----------
 1001    | 2003-09-05 | Fantasy
 1002    | 2001-03-11 | Fantasy
 1003    | 1994-10-15 | Thriller

Create foreign tables on top of views

For data aggregation purposes or for performance reasons, you can create views in an RDS for MySQL database. The mysql_fdw extension supports creating foreign tables on top of such views:

The following code snippet creates a view on a table films with subset of columns when run on MySQL database.

CREATE VIEW v_film 
	AS SELCT film_id, title, description, release_year 
	FROM films;

The view created in MySQL database can be used as source to create a foreign table in target PostgreSQL database. The following example creates the foreign table v_film in the PostgreSQL database when executed in target PostgreSQL database system:

CREATE FOREIGN TABLE v_film (
	 film_id smallint
	,title varchar(255) NOT NULL
	,description text,
	release_year smallint ) 
	SERVER mysql_server 
	OPTIONS (dbname mysql_classicmodels, table_name 'v_film');
CREATE FOREIGN TABLE

Create views on top of foreign tables

Views allow you to transform or reshape the data from the foreign tables without modifying the underlying data. You can perform calculations, combine data from multiple foreign tables, or apply complex logic within the view definition. You can also create views on top of foreign tables in the PostgreSQL database. The foreign table can even be a view on the MySQL end. The following example creates a view using the newly created foreign table v_film:

CREATE VIEW v2_film 
	AS SELECT film_id, title 
	FROM v_film;

EXPLAIN VERBOSE SELECT *  FROM v2_film;

QUERY PLAN
--------------------------------------------------------------------------
Foreign Scan on public.v2_film  (cost=10.00..1010.00 rows=1000 width=518)
Output: v2_film.film_id, v2_film.title
Local server startup cost: 10
Remote query: SELECT `film_id`, `title` FROM `mysql_classicmodels`.`v_film`
(4 rows)

Writing data to a remote MySQL / MariaDB database

After you create the foreign tables in the PostgreSQL database , you can start using the foreign tables as local tables and perform Data Modification Language (DML) operations. This enables you to perform cross-database operations and data integration between PostgreSQL and MariaDB/MySQL databases. Below are few of the use cases:

  • Using a single interface to manage data residing stored in both PostgreSQL and MySQL
  • Easier migration from MySQL/MariaDB to PostgreSQL
  • Synchronizing data without complex ETL processes
  • Near real-time updates on MySQL data from PostgreSQL applications
  • Cross-database queries and analytics

For example, here’s a set of queries that demonstrates how we can update data in a MySQL table using the capabilities of the mysql_fdw extension in PostgreSQL:

SELECT * FROM ft_test ;  
0

INSERT INTO ft_test VALUES(2,'abc');  
INSERT 0 1  

INSERT INTO ft_test SELECT generate_series(3,100),'abc';  
INSERT 0 98  

SELECT count(*) FROM ft_test ;  
 count   
-------  
    99  
(1 row)

Writing to a foreign table (INSERT, UPDATE, or DELETE) requires the first column of the table to be unique. Otherwise the following error, is reported:

ERROR:  first column of remote table must be unique for INSERT/UPDATE/DELETE operation.  

To address this, add a primary key or unique key on the first column of the table in the MySQL database.

Materialize foreign tables

Materializing foreign tables is a technique used to improve query performance and reduce network overhead. One of the key features that mysql_fdw is the ability to support persistent connections to remove MariaDB/MySQL databases. After a query runs, the connection to the remote RDS for MySQL database is not dropped. Instead, it retains the connection for the next query from the same session. If you need to access data from the foreign tables frequently, avoid complex queries running on the remote server, and to reduce network overhead, you can create local materialized views:

CREATE MATERIALIZED VIEW mv_film 
	AS SELECT * FROM film;
SELECT 1000

Currently, materialized views in PostgreSQL aren’t updated when data in the base tables change. To store updates in your local materialized view, you will need to run a refresh command explicitly:

REFRESH MATERIALIZED VIEW mv_film;
REFRESH MATERIALIZED VIEW

Understanding query performance when executing a remote MySQL query

You can use the PostgreSQL EXPLAIN and EXPLAIN ANALYZE commands to review your query’s performance. However, troubleshooting the performance of queries on foreign tables is challenging because queries run in the remote source system unless the data is materialized in the local database.

The following query plan shows the scan sent by the RDS for PostgreSQL instance to query data from the MySQL database. The keyword Foreign Scan indicates that a foreign table is involved in the query. After the source is identified, you must check the remote server for the query plan.

EXPLAIN VERBOSE SELECT COUNT(*) FROM ft_test ;  
                                  QUERY PLAN                                     
-------------------------------------------------------------------------------  
 Aggregate  (cost=1027.50..1027.51 rows=1 width=8)  
   Output: count(*)  
   ->  Foreign Scan on public.ft_test  (cost=25.00..1025.00 rows=1000 width=0)  
         Output: id, info  
         Remote server startup cost: 25  
         Remote query: SELECT NULL FROM `test123`.`test`  
(6 rows)  

EXPLAIN VERBOSE SELECT id FROM ft_test WHERE id=2;  
                               QUERY PLAN                                  
-------------------------------------------------------------------------  
 Foreign Scan on public.ft_test  (cost=25.00..1025.00 rows=1000 width=4)  
   Output: id  
   Remote server startup cost: 25  
   Remote query: SELECT `id` FROM `test123`.`test` WHERE ((`id` = 2))  
(4 rows)

The PostgreSQL EXPLAIN ANALYZE plan shows only the query sent to the RDS for MySQL database with a foreign scan and a remote query. If the cost of the foreign scan is high, monitor the remote RDS for MySQL database resource utilization metrics including CloudWatch metrics, Performance Insights, check your RDS events that would affect the performance. Work with query optimization techniques on the remote MySQL instance along with improving query performance for RDS for MySQL with RDS optimized reads.

Bulk cleanup of foreign objects

The foreign data wrapper framework has the ability to clean up foreign objects with a single command using the CASCADE feature. This is useful when you set up foreign objects for a temporary purpose, like data migration. PostgreSQL will walk through the dependencies and drop all foreign objects when you run the DROP EXTENSION command with the CASCADE keyword.

The following example drops the extension mysql_fdw from the PostgreSQL database. Dropping the extension also removes the user mappings and the foreign objects that depend on the extension.

DROP EXTENSION mysql_fdw CASCADE;

Clean up

The resources used in this solution will incur costs as long as they are running. You can perform a cleanup to free up these resources:

  1. Drop the foreign tables, views, and materialized views using the following commands:
DROP VIEW <SchemaName.ViewName>;
DROP MATERIALIZED VIEW <SchemaName.ViewName>;
DROP FOREIGN TABLE  <SchemaName.TableName>;
  1. Drop the schemas created for testing. The following command deletes the finance and payroll schemas:
DROP SCHEMA finance, payroll;
  1. Drop the user mappings and foreign server:
DROP USER MAPPING FOR pguser1 SERVER mysql_server;
DROP SERVER mysql_server;
  1. Delete the RDS for PostgreSQL instance if you created one for this post.
  2. Stop your EC2 instance for MySQL if you created one for this post.

If you want to continue using this test environment, it’s recommended to create a billing alarm to help you stay within your budget.

Conclusion

PostgreSQL foreign data wrappers present a solution to connect to and query MySQL databases as if they were local tables from PostgreSQL databases that offers data integration, real-time data access, reduced data redundancy, leveraging PostgreSQL features between cross-databases. This also helpful in database migrations that support gradual or a phased migration offering coexisting data in both the environments for testing, data comparison, validation results in downtime for data migrations. In this post, we demonstrated different ways to create and manage foreign tables, understand the performance of queries involving foreign tables, and materialize the foreign tables for performance reasons in an RDS for PostgreSQL database.

Leave a comment if you have questions or feedback.


About the authors

Rohit Suraj Kapoor is a Lead Consultant – Database with AWS Professional Services in India, helping customers have a successful cloud adoption and migration journey. Throughout his career, Rohit has been dedicated to assisting customers with database-related challenges, including migration, design, and performance optimization.

P G Kumar Babu is a Senior Consultant with AWS ProServe, Global Competency Centre. His background spans a wide depth and breadth of expertise and experience in SQL and NoSQL database technologies. He is a Database Migration Expert and has developed many successful database solutions addressing challenging business requirements for moving databases from on premises to Amazon RDS and Amazon Aurora PostgreSQL and MySQL.