PostgreSQL is one of the most widely used database engines and is supported by a very large and active community. It’s a viable open-source option to use compared to many commercial databases, which require heavy license costs. Amazon Relational Database Service (Amazon RDS) for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition are AWS managed offerings that take away the heavy lifting required for setting up the platform, configuring high availability, monitoring, and much more. This allows DBAs to spend more time on business-critical problems like doing schema design early on or query tuning.
In 2003, a new specification called SQL/MED (“SQL Management of External Data”) was added to the SQL standard. It’s a standardized way of handling access to remote objects from SQL databases. In 2011, PostgreSQL 9.1 was released with read-only support of this standard, and in 2013, write support was added with PostgreSQL 9.3. The implementation of this concept in PostgreSQL was called foreign data wrappers (FDW). Several FDWs are available that help connect PostgreSQL Server to different remote data stores, ranging from other SQL database engines to flat files. However, most FDWs are independent open-source projects implemented as Postgres extensions, and not officially supported by the PostgreSQL Global Development Group.
In this post, we discuss one FDW implementation that comes with PostgreSQL source as a contrib extension module called postgres_fdw. postgres_fdw allows you to implement federated query capability to interact with any remote PostgreSQL-based database, both managed and self-managed on Amazon Elastic Compute Cloud (Amazon EC2) or on premises. This is available in all present versions supported for Amazon RDS for PostgreSQL and Aurora PostgreSQL. The following diagram illustrates this architecture.
Important: The PostgreSQL Foreign Data Wrapper (FDW) does not officially support Amazon Redshift as a target database. As a result, certain functionalities might not work as expected or may encounter compatibility issues when attempting to connect to or query data in a Redshift instance. This post is intended for educational purposes only, and any implementation should be thoroughly tested in a non-production environment before considering it for use in a production setting.
Use cases
In this post, we primarily focus on two use cases to give an overview on the capability. However, you can easily extended the solution for other federated query use cases.
When working with independent software vendors (ISVs), we occasionally see them offering Amazon RDS for PostgreSQL and Aurora PostgreSQL in a multi-tenant setup in which they use one database per customer and a shared database within a single instance. Federated query capability implemented via FDW allows you to pull data from a shared database to other databases as needed.
An organization could have multiple systems catering to different departments. For example, a payroll database has employee salary information. This data maybe required by the HR and tax systems to calculate hike or decide tax incurred, respectively. One solution for such a problem is to copy the salary information in both the HR and Tax systems. However, this kind of duplication may lead to problems, like ensuring data accuracy, extra storage space incurred, or double writes. FDWs avoid duplication while providing access to required data that resides in a different foreign database. The following diagram illustrates this architecture.
Furthermore, in today’s world of purpose-built databases you host hot or active data in Amazon RDS for PostgreSQL and Aurora PostgreSQL, you have separate data warehouse solutions like Amazon Redshift for data archival. Without federated query support from an active database, you have to stream the active data to the data warehouse on an almost near-real-time basis to run analytical queries, which requires extra efforts and costs to set up a data pipeline and an additional overhead to the data warehouse. With federated query capability, you can derive insights while joining data at the transactional database from within itself as well as a data warehouse like Amazon Redshift. The following diagram illustrates this architecture.
For more information about querying data in your Aurora PostgreSQL or Amazon RDS for PostgreSQL remote server from Amazon Redshift as the primary database, see Querying data with federated queries in Amazon Redshift.
Prerequisites
Before getting started, make sure you have the following prerequisites:
- A primary Aurora PostgreSQL or Amazon RDS for PostgreSQL instance as your source machine.
- A remote PostgreSQL-based instance with information like username, password, and database name. You can use any of the following databases:
- Aurora PostgreSQL
- Amazon RDS for PostgreSQL
- Amazon Redshift
- Self-managed PostgreSQL on Amazon EC2
- On-premises PostgreSQL database
- Network connectivity between the primary and remote database. The remote database can be a different database within the same primary database instance, a separate database instance, Amazon Redshift cluster within the same or different Amazon Virtual Private Cloud (Amazon VPC), or even on-premises PostgreSQL-based database servers that can be reached using VPC peering or AWS managed VPN or AWS Direct Connect. The remote PostgreSQL-based database instance or Amazon Redshift cluster can be in a same or different AWS account with established network connectivity in place. For more information, see How can I troubleshoot connectivity to an Amazon RDS DB instance that uses a public or private subnet of a VPC?
- Tables that you query in the foreign (remote) server. For this post, we create one table in source database
pdfdwsource
and one table in target database pdfdwtarget
.
For pdfdwsource
, create a salary table with the columns emailid
and salary
with dummy content. See the following code:
[ec2-user@ip-172-31-15-24 ~]$ psql -h pgfdwsource.xxxx.us-west-2.rds.amazonaws.com -d pgfdwsource -U pgfdwsource -w
SET
Expanded display is on.
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help
pgfdwsource=> create table salary(emailid varchar, salary int);
CREATE TABLE
pgfdwsource=> insert into salary values('tom@customer1.com',10000);
INSERT 0 1
pgfdwsource=> insert into salary values('harry@customer1.com',100000);
INSERT 0 1
pgfdwsource=> insert into salary values('jeff@customer1.com',1000000);
INSERT 0 1
pgfdwsource=> \d salary
Table "public.salary"
Column | Type | Modifiers
---------+-------------------+-----------
emailid | character varying |
salary | integer |
pgfdwsource=> select * from salary;
emailid | salary
---------------------+---------
tom@customer1.com | 10000
harry@customer1.com | 100000
jeff@customer1.com | 1000000
For pdfdwtarget
, create a table corresponding to customer1
with the columns id
, name
, emailid
, projectname
, and contactnumber
with dummy content. See the following code:
[ec2-user@ip-172-31-15-24 ~]$ psql -h pgfdwtarget.xxxx.us-west-2.rds.amazonaws.com -d pgfdwtarget -U pgfdwtarget
SET
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
Type "help" for help.
pgfdwtarget=> create table customer1( id int, name varchar, emailid varchar, projectname varchar, contactnumber bigint);
CREATE TABLE
pgfdwtarget=> insert into customer1 values(1,'Tom','tom@customer1.com','Customer1.migration',328909432);
INSERT 0 1
pgfdwtarget=> insert into customer1 values(2,'Harry','harry@customer1.com','Customer1.etl',2328909432);
INSERT 0 1
pgfdwtarget=> insert into customer1 values(3,'Jeff','jeff@customer1.com','Customer1.infra',328909432);
INSERT 0 1
pgfdwtarget=> \d customer1
Table "public.customer1"
Column | Type | Modifiers
---------------+-------------------+-----------
id | integer |
name | character varying |
emailid | character varying |
projectname | character varying |
contactnumber | bigint |
pgfdwtarget=> select * from customer1;
id | name | emailid | projectname | contactnumber
----+-------+---------------------+---------------------+---------------
1 | Tom | tom@customer1.com | Customer1.migration | 328909432
2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
3 | Jeff | jeff@customer1.com | Customer1.infra | 328909432
(3 rows)
In both tables, the column emailid
is common and can be used to derive insights into salary corresponding to customer1
employee information.
Now let’s see this in action.
Configuring your source instance, foreign server, user mapping, and foreign table
All the steps in this section are performed after logging in with the role pgfdwsource
into the primary database instance pgfdwsource
.
Connecting to the source instance
You connect to your source instance with a master user or via a normal user that has rds_superuser
permissions. You can use client tools like psql or pgAdmin. For more information, see Connecting to a DB instance running the PostgreSQL database engine. Create the extension postgres_fdw
with CREATE EXTENSION:
pgfdwsource=> \conninfo
You are connected to database "pgfdwsource" as user "pgfdwsource" on host "pgfdwsource.xxxx.us-west-2.rds.amazonaws.com" at port "5432".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
pgfdwsource=> \du pgfdwsource
List of roles
Role name | Attributes | Member of
-------------+-------------------------------+-----------------
pgfdwsource | Create role, Create DB +| {rds_superuser}
| Password valid until infinity |
pgfdwsource=> create extension postgres_fdw;
CREATE EXTENSION
pgfdwsource=> \dx
List of installed extensions
Name | Version | Schema | Description
--------------+---------+------------+----------------------------------------------------
plpgsql | 1.0 | pg_catalog | PL/pgSQL procedural language
postgres_fdw | 1.0 | public | foreign-data wrapper for remote PostgreSQL servers
(2 rows)
Creating a foreign server
We use CREATE SERVER to create our foreign (remote) server mapping as the PostgreSQL-based server from which we pull the data. A foreign server typically encapsulates connection information that an FDW uses to access an external data resource. It uses the same connection options as libpq. SSLMODE
‘require
’ makes sure that the data is encrypted in transit. See the following code:
pgfdwsource=> create server my_fdw_target Foreign Data Wrapper postgres_fdw OPTIONS (DBNAME 'pgfdwtarget', HOST 'pgfdwtarget.xxxx.us-west-2.rds.amazonaws.com', SSLMODE 'require');
CREATE SERVER
If the master user (or user with rds_superuser
) is creating the foreign server, then other users need usage access to this server. See the following code:
GRANT USAGE ON FOREIGN SERVER my_fdw_target TO normal_user;
If you want to grant access to normal users to create a foreign server, you need to grant usage on the extension itself from the master user:
GRANT USAGE ON FOREIGN DATA WRAPPER postgres_fdw TO normal_user;
Creating user mapping
CREATE USER MAPPING defines a mapping of a user to a foreign server. In the following code, when the user pgfdwsource
is connecting to the server my_fdw_target
(remote database), they use the login information provided in the user mapping. Therefore, postgres_fdw
uses the user pgfdwtarget
to connect to the remote database.
pgfdwsource=> CREATE USER MAPPING FOR pgfdwsource SERVER my_fdw_target OPTIONS (user 'pgfdwtarget', password 'test1234');
CREATE USER MAPPING
Creating a foreign table
CREATE FOREIGN TABLE creates a foreign table in the source database that is mapped to the table in the foreign server. It creates a table reference in the local database of a table residing in the remote database. The name of this table reference is defined as the foreign table. For example, in the following code, foreign table customer1_fdw
points to the table customer1
, which resides in the remote database. Now, whenever you need to query data from table customer1
in the remote database, the user in local database queries using the foreign table customer1_fdw
.
pgfdwsource=> create foreign table customer1_fdw( id int, name varchar,
emailid varchar, projectname varchar, contactnumber bigint) server
my_fdw_target OPTIONS( TABLE_NAME 'customer1');
CREATE FOREIGN TABLE
Furthermore, the name of the foreign table or table reference can be different from the name of the table in the remote database. This flexibility can be useful when you want to mask the name of the table in remote database.
Running federated queries with a PostgreSQL FDW
FDWs (postgres_fdw
) allow you to query (SELECT, INSERT, UPDATE, and DELETE) a remote PostgreSQL-based database (Amazon RDS for PostgreSQL, Aurora PostgreSQL, or Amazon Redshift). For this post, we focus on the SELECT query functionality.
With the preceding steps complete, you can now query the table customer1
, which resides in the Amazon RDS for PostgreSQL or Aurora PostgreSQL instance pgfdwtarget.xxxx.us-west-2.rds.amazonaws.com
in the database pgfdwtarget
, from within the instance pgfdwsource.xxxx.us-west-2.rds.amazonaws.com
without the need for replication or any kind of pipeline. You can also run join queries to perform aggregates (and more) to derive insights. See the following code:
pgfdwsource=> \conninfo
You are connected to database "pgfdwsource" as user "pgfdwsource" on host "pgfdwsource.xxxx.us-west-2.rds.amazonaws.com" at port "5432".
SSL connection (protocol: TLSv1.2, cipher: ECDHE-RSA-AES256-GCM-SHA384, bits: 256, compression: off)
-- running select * on foreign table customer1 mapped to name customer1_fdw in source instance.
pgfdwsource=> select * from customer1_fdw;
id | name | emailid | projectname | contactnumber
----+-------+---------------------+---------------------+---------------
1 | Tom | tom@customer1.com | Customer1.migration | 328909432
2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
3 | Jeff | jeff@customer1.com | Customer1.infra | 328909432
(3 rows)
-- running select * on foreign table customer1 mapped to name customer1_fdw in source instance while joining it with salary table to derive salary insights.
pgfdwsource=> select * from salary s inner join customer1_fdw c on s.emailid=c.emailid;
emailid | salary | id | name | emailid | projectname | contactnumber
---------------------+---------+----+-------+---------------------+---------------------+---------------
tom@customer1.com | 10000 | 1 | Tom | tom@customer1.com | Customer1.migration | 328909432
harry@customer1.com | 100000 | 2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
jeff@customer1.com | 1000000 | 3 | Jeff | jeff@customer1.com | Customer1.infra | 328909432
(3 rows)
In the following code, only the data for a particular project is queried. The project name details are stored in the remote server. In the explain plan of the query, the WHERE
clause has been pushed down to the remote server. This makes sure that only the data required to realize the result of the query is retrieved from the remote server and not the entire table. This is one of the optimizations that postgres_fdw
offers.
pgfdwsource=> select * from salary s inner join customer1_fdw c on s.emailid=c.emailid where c.projectname like 'Customer1.etl';
emailid | salary | id | name | emailid | projectname | contactnumber
---------------------+--------+----+-------+---------------------+---------------+---------------
harry@customer1.com | 100000 | 2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
(1 row)
pgfdwsource=> explain verbose select * from salary s inner join customer1_fdw c on s.emailid=c.emailid where c.projectname like 'Customer1.etl';
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=100.00..118.98 rows=1 width=144)
Output: s.emailid, s.salary, c.id, c.name, c.emailid, c.projectname, c.contactnumber
Join Filter: ((s.emailid)::text = (c.emailid)::text)
-> Seq Scan on public.salary s (cost=0.00..1.03 rows=3 width=36)
Output: s.emailid, s.salary
-> Materialize (cost=100.00..117.83 rows=3 width=108)
Output: c.id, c.name, c.emailid, c.projectname, c.contactnumber
-> Foreign Scan on public.customer1_fdw c (cost=100.00..117.81 rows=3 width=108)
Output: c.id, c.name, c.emailid, c.projectname, c.contactnumber
Remote SQL: SELECT id, name, emailid, projectname, contactnumber FROM public.customer1 WHERE ((projectname ~~ 'Customer1.etl'::text))
(10 rows)
You can pull data from a different database within the same instance or from a different instance, as seen in the preceding examples. You can also pull data from your PostgreSQL-based data warehouse like Amazon Redshift for a quick query on archived data. Because the data is pulled in real time from a foreign database, you can achieve this without needing to set up any kind of replication or reserving space in the source database.
Cleaning up
You can follow the steps in this section to clean up the resources created in previous steps:
- Drop the foreign table
customer1_fdw
:
pgfdwsource=> drop foreign table customer1_fdw;
DROP FOREIGN TABLE
- Drop the user mapping that maps the database user to the foreign server user:
pgfdwsource=> drop user mapping for pgfdwsource SERVER my_fdw_target;
DROP USER MAPPING
- Drop the foreign server definition that provides the local PostgreSQL server with the foreign (remote) server connection information:
pgfdwsource=> drop server my_fdw_target;
DROP SERVER
- Drop the extension postgres_fdw:
pgfdwsource=> drop extension postgres_fdw;
DROP EXTENSION
Common error messages
As is the case with any other setup, you may encounter some common issues while setting up postgres_fdw
, like connection problems or permission issues. The following section lists some common error messages that you may encounter and also suggests common culprits for those errors, which should point you in the right direction to mitigate them.
- Network connectivity – The following error message can appear when trying to query the foreign table customer1_fdw from the source database. A connection timed out error message appears when there is no network connectivity between the source and target databases. Common culprits are incorrectly configured security groups, Network Access Control List (NACL), or route table.
pgfdwsource=> select * from customer1_fdw;
ERROR: could not connect to server "my_fdw_target"
DETAIL: could not connect to server: Connection timed out
Is the server running on host "pgfdwtarget.xxxx.us-west-2.rds.amazonaws.com" (172.31.30.166) and accepting
TCP/IP connections on port 5432?
- DNS resolution – The following error message may appear when trying to query the foreign table
customer1_fdw
from the source database. This is a classic case of DNS resolution failure (issues in /etc/resolv.conf
) for the endpoint. The culprit is the incorrect endpoint entered while creating the foreign server.
pgfdwsource=> select * from customer1_fdw;
ERROR: could not connect to server "my_fdw_target"
DETAIL: could not translate host name "pgfdwtarget.xx.us-west-2.rds.amazonaws.com" to address: Name or service not known
- Table not present – In this scenario, the table
notpresent
was not in the target database instance. Therefore, the error message ERROR: relation "public.notpresent" does not exist
The common culprit is a non-existent or dropped table.
pgfdwsource=> create foreign table customer2_fdw( id int, name varchar, emailid varchar, projectname varchar, contactnumber bigint) server my_fdw_target OPTIONS( TABLE_NAME 'notpresent');
CREATE FOREIGN TABLE
pgfdwsource=> select * from customer2_fdw;
ERROR: relation "public.notpresent" does not exist
CONTEXT: remote SQL command: SELECT id, name, emailid, projectname, contactnumber FROM public.notpresent
- User permission – In this scenario, a permission denied error occurs when setting up user mapping, saying that you don’t have permission over the schema test. Common culprits are lack of select permission on the schema or table for the user specified in user mappings.
pgfdwsource=> create foreign table customer1_fdw( id int, name varchar, emailid varchar, projectname varchar, contactnumber bigint) server my_fdw_target OPTIONS( SCHEMA_NAME 'test', TABLE_NAME 'newtable');
CREATE FOREIGN TABLE
pgfdwsource=> select * from customer1_fdw;
ERROR: permission denied for schema test
Additional capabilities
In addition to SELECT, postgres_fdw
allows you to run UPDATE, INSERT, and DELETE on a foreign table. The following code updates the foreign table customer1_fdw
(note Foreign Update
in the query plan):
pgfdwsource=> explain update customer1_fdw set name='Jeff' where name='Jerry';
QUERY PLAN
-------------------------------------------------------------------------------
Update on customer1_fdw (cost=100.00..119.73 rows=4 width=114)
-> Foreign Update on customer1_fdw (cost=100.00..119.73 rows=4 width=114)
(2 rows)
pgfdwsource=> select * from customer1_fdw;
id | name | emailid | projectname | contactnumber
----+-------+---------------------+---------------------+---------------
1 | Tom | tom@customer1.com | Customer1.migration | 328909432
2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
3 | Jerry | jeff@customer1.com | Customer1.infra | 328909432
(3 rows)
pgfdwsource=> update customer1_fdw set name='Jeff' where name='Jerry';
UPDATE 1
pgfdwsource=> select * from customer1_fdw;
id | name | emailid | projectname | contactnumber
----+-------+---------------------+---------------------+---------------
1 | Tom | tom@customer1.com | Customer1.migration | 328909432
2 | Harry | harry@customer1.com | Customer1.etl | 2328909432
3 | Jeff | jeff@customer1.com | Customer1.infra | 328909432
(3 rows)
postgres_fdw
come with some small optimizations, such as FDW maintaining the table schema locally. As a result, rather than doing a SELECT *
to pull all the data from a table in remote server, the FDW sends query WHERE
clauses to the remote server to run, and doesn’t retrieve table columns that aren’t needed for the current query. It can also push down joins, aggregates, sorts, and more.
postgres_fdw
is quite powerful; this post covers just the basic functionality. For more information about these optimizations, see Appendix F. Additional Supplied Modules.
Summary
This post illustrated what you can achieve with federated query capability and postgres_fdw
in Amazon RDS for PostgreSQL or Aurora PostgreSQL to query data from a PostgreSQL-based remote server, both managed and self-managed on Amazon EC2 or on premises.
We encourage you to use the postgres_fdw
extension of community PostgreSQL in your Amazon RDS for PostgreSQL and Aurora PostgreSQL environments to query data from multiple databases in remote PostgreSQL-based servers like Amazon Redshift, Amazon RDS for PostgreSQL, Aurora PostgreSQL, self-managed PostgreSQL servers on Amazon EC2 or on premises, or from the same server in real time without the need for setting up ongoing replication or a data pipeline.
In our next post, we’ll show you how to work with other FDW extensions that let you use your Amazon RDS for PostgreSQL database instance to query MySQL-compatible engines, Microsoft SQL Server and Sybase engines that support the tabular data stream (TDS) protocol, and Oracle databases
About the Authors
Vibhu Pareek is a Solutions Architect at AWS. He joined AWS in 2016 and specializes in providing guidance on cloud adoption through the implementation of well architected, repeatable patterns and solutions that drive customer innovation. He has keen interest in open source databases like PostgreSQL. In his free time, you’d find him spending hours on automobile reviews, enjoying a game of football or engaged in pretentious fancy cooking.
Gaurav Sahi is a Principal Solutions Architect based out of Bangalore, India and has rich experience across embedded, telecom, broadcast & digital video and cloud technologies. In addition to helping customers in their transformation journey to cloud, his current passion is to explore and learn AI/ML services. He likes to travel, enjoy local food and spend time with family and friends in his free time.