AWS Database Blog

Automate PostgreSQL log exports to Amazon S3 using extensions

Do you want to copy or archive your Amazon Relational Database Service (Amazon RDS) for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition logs directly to Amazon Simple Storage Service (Amazon S3)? Does your organization have regulatory requirements to audit all DDL or DML activity against your RDS for PostgreSQL database?

With the addition of the pg_cron extension for Amazon RDS for PostgreSQL (version 12.5 and higher) and Amazon Aurora PostgreSQL-Compatible Edition (version 12.6 and higher), you can now automate exports of PostgreSQL engine logs directly to Amazon S3 by using the PostgreSQL extensions log_fdw, aws_s3, and pg_cron. With these extensions, you can copy and archive the PostgreSQL database logs directly to Amazon S3.

In this post, we show you how to use these PostgreSQL extensions to automate the export of PostgreSQL logs directly to Amazon S3. We also show an example of how to filter the engine logs and export only the DML statements to Amazon S3.

Prerequisites

For this post, we use an Amazon RDS for PostgreSQL instance. Before you get started, make sure you complete the following prerequisites.

  1. Create or have access to an AWS account
  2. Install and configure the AWS Command Line Interface (AWS CLI)
  3. Create an RDS for PostgreSQL instance (version 12.5 or higher). For instructions on provisioning an instance, see Create and Connect to a PostgreSQL Database with Amazon RDS
  4. Launch an Amazon Elastic Compute Cloud (Amazon EC2) instance, with the psql client installed and configured, to access the database instance
  5. Create an S3 bucket where we store the logs exported from the database
  6. Create the required AWS Identity and Access Management (IAM) roles and policies and attach the role to your RDS for PostgreSQL instance to provide access to the S3 bucket
  7. Create the following PostgreSQL extensions in your RDS for PostgreSQL database (an optional query to confirm their availability follows this list)
    1. pg_cron – To enable the pg_cron extension, modify the database parameter group and add the value pg_cron to the shared_preload_libraries parameter, then restart the database
    2. log_fdw – We use the log_fdw extension to load all the available RDS for PostgreSQL or Aurora PostgreSQL DB log files as a table
    3. aws_s3 – With the aws_s3 extension, you can query data from your RDS for PostgreSQL DB instance and export it directly into files stored in an S3 bucket. This requires you to create an S3 bucket and IAM role, and attach the role to the DB instance
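
Optionally, you can confirm that these three extensions are available on your engine version before creating them. The following query (an optional check, not a step from the setup itself) lists the versions your instance offers:

postgres=> SELECT name, default_version, installed_version FROM pg_available_extensions WHERE name IN ('pg_cron', 'log_fdw', 'aws_s3');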

Solution overview

To start, we first create the necessary resources.

  1. Create custom parameter groups and allow extension support
  2. Create an IAM role and policy, and attach the role to your RDS for PostgreSQL instance

Then we’re ready to implement the solution.

  1. We use the log_fdw extension to import the PostgreSQL logs into our RDS for PostgreSQL database. These logs are stored in a table
  2. Next, we export these logs to an S3 bucket using the extension aws_s3
  3. To automate the whole process, we use the pg_cron extension and schedule a job that sends these logs to Amazon S3 periodically

The following diagram illustrates the solution architecture.

Let’s look into the details of each step of this solution.

Create custom parameter groups and allow extension support

Create and associate a parameter group for your RDS for PostgreSQL instance (log-stream-demo-instance in this post) using the AWS CLI, as shown in the following code.

aws rds create-db-parameter-group --db-parameter-group-name log-exporting-pg --db-parameter-group-family postgres12 --description 'Parameter group for allowing automated log exporting to s3 from RDS'
    {
        "DBParameterGroup": {
            "DBParameterGroupArn":
            "arn:aws:rds:us-east-1:<AWS-ACCOUNT-ID>:pg:log-exporting-pg",
            "DBParameterGroupName": "log-exporting-pg",
            "DBParameterGroupFamily": "postgres12",
            "Description": "Parameter group for allowing automated log
             exporting to s3 from RDS"
        }
    }

aws rds modify-db-instance --db-instance-identifier log-stream-demo-instance --db-parameter-group-name log-exporting-pg 
    "DBParameterGroups": [
        {
            "DBParameterGroupName": "log-exporting-pg",
            "ParameterApplyStatus": "applying"
        }
    ]

Modify the parameter group to add pg_cron to shared_preload_libraries and set log_destination to csvlog, which lets log_fdw load the logs into a column-formatted table.

aws rds modify-db-parameter-group --db-parameter-group-name log-exporting-pg --parameters "ParameterName=shared_preload_libraries,ParameterValue='pg_cron',ApplyMethod=pending-reboot" "ParameterName=log_destination,ParameterValue='csvlog',ApplyMethod=pending-reboot"
    {
        "DBParameterGroupName": "log-exporting-pg"
    }

Restart the database to allow the parameter group changes to take place.

aws rds reboot-db-instance --db-instance-identifier log-stream-demo-instance
    "DBParameterGroups": [
        {
            "DBParameterGroupName": "log-exporting-pg",
            "ParameterApplyStatus": "pending-reboot"
        }
    ],
    DBInstanceStatus": "rebooting"

Create an IAM role and policy and attach the role to your RDS for PostgreSQL instance

Create an S3 bucket and the IAM role and policy that the aws_s3 extension requires, allowing Amazon RDS to access Amazon S3.

  1. Set up a bucket for exporting logs.
    aws s3api create-bucket --bucket log-export-bucket --region us-east-1
        {
            "Location": "/log-export-bucket"
        }
  2. Create an IAM policy named postgres-s3-access-policy that grants least-privilege access to the resources in the following code. The policy must have access to the S3 bucket where the files are stored (for this post, log-export-bucket). Save the policy document as accessPolicyDocument.txt so you can reference it in the create-policy command.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "s3:GetObject",
                    "s3:AbortMultipartUpload",
                    "s3:DeleteObject",
                    "s3:ListMultipartUploadParts",
                    "s3:PutObject",
                    "s3:ListBucket"
                ],
                "Resource": [
                    "arn:aws:s3:::log-export-bucket/*",
                    "arn:aws:s3:::log-export-bucket"
                ]
            }
        ]
    }
    
    aws iam create-policy --policy-name postgres-s3-access-policy --policy-document "$(cat accessPolicyDocument.txt)"
    {
        "Policy": {
            "PolicyName": "postgres-s3-access-policy",
            "PermissionsBoundaryUsageCount": 0,
            "CreateDate": "",
            "AttachmentCount": 0,
            "IsAttachable": true,
            "PolicyId": "",
            "DefaultVersionId": "v1",
            "Path": "/",
            "Arn": "arn:aws:iam::000123456789:policy/postgres-s3-access-policy",
            "UpdateDate": ""
        }
    }
  3. Create an IAM role named postgres-s3-export-role with the following trust relationship. Save the trust policy document as assumeRolePolicyDocument.txt so you can reference it when creating the role.
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {
                    "Service": "rds.amazonaws.com"
                },
                "Action": "sts:AssumeRole"
            }
        ]
    }
    
    aws iam create-role --role-name postgres-s3-export-role --assume-role-policy-document "$(cat assumeRolePolicyDocument.txt)" 
    {
        "Role": {
            "AssumeRolePolicyDocument": {
                "Version": "2012-10-17",
                "Statement": [
                    {
                        "Action": "sts:AssumeRole",
                        "Effect": "Allow",
                        "Principal": {
                            "Service": "rds.amazonaws.com"
                        }
                    }
                ]
            },
            "RoleId": "",
            "CreateDate": "",
            "RoleName": "postgres-s3-export-role",
            "Path": "/",
            "Arn": "arn:aws:iam::000123456789:role/postgres-s3-export-role"
        }
    }
    
    aws iam attach-role-policy --role-name postgres-s3-export-role --policy-arn arn:aws:iam::000123456789:policy/postgres-s3-access-policy
        <No output indicates success>

AssumeRole allows Amazon RDS to access other AWS services on your behalf.

  4. Associate the IAM role with the DB instance (an optional verification command follows this step).
    aws rds add-role-to-db-instance --db-instance-identifier log-stream-demo-instance  --feature-name s3Export  --role-arn arn:aws:iam::000123456789:role/postgres-s3-export-role
        <No output indicates success>
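
Optionally, verify that the role is now associated with the instance and active for the s3Export feature. The following command (a verification step added for convenience; it isn't required) returns the associated roles and their status:

aws rds describe-db-instances --db-instance-identifier log-stream-demo-instance --query 'DBInstances[0].AssociatedRoles'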

Import PostgreSQL logs into a table using the log_fdw extension

To use the log_fdw functions, we must first create the extension on the database instance. Connect to the database using psql and run the following command.

postgres=> CREATE EXTENSION log_fdw;
CREATE EXTENSION

With the extension loaded, we can create a function that loads all the available PostgreSQL DB log files as a table within the database. The definition of the function is available on GitHub.

With the function created, we can run the function to load the PostgreSQL logs into the database. Each time we run the following command, the logs.postgres_logs table is updated with the most recent engine logs.

postgres=> SELECT public.load_postgres_log_files();
NOTICE:  CSV log format will be used.
NOTICE:  Processing log file - postgresql.log.2021-11-19-21.csv
NOTICE:  Processing log file - postgresql.log.2021-11-19-20.csv
             load_postgres_log_files
--------------------------------------------------
 Postgres logs loaded to table logs.postgres_logs
(1 row)

We can view the logs through a SELECT command on the logs.postgres_logs table. By narrowing down our SELECT command, we can tailor our query to a specific use case.

In the following example, we're looking only at log messages from the last hour that contain INSERT, UPDATE, SELECT, or DELETE statements.

SELECT * FROM logs.postgres_logs WHERE message SIMILAR TO '%(INSERT|UPDATE|SELECT|DELETE)%' AND log_time >= NOW() - INTERVAL '1' HOUR ORDER BY log_time DESC;
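
You can adjust the filter for other use cases. For example, to see only engine errors from the last hour, a query along the following lines (a sketch that relies on the error_severity column that log_fdw exposes when the logs are in csvlog format) narrows the results to ERROR and FATAL entries:

SELECT log_time, error_severity, message FROM logs.postgres_logs WHERE error_severity IN ('ERROR', 'FATAL') AND log_time >= NOW() - INTERVAL '1' HOUR ORDER BY log_time DESC;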

Export PostgreSQL logs from the table to Amazon S3 using the aws_s3 extension

Now that we can query for new log statements, we use aws_s3 to export the retrieved logs to Amazon S3. From the prerequisites, you should already have an S3 bucket and an IAM role attached to the DB instance that allows writing to that bucket.

Create the aws_s3 extension with the following code:

postgres=> CREATE EXTENSION aws_s3 CASCADE;
CREATE EXTENSION

Run the query export to Amazon S3 function:

postgres=> SELECT aws_s3.query_export_to_s3('SELECT * FROM logs.postgres_logs WHERE message SIMILAR TO $Q1$%(INSERT|UPDATE|SELECT|DELETE)%$Q1$ AND log_time >= NOW() - INTERVAL $Q2$1$Q2$ HOUR ORDER BY log_time DESC;', 's3-bucket-name', 'exported-file-name', 's3-bucket-region', options := 'format text');

 rows_uploaded | files_uploaded | bytes_uploaded
---------------+----------------+----------------
             5 |              1 |              235
(1 row)

Note: This query uses dollar quoting to handle nested quotation marks. A dollar quote consists of a dollar sign ($), an optional tag (here Q1 and Q2), and another dollar sign, and the pair surrounds the string being quoted. In this example, those quoted strings are '%(INSERT|UPDATE|SELECT|DELETE)%' and '1'.
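
If dollar quoting is new to you, the following standalone statement (just an illustration, unrelated to the export) shows that a tagged dollar quote is another way to write a string literal, which is convenient when the string itself contains single quotes:

postgres=> SELECT $Q1$a string with 'single quotes' inside$Q1$ AS example;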

As of this writing, cross-Region exports are blocked.

Automate the log exports using the pg_cron extension

Now that we have the steps to perform log uploads to Amazon S3 using the log_fdw and aws_s3 extensions, we can automate these steps using pg_cron. With pg_cron, we can write database queries to be run on a schedule of our choosing.

As part of the prerequisites, you should have pg_cron added to the shared_preload_libraries parameter in your database instance’s parameter group. After pg_cron is loaded into shared_preload_libraries, you can simply run the following command to create the extension:

postgres=> CREATE EXTENSION pg_cron;
CREATE EXTENSION

With pg_cron created, we can use the extension to run the PostgreSQL log uploads on a cron-defined schedule. To do this, we schedule a cron job, passing in a name, a schedule, and the log export query we want to run. For example, to schedule log uploads every hour with the same query described earlier, we can run the following command:

postgres=> SELECT cron.schedule('postgres-s3-log-uploads', '0 * * * *', 'do $$ begin
    PERFORM public.load_postgres_log_files();
    PERFORM aws_s3.query_export_to_s3($Q1$SELECT * FROM 
    logs.postgres_logs WHERE message SIMILAR TO 
    $Q2$%(INSERT|UPDATE|SELECT|DELETE)%$Q2$ AND log_time >= NOW()
    - INTERVAL $Q3$1$Q3$ HOUR ORDER BY log_time DESC;$Q1$, 
    $Q4$s3-bucket-name$Q4$, $Q5$exported-file-name$Q5$, 
    $Q6$s3-bucket-region$Q6$, options := $Q7$format text$Q7$);
end $$;');

 schedule
----------
        1
(1 row)

After we schedule the uploads, we can query the cron.job table to see our scheduled cron job along with its info:

postgres=# SELECT * FROM cron.job WHERE jobname = 'postgres-s3-log-uploads';
jobid  | schedule  | command | nodename  | nodeport | database | username | active | jobname        
-------+-----------+---------+-----------+----------+----------+----------+-------+-----------
   1   | 0 * * * * | do $$ begin PERFORM public.load_postgres_log_files(); PERFORM aws_s3.query_export_to_s3($Q1$SELECT * FROM logs.postgres_logs WHERE message SIMILAR TO $Q2$%(INSERT|UPDATE|SELECT|DELETE)%$Q2$ AND log_time >= NOW() - INTERVAL $Q3$1$Q3$ HOUR ORDER BY log_time DESC;$Q1$, $Q4$s3-bucket-name$Q4$, $Q5$exported-file-name$Q5$, $Q6$s3-bucket-region$Q6$, options := $Q7$format text$Q7$); end $$; | localhost |  5432  |  postgres | rdsadmin |  t  | postgres-s3-log-uploads
(1 row)

We can also view the run history of this cron job by querying the cron.job_run_details audit table. This provides information such as the run time of each upload and whether it succeeded or failed. See the following code:

postgres=> SELECT * FROM cron.job_run_details WHERE jobid IN (SELECT jobid FROM cron.job WHERE jobname = 'postgres-s3-log-uploads');

jobid | runid | job_pid | database | username | command | status | return_message | start_time | end_time
------+-------+---------+----------+----------+---------+--------+------------------------+------------+----------
(0 rows)

If you decide at any time that you want to cancel these automated log uploads, you can unschedule the associated cron job by passing in the job name specified previously. In the following example, the job name is postgres-s3-log-uploads:

postgres=> SELECT cron.unschedule('postgres-s3-log-uploads'); 
unschedule
------------
 t
(1 row)

For more information about scheduling jobs with pg_cron, see Scheduling jobs with pg_cron on your Amazon RDS for PostgreSQL or Amazon Aurora for PostgreSQL databases.

Common errors and troubleshooting

You may get the following error when exporting the database logs from Amazon RDS to an S3 bucket:

ERROR:  could not upload to Amazon S3
DETAIL:  Amazon S3 client returned 'curlCode: 28, Timeout was reached'.

This message usually occurs when outbound rules are missing from the security group attached to your RDS instance. When you call the aws_s3.query_export_to_s3 function, Amazon RDS makes a PutObject HTTPS API call to upload the file to the S3 bucket. Also make sure that the policy attached to the role provides access to the S3 bucket.
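
To check the outbound rules, you can inspect the security groups attached to the instance. The following commands (an optional troubleshooting aid; sg-0123456789abcdef0 is a placeholder for your own security group ID) list the instance's security groups and then show the egress rules of one of them:

aws rds describe-db-instances --db-instance-identifier log-stream-demo-instance --query 'DBInstances[0].VpcSecurityGroups'
aws ec2 describe-security-groups --group-ids sg-0123456789abcdef0 --query 'SecurityGroups[0].IpPermissionsEgress'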

You also get an error if you try to create an extension that your Amazon RDS for PostgreSQL version doesn't support. For example, pg_cron is supported starting with Amazon RDS for PostgreSQL version 12.5; if you're using a lower version, an error occurs. For more information about supported extensions, see PostgreSQL extensions supported on Amazon RDS.

Best practices

We recommend setting the log_destination parameter to csvlog in your parameter group. With this setting, the log_fdw extension creates a table in which the log entries are parsed into columns, which makes it easier to query the data.

Exporting PostgreSQL logs to Amazon S3 using extensions is recommended only for low-volume exports. Exporting a large amount of data from Amazon RDS for PostgreSQL to Amazon S3 is a resource-intensive operation and may impact other transactions running on the instance. For larger log volumes and for log monitoring, we recommend using Amazon CloudWatch. CloudWatch provides metrics, alarms, and tools that simplify monitoring your database, and it also offers the option of exporting logs to Amazon S3 if you need that functionality.
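
If you choose CloudWatch, you can publish the PostgreSQL engine logs to CloudWatch Logs by modifying the instance's log export configuration. The following command is a sketch (it assumes the same log-stream-demo-instance used in this post), not part of the solution described above:

aws rds modify-db-instance --db-instance-identifier log-stream-demo-instance --cloudwatch-logs-export-configuration '{"EnableLogTypes":["postgresql","upgrade"]}'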

There are various ways to improve upload speed, and the best approach depends on your workload. For example, if you want to upload error log statements and DDL statements, running parallel exports may improve your upload speed. For more information about best practices when exporting to Amazon S3, refer to Export and import data from Amazon S3 to Amazon Aurora PostgreSQL.

The cron.job_run_details audit table stores the historical run information for your scheduled cron jobs and can grow quickly. As a best practice, schedule a job that periodically removes older records from the table, as shown in the following example.
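
A cleanup job along the following lines (a sketch; the job name, schedule, and seven-day retention are arbitrary choices, not values from this post) deletes run history older than a week every night:

postgres=> SELECT cron.schedule('purge-cron-history', '0 3 * * *', $$DELETE FROM cron.job_run_details WHERE end_time < now() - INTERVAL '7 days'$$);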

Summary

In this post, we showed how you can use the PostgreSQL extensions log_fdw, aws_s3, and pg_cron to automate the export of PostgreSQL logs to Amazon S3. This approach works well for storing historical audit-related data. You can also use various visualization tools on the data stored in Amazon S3 to perform analytics and build dashboards to see trends.

If you have any questions or suggestions on this post, leave us a comment.


About the Authors

Bilva Sanaba is a Software Development Engineer on the Amazon RDS for PostgreSQL team. He has worked on designing and developing various customer-facing features such as aws_s3.

Zachary Bessette is a Software Development Engineer on the Amazon RDS for PostgreSQL team. He has worked on designing and developing various customer-facing features such as pg_cron.

Sukhpreet Kaur Bedi is a Database Specialist Solutions Architect with AWS focusing on the Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL engines. She helps customers innovate on the AWS platform by building highly available, scalable, and secure database architectures.