AWS Big Data Blog

Stream data to Amazon S3 for real-time analytics using the Oracle GoldenGate S3 handler

Modern business applications rely on timely and accurate data with increasing demand for real-time analytics. There is a growing need for efficient and scalable data storage solutions. Data at times is stored in different datasets and needs to be consolidated before meaningful and complete insights can be drawn from the datasets. This is where replication tools help move the data from its source to the target systems in real time and transform it as necessary to help businesses with consolidation.

In this post, we provide a step-by-step guide for installing and configuring Oracle GoldenGate for streaming data from relational databases to Amazon Simple Storage Service (Amazon S3) for real-time analytics using the Oracle GoldenGate S3 handler.

Oracle GoldenGate for Oracle Database and Big Data adapters

Oracle GoldenGate is a real-time data integration and replication tool used for disaster recovery, data migrations, high availability. It captures and applies transactional changes in real time, minimizing latency and keeping target systems synchronized with source databases. It supports data transformation, allowing modifications during replication, and works with various database systems, including SQL Server, MySQL, and PostgreSQL. GoldenGate supports flexible replication topologies such as unidirectional, bidirectional, and multi-master configurations. Before using GoldenGate, make sure you have reviewed and adhere to the license agreement.

Oracle GoldenGate for Big Data provides adapters that facilitate real-time data integration from different sources to big data services like Hadoop, Apache Kafka, and Amazon S3. You can configure the adapters to control the data capture, transformation, and delivery process based on your specific requirements to support both batch-oriented and real-time streaming data integration patterns.

GoldenGate provides special tools called S3 event handlers to integrate with Amazon S3 for data replication. These handlers allow GoldenGate to read from and write data to S3 buckets. This option allows you to use Amazon S3 for GoldenGate deployments across on-premises, cloud, and hybrid environments.

Solution overview

The following diagram illustrates our solution architecture.

In this post, we walk you through the following high-level steps:

  1. Install GoldenGate software on Amazon Elastic Compute Cloud (Amazon EC2).
  2. Configure GoldenGate for Oracle Database and extract data from the Oracle database to trail files.
  3. Replicate the data to Amazon S3 using the GoldenGate for Big Data S3 handler.

Prerequisites

You must have the following prerequisites in place:

Install GoldenGate software on Amazon EC2

You need to run GoldenGate on EC2 instances. The instances must have adequate CPU, memory, and storage to handle the anticipated replication volume. For more details, refer to Operating System Requirements. After you determine the CPU and memory requirements, select a current generation EC2 instance type for GoldenGate.

Use the following formula to estimate the required trail space:

trail disk space = transaction log volume in 1 hour x number of hours down x .4

When the EC2 instance is up and running, download the following GoldenGate software from the Oracle GoldenGate Downloads page:

  • GoldenGate 21.3.0.0
  • GoldenGate for Big Data 21c

Use the following steps to upload and install the file from your local machine to the EC2 instance. Make sure that your IP address is allowed in the inbound rules of the security group of your EC2 instance before starting a session. For this use case, we install GoldenGate for Classic Architecture and Big Data. See the following code:

scp -i pem-key.pem 213000_fbo_ggs_Linux_×64_Oracle_shiphome.zip ec2-user@hostname:~/.
ssh -i pem-key.pem  ec2-user@hostname
unzip 213000_fbo_ggs_Linux_×64_Oracle_shiphome.zip
Bash

Install GoldenGate 21.3.0.0

Complete the following steps to install GoldenGate 21.3 on an EC2 instance:

  1. Create a home directory to install the GoldenGate software and run the installer:
    mkdir /u01/app/oracle/product/OGG_DB_ORACLE
    /fbo_ggs_Linux_x64_Oracle_shiphome/Disk1
    
    ls -lrt
    total 8
    drwxr-xr-x. 4 oracle oinstall 187 Jul 29 2021 install
    drwxr-xr-x. 12 oracle oinstall 4096 Jul 29 2021 stage
    -rwxr-xr-x. 1 oracle oinstall 918 Jul 29 2021 runInstaller
    drwxrwxr-x. 2 oracle oinstall 25 Jul 29 2021 response
    Bash
  2. Run runInstaller:
    [oracle@hostname Disk1]$ ./runInstaller
    Starting Oracle Universal Installer.
    Checking Temp space: must be greater than 120 MB.   Actual 193260 MB Passed
    Checking swap space: must be greater than 150 B.       Actual 15624 MB    Passed
    Bash

A GUI window will pop up to install the software.

  1. Follow the instructions in the GUI to complete the installation process. Provide the directory path you created as the home directory for GoldenGate.

After the GoldenGate software installation is complete, you can create the GoldenGate processes that read the data from the source. First, you configure OGG EXTRACT.

  1. Create an extract parameter file for the source Oracle database. The following code is the sample file content:
    [oracle@hostname Disk1]$vi eabc.prm
    
    -- Extract group name
    EXTRACT EABC
    SETENV (TNS_ADMIN = "/u01/app/oracle/product/19.3.0/network/admin")
    
    -- Extract database user login
    
    USERID ggs_admin@mydb, PASSWORD "********"
    
    -- Local trail on the remote host
    EXTTRAIL /u01/app/oracle/product/OGG_DB_ORACLE/dirdat/ea
    IGNOREREPLICATES
    GETAPPLOPS
    TRANLOGOPTIONS EXCLUDEUSER ggs_admin
    TABLE scott.emp;
    SQL
  2. Add the EXTRACT on the GoldenGate prompt by running the following command:
    GGSCI> ADD EXTRACT EABC, TRANLOG, BEGIN NOW
    SQL
  3. After you add the EXTRACT, check the status of the running programs with the info all

You will see the EXTRACT status is in the STOPPED state, as shown in the following screenshot; this is expected.

  1. Start the EXTRACT process as shown in the following figure.

The status changes to RUNNING. The following are the different statuses:

  • STARTING – The process is starting.
  • RUNNING – The process has started and is running normally.
  • STOPPED – The process has stopped either normally (controlled manner) or due to an error.
  • ABENDED – The process has been stopped in an uncontrolled manner. An abnormal end is known as ABEND.

This will start the extract process and a trail file will be created in the location mentioned in the extract parameter file.

  1. You can verify this by using the command stats <<group_name>>, as shown in the following screenshot.

Install GoldenGate for Big Data 21c

In this step, we install GoldenGate for Big Data in the same EC2 instance where we installed the GoldenGate Classic Architecture.

  1. Create a directory to install the GoldenGate for Big Data software. To copy the .zip file, follow these steps:
    mkdir /u01/app/oracle/product/OGG_BIG_DATA
    
    unzip 214000_ggs_Linux_x64_BigData_64bit.zip
    tar -xvf ggs_Linux_x64_BigData_64bit.tar
    
    GGSCI> CREATE SUBDIRS
    GGSCI> EDIT PARAM MGR
    PORT 7801
    
    GGSCI> START MGR
    Bash

This will start the MANAGER program. Now you can install the dependencies required for the REPLICAT to run.

  1. Go to /u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader and run the sh file with the latest version of aws-java-sdk. This script downloads the AWS SDK, which provides client libraries for connectivity to the AWS Cloud.
    [oracle@hostname DependencyDownloader]$ ./aws.sh 1.12.748
    Bash

Configure the S3 handler

To configure an GoldenGate Replicat to send data to an S3 bucket, you need to set up a Replicat parameter file and properties file that defines how data is handled and sent to Amazon S3.

AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY are the access key and secret access key of your IAM user, respectively. Do not hardcode credentials or security keys in the parameter and properties file. There are several methods available to achieve this, such as the following:

#!/bin/bash

# Use environment variables that are already set in the OS
export AWS_ACCESS_KEY_ID=$AWS_ACCESS_KEY_ID
export AWS_SECRET_ACCESS_KEY=$AWS_SECRET_ACCESS_KEY
export AWS_REGION="your_aws_region"
Bash

You can set these environment variables in your shell configuration file (e.g., .bashrc, .bash_profile, .zshrc) or use a secure method to set them temporarily:

export AWS_ACCESS_KEY_ID="your_access_key_id"
export AWS_SECRET_ACCESS_KEY="your_secret_access_key"
Bash

Configure the properties file

Create a properties file for the S3 handler. This file defines how GoldenGate will interact with your S3 bucket. Make sure that you have added the correct parameters as shown in the properties file.

The following code is an example of an S3 handler properties file (dirprm/reps3.properties):

[oracle@hostname dirprm]$ cat reps3.properties
gg.handlerlist=filewriter

gg.handler.filewriter.type=filewriter
gg.handler.filewriter.fileRollInterval=60s
gg.handler.filewriter.fileNameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.pathMappingTemplate=./dirout
gg.handler.filewriter.stateFileDirectory=./dirsta
gg.handler.filewriter.format=json
gg.handler.filewriter.finalizeAction=rename
gg.handler.filewriter.fileRenameMappingTemplate=${tableName}${currentTimestamp}.json
gg.handler.filewriter.eventHandler=s3

goldengate.userexit.writers=javawriter
#TODO Set S3 Event Handler- please update as needed
gg.eventhandler.s3.type=s3
gg.eventhandler.s3.region=eu-west-1
gg.eventhandler.s3.bucketMappingTemplate=s3bucketname
gg.eventhandler.s3.pathMappingTemplate=${tableName}_${currentTimestamp}
gg.eventhandler.s3.accessKeyId=$AWS_ACCESS_KEY_ID
gg.eventhandler.s3.secretKey=$AWS_SECRET_ACCESS_KEY

gg.classpath=/u01/app/oracle/product/OGG_BIG_DATA/dirprm/:/u01/app/oracle/product/OGG_BIG_DATA/DependencyDownloader/dependencies/aws_sdk_1.12.748/
gg.log=log4j
gg.log.level=DEBUG

#javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar -Daws.accessKeyId=my_access_key_id -Daws.secretKey=my_secret_key
javawriter.bootoptions=-Xmx512m -Xms32m -Djava.class.path=.:ggjava/ggjava.jar
Bash

Configure GoldenGate REPLICAT

Create the parameter file in /dirprm in the GoldenGate for Big Data home:

[oracle@hostname dirprm]$ vi rps3.prm
REPLICAT rps3
-- Command to add REPLICAT
-- add replicat fw, exttrail AdapterExamples/trail/tr
SETENV(GGS_JAVAUSEREXIT_CONF = 'dirprm/rps3.props')
TARGETDB LIBFILE libggjava.so SET property=dirprm/rps3.props
REPORTCOUNT EVERY 1 MINUTES, RATE
MAP SCOTT.EMP, TARGET gg.handler.s3handler;;

[oracle@hostname OGG_BIG_DATA]$ ./ggsci
GGSCI > add replicat rps3, exttrail ./dirdat/tr/ea
Replicat added.

GGSCI > info all
Program Status Group Lag at Chkpt Time Since Chkpt
MANAGER RUNNING
REPLICAT STOPPED RPS3 00:00:00 00:00:39

GGSCI > start *
Sending START request to Manager ...
Replicat group RPS3 starting.
Bash

Now you have successfully started the Replicat. You can verify this by running info and stats commands followed by the Replicat name, as shown in the following screenshot.

To confirm that the file has been replicated to an S3 bucket, open the Amazon S3 console and open the bucket you created. You can see that the table data has been replicated to Amazon S3 in JSON file format.

Best practices

Make sure that you are following the best practices on performance, compression, and security.

Consider the following best practices for performance:

The following are best practices for compression:

  • Enable compression for trail files to reduce storage requirements and improve network transfer performance.
  • Use GoldenGate’s built-in compression capabilities or use file system-level compression tools.
  • Strike a balance between compression level and CPU overhead, because higher compression levels may impact performance.

Lastly, when implementing Oracle GoldenGate for streaming data to Amazon S3 for real-time analytics, it’s crucial to address various security considerations to protect your data and infrastructure. Follow the security best practices for Amazon S3 and security options available for GoldenGate Classic Architecture.

Clean up

To avoid ongoing charges, delete the resources that you created as part of this post:

  1. Remove the S3 bucket and trail files if no longer needed and stop the GoldenGate processes on Amazon EC2.
  2. Revert the changes that you made in the database (such as grants, supplemental logging, and archive log retention).
  3. To delete the entire setup, stop your EC2 instance.

Conclusion

In this post, we provided a step-by-step guide for installing and configuring GoldenGate for Oracle Classic Architecture and Big Data for streaming data from relational databases to Amazon S3. With these instructions, you can successfully set up an environment and take advantage of the real-time analytics using a GoldenGate handler for Amazon S3, which we will explore further in an upcoming post.

If you have any comments or questions, leave them in the comments section.


About the Authors

Prasad Matkar is Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.

Arun Sankaranarayanan is a Database Specialist Solution Architect based in London, UK. With a focus on purpose-built database engines, he assists customers in migrating and modernizing their database workloads to AWS.

Giorgio Bonzi is a Sr. Database Specialist Solutions Architect at AWS based in the EMEA region. With a focus on relational database engines, he provides technical assistance to customers migrating and modernizing their database workloads to AWS.