AWS Big Data Blog
Accelerating Amazon Redshift federated query to Amazon Aurora MySQL with AWS CloudFormation
Amazon Redshift federated query allows you to combine data from one or more Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon Aurora MySQL databases with data already in Amazon Redshift. You can also combine such data with data in an Amazon Simple Storage Service (Amazon S3) data lake.
This post shows you how to set up Aurora MySQL and Amazon Redshift with a TPC-DS dataset so you can take advantage of Amazon Redshift federated query using AWS CloudFormation. You can use the environment you set up in this post to experiment with various use cases in the post Announcing Amazon Redshift federated querying to Amazon Aurora MySQL and Amazon RDS for MySQL.
Benefits of using CloudFormation templates
The standard workflow for setting up Amazon Redshift federated query involves six steps. For more information, see Querying data with federated queries in Amazon Redshift. With a CloudFormation template, you can condense these manual procedures into a few steps listed in a text file. The declarative code in the file captures the intended state of the resources that you want to create and allows you to automate the setup of AWS resources to support Amazon Redshift federated query. You can further enhance this template to become the single source of truth for your infrastructure.
A CloudFormation template acts as an accelerator. It helps you automate the deployment of technology and infrastructure in a safe and repeatable manner across multiple Regions and accounts with the least amount of effort and time.
Architecture overview
The following diagram illustrates the solution architecture.
The CloudFormation template provisions the following components in the architecture:
- VPC
- Subnets
- Route tables
- Internet gateway
- Amazon Linux bastion host
- Secrets
- Aurora for MySQL cluster with TPC-DS dataset preloaded
- Amazon Redshift cluster with TPC-DS dataset preloaded
- Amazon Redshift IAM role with required permissions
Prerequisites
Before you create your resources in AWS CloudFormation, you must complete the following prerequisites:
- Have an AWS Identity and Access Management (IAM) user with sufficient permissions to interact with the AWS Management Console and related AWS services. Your IAM permissions must also include access to create IAM roles and policies via the CloudFormation template.
- Create an Amazon Elastic Compute Cloud (Amazon EC2) key pair in the us-east-1 Region. For instructions, see Create a key pair using Amazon EC2. Make sure that you save the private key; this is the only time you can do so. You use this key pair as an input parameter when you set up the CloudFormation stack.
Setting up resources with AWS CloudFormation
This post provides a CloudFormation template as a general guide. You can review and customize it to suit your needs. Some of the resources that this stack deploys incur costs when in use.
To create your resources, complete the following steps:
- Sign in to the console.
- Choose the
us-east-1
Region in which to create the stack. - Choose Launch Stack:
- Choose Next.
This automatically launches AWS CloudFormation in your AWS account with a template. It prompts you to sign in as needed. You can view the CloudFormation template from within the console.
- For Stack name, enter a stack name.
- For Session, leave as the default.
- For ec2KeyPair, choose the key pair you created earlier.
- Choose Next.
- On the next screen, choose Next.
- Review the details on the final screen and select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create.
Stack creation can take up to 45 minutes.
- After the stack creation is complete, on the Outputs tab of the stack, record the value of the key for the following components, which you use in a later step:
AuroraClusterEndpoint
AuroraSecretArn
RedshiftClusterEndpoint
RedshiftClusterRoleArn
You’re now ready to log in to both the Aurora MySQL and Amazon Redshift cluster and run some basic commands to test them.
Logging in to the clusters using the Amazon Linux bastion host
The following steps assume that you use a computer with an SSH client to connect to the bastion host. For more information about connecting using various clients, see Connect to your Linux instance.
- Move the private key of the EC2 key pair (that you saved previously) to a location on your SSH client, where you are connecting to the Amazon Linux bastion host.
- Change the permission of the private key using the following code, so that it’s not publicly viewable:
- On the Amazon EC2 console, choose Instances.
- Choose the Amazon Linux bastion host that the CloudFormation stack created.
- Choose Connect.
- Copy the value for
SSHCommand
. - On the SSH client, change the directory to the location where you saved the EC2 private key, and enter the
SSHCommand
value. - On the console, open the AWS Secrets Manager dashboard.
- Choose the secret secretAuroraMasterUser-*.
- Choose Retrieve secret value.
- Record the password under Secret key/value, which you use to log in to the Aurora MySQL cluster.
- Choose the secret SecretRedshiftMasterUser.
- Choose Retrieve secret value.
- Record the password under Secret key/value, which you use to log in to the Amazon Redshift cluster.
- Log in to both Aurora MySQL using the MySQL Command-Line Client and Amazon Redshift using query editor.
The CloudFormation template has already set up MySQL Command-Line Client binaries on the Amazon Linux bastion host.
- On the Amazon Redshift console, choose Editor.
- Choose Query editor.
- For Connection, choose Create new connection.
- For Cluster, choose the Amazon Redshift cluster.
- For Database name, enter your database.
- Enter the database user and password recorded earlier.
- Choose Connect to database.
- Enter the following SQL command:
You should see 25 tables as the output.
- Launch a command prompt session of the bastion host and enter the following code (substitute <AuroraClusterEndpoint> with the value from the AWS CloudFormation output):
- Enter the following SQL command:
You should see the following eight tables as the output:
Completing federated query setup
The final step is to create an external schema to connect to the Aurora MySQL instance. The following example code creates an external schema statement that you need to run on your Amazon Redshift cluster to complete this step:
Use the following parameters:
- URI – The
AuroraClusterEndpoint
value from the CloudFormation stack outputs. The value is in the format<stackname>-cluster.<randomcharacter>.us-east-1.rds.amazonaws.com
. - IAM_Role – The
RedshiftClusterRoleArn
value from the CloudFormation stack outputs. The value is in the formatarn:aws:iam::<accountnumber>:role/<stackname>-RedshiftClusterRole-<randomcharacter>
. - Secret_ARN – The
AuroraSecretArn
value from the CloudFormation stack outputs. The value is in the formatarn:aws:secretsmanager:us-east-1:<accountnumber>: secret:secretAuroraMasterUser-<randomcharacter>
.
Federated query test
Now that you have set up federated query, you can start testing the feature using the TPC-DS dataset that was preloaded into both Aurora MySQL and Amazon Redshift.
For example, the following query aggregates the total net sales by product category and class from the web_sales
fact table and date
and item
dimension tables. Tables web_sales
and date
are stored in Amazon Redshift, and the item
table is stored in Aurora MySQL:
You can continue to experiment with the dataset and explore the three main use cases in the post [exact name of post title with embedded link].
Cleaning up
When you’re finished, delete the CloudFormation stack, because some of the AWS resources in this walkthrough incur a cost if you continue to use them. Complete the following steps:
- On the AWS CloudFormation console, choose Stacks.
- Choose the stack you launched in this walkthrough. The stack must be currently running.
- In the stack details pane, choose Delete.
- Choose Delete stack.
Summary
This post showed you how to automate the creation of an Aurora MySQL and Amazon Redshift cluster preloaded with the TPC-DS dataset, the prerequisites for the new Amazon Redshift federated query feature using AWS CloudFormation, and a single manual step to complete the setup. It also provided an example federated query using the TPC-DS dataset, which you can use to accelerate your learning and adoption of the new feature. You can continue to modify the CloudFormation templates from this post to support your business needs.
If you have any questions or suggestions, please leave a comment.
About the Authors
BP Yau is an Analytics Specialist Solutions Architect at AWS. His role is to help customers architect big data solutions to process data at scale. Before AWS, he helped Amazon.com Supply Chain Optimization Technologies migrate its Oracle data warehouse to Amazon Redshift and build its next generation big data analytics platform using AWS technologies.
Srikanth Sopirala is a Sr. Specialist Solutions Architect, Analytics at AWS. He is passionate about helping customers build scalable data and analytics solutions in the cloud.
Zhouyi Yang is a Software Development Engineer for Amazon Redshift Query Processing team. He’s passionate about gaining new knowledge about large databases and has worked on SQL language features such as federated query and IAM role privilege control. In his spare time, he enjoys swimming, tennis, and reading.
Entong Shen is a Senior Software Development Engineer for Amazon Redshift. He has been working on MPP databases for over 8 years and has focused on query optimization, statistics, and SQL language features such as stored procedures and federated query. In his spare time, he enjoys listening to music of all genres and working in his succulent garden.