AWS Security Blog

Federated access to Amazon Athena using AWS IAM Identity Center

Managing Amazon Athena through identity federation allows you to manage authentication and authorization procedures centrally. Athena is a serverless, interactive analytics service that provides a simplified and flexible way to analyze petabytes of data.

In this blog post, we show you how you can use the Athena JDBC driver (which includes a browser Security Assertion Markup Language (SAML) plugin) to connect to Athena from third-party SQL client tools, which helps you quickly implement identity federation capabilities and multi-factor authentication (MFA). This enables automation and enforcement of data access policies across your organization.

You can use AWS IAM Identity Center to federate access to users to AWS accounts. IAM Identity Center integrates with AWS Organizations to manage access to the AWS accounts under your organization. In this post, you will learn how to configure the Athena driver to use the AWS configuration profile credentials. This will allow you to resolve credentials from IAM Identity Center and use the MFA capability of your federation identity provider (IdP).In this post, you will learn how you can integrate the Athena browser-based SAML plugin to add single sign-on (SSO) and MFA capability with your federation identity provider (IdP).

Prerequisites

To implement this solution, you must have the follow prerequisites:

Note: Lake Formation only supports a single role in the SAML assertion. Multiple roles cannot be used.

Solution overview

Figure 1: Solution architecture

Figure 1: Solution architecture

To implement the solution, complete the steps below as shown in Figure 1:

  1. An IAM Identity Center delegated administrator creates two custom permission sets within Identity Center.
  2. An IAM Identity Center delegated administrator assign permission sets to AWS accounts and users and groups. The user has permissions to single sign-on roles that are provisioned in the data lake account. The role created by Identity Center has a name that begins with AWSReservedSSO.
  3. A Lake Formation administrator grants single sign-on roles permissions to the corresponding database and tables.

The solution workflow consists of the following high-level steps as shown in Figure 1:

  1. The user configures IAM Identity Center authentication using the AWS CLI.
  2. The AWS CLI redirects the user to the AWS access portal URL. The user enters workforce identity credentials (username and password). Then chooses Sign in.
  3. The AWS access portal verifies the user’s identity. IAM Identity Center redirects the request to the Identity Center authentication service to validate the user’s credentials.
  4. If MFA is enabled for the user, then they are prompted to authenticate their MFA device.
  5. The user enters or approves the MFA details. The user’s MFA is successfully completed.
  6. The user selects the AWS account to use from the displayed list. Then select the IAM single sign-on role to use from the displayed list.
  7. The user tests the SQL client connection and then uses the client to run a SQL query.
  8. The client makes a call to Athena to retrieve the table and associated metadata from the Data Catalog.
  9. Athena requests access to the data from Lake Formation. Lake Formation invokes the AWS Security Token Service (AWS STS).
  10. Lake Formation invokes AWS STS.
    1. Lake Formation obtains temporary AWS credentials with the permissions of the defined IAM role (sensitive or non-sensitive) associated with the data lake location.
    2. Lake Formation returns temporary credentials to Athena.
  11. Athena uses the temporary credentials to retrieve data objects from Amazon S3.
  12. The Athena engine successfully runs the query and returns the results to the client.

Solution walkthrough

The walkthrough includes five sections that will guide you through the process of creating permission sets, assigning permission sets to AWS Accounts, managing permission sets access using Lake Formation, and setting up third-party SQL clients such as SQL Workbench to connect to your data store and query your data through Athena.

Step 1: Federate onboarding

Federating onboarding is done within the IAM Identity Center account. As part of federated onboarding, you need to create IAM Identity Center users and groups. Groups are a collection of people who have the same security rights and permissions. You can create groups and add users to the groups. Create one IAM Identity Center group for sensitive data and another for non-sensitive data to provide distinct access to different classes of data sets. You can assign access to IAM Identity Center permission sets to a user or group.

To federate onboarding:

  1. Open the AWS Management Console using the IAM Identity Center account and go to IAM Identity Center.
  2. Choose Groups.
  3. Choose Create group.
  4. Enter a Group name and Description .
  5. Choose Create group.

To add a user as a member of a group:

  1. Open the IAM Identity Center console.
  2. Choose Groups.
  3. Select the group name that you want to update.
  4. On the group details page, under Users in this group, choose Add users to group.
  5. On the Add users to group page, under Other users, locate the users you want to add as members and select the check box next to each of them.
  6. Choose Add users to group.

Figure 2: Assigning users to a group

Figure 2: Assigning users to a group

Step 2: Create permission sets

For this step, create two permission sets (sensitive-iam-role and non-sensitive-iam-role). These permission sets can be assigned to users or groups in IAM Identity Center, granting them specific access to AWS account resources.

To create custom permission sets:

  1. In the IAM Identity Center administrator account, under Multi-Account permissions, choose Permission sets.
  2. Choose Create permission set.
  3. On the Select permission set type page, under Permission set type, choose Custom permission set.

    Figure 3: Selecting a permission set

    Figure 3: Selecting a permission set

  4. Choose Next.
  5. On the Specify policies and permission boundary page, expand Inline policy to add custom JSON-formatted policy text.
  6. Insert the following policy and update the S3 bucket name (<s3-bucket-name>), AWS Region (<region>) account ID (<account-id>), CloudWatch alarm name (<AlarmName>), Athena workgroup name (sensitive or non-sensitive) (<WorkGroupName>), KMS key alias name (<KMS-key-alias-name>), and organization ID (<aws-PrincipalOrgID>).
    {
      "Statement": [
        {
          "Action": [
            "lakeformation:SearchTablesByLFTags",
            "lakeformation:SearchDatabasesByLFTags",
            "lakeformation:ListLFTags",
            "lakeformation:GetResourceLFTags",
            "lakeformation:GetLFTag",
            "lakeformation:GetDataAccess",
            "glue:SearchTables",
            "glue:GetTables",
            "glue:GetTable",
            "glue:GetPartitions",
            "glue:GetDatabases",
            "glue:GetDatabase"
          ],
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "LakeformationAccess"
        },
        {
          "Action": [
            "s3:PutObject",
            "s3:ListMultipartUploadParts",
            "s3:ListBucketMultipartUploads",
            "s3:ListBucket",
            "s3:GetObject",
            "s3:GetBucketLocation",
            "s3:CreateBucket",
            "s3:AbortMultipartUpload"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:s3:::<s3-bucket-name>/*",
            "arn:aws:s3:::<s3-bucket-name>"
          ],
          "Sid": "S3Access"
        },
        {
          "Action": "s3:ListAllMyBuckets",
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "AthenaS3ListAllBucket"
        },
        {
          "Action": [
            "cloudwatch:PutMetricAlarm",
            "cloudwatch:DescribeAlarms"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:cloudwatch:<region>:<account-id>:alarm:<AlarmName>"
          ],
          "Sid": "CloudWatchLogs"
        },
        {
          "Action": [
            "athena:UpdatePreparedStatement",
            "athena:StopQueryExecution",
            "athena:StartQueryExecution",
            "athena:ListWorkGroups",
            "athena:ListTableMetadata",
            "athena:ListQueryExecutions",
            "athena:ListPreparedStatements",
            "athena:ListNamedQueries",
            "athena:ListEngineVersions",
            "athena:ListDatabases",
            "athena:ListDataCatalogs",
            "athena:GetWorkGroup",
            "athena:GetTableMetadata",
            "athena:GetQueryResultsStream",
            "athena:GetQueryResults",
            "athena:GetQueryExecution",
            "athena:GetPreparedStatement",
            "athena:GetNamedQuery",
            "athena:GetDatabase",
            "athena:GetDataCatalog",
            "athena:DeletePreparedStatement",
            "athena:DeleteNamedQuery",
            "athena:CreatePreparedStatement",
            "athena:CreateNamedQuery",
            "athena:BatchGetQueryExecution",
            "athena:BatchGetNamedQuery"
          ],
          "Effect": "Allow",
          "Resource": [
            "arn:aws:athena:<region>:<account-id>:workgroup/<WorkGroupName>",
            "arn:aws:athena:{Region}:{Account}:datacatalog/{DataCatalogName}"
          ],
          "Sid": "AthenaAllow"
        },
        {
          "Action": [
            "kms:GenerateDataKey",
            "kms:DescribeKey",
            "kms:Decrypt"
          ],
          "Condition": {
            "ForAnyValue:StringLike": {
              "kms:ResourceAliases": "<KMS-key-alias-name>"
            }
          },
          "Effect": "Allow",
          "Resource": "*",
          "Sid": "kms"
        },
        {
          "Action": "*",
          "Condition": {
            "StringNotEquals": {
              "aws:PrincipalOrgID": "<aws-PrincipalOrgID>"
            }
          },
          "Effect": "Deny",
          "Resource": "*",
          "Sid": "denyRule"
        }
      ],
      "Version": "2012-10-17"
    }
  7. Update the custom policy to add the corresponding Athena workgroup ARN for the sensitive and non-sensitive IAM roles.

    Note: See the documentation for information about AWS global condition context keys.

  8. Choose Next.
  9. On the Specify permission set details page, enter a name to identify this permission set in IAM Identity Center. The name that you specify for this permission set appears in the AWS access portal as an available role. Users sign in to the AWS access portal, choose an AWS account, and then choose the role.
  10. Choose Next.
  11. On the Review and create page, review the selections that you made, and then choose Create.

Step 3: Assign permission sets to AWS accounts

You can add and remove permissions sets for an IAM user or group by attaching and detaching permission sets. Permission sets define what actions an identity can perform on which AWS resources.

To assign permission sets to AWS accounts:

  1. In the IAM Identity Center administrator account, under Multi-account permissions, choose AWS accounts.
  2. On the AWS accounts page, select one or more AWS accounts that you want to assign single sign-on access to.
  3. Choose Assign users or groups.

    Figure 4: Selecting users and groups

    Figure 4: Selecting users and groups

  4. On the Assign users and groups to “<AWS account name>”, for Selected users and groups, choose the users that you want to create the permission set for. Choose Next.
  5. Select permission sets: On the Assign permission sets to “AWS-account-name” page, select one or more permission sets.
  6. On the Review and submit assignments to AWS-account-name page, for Review and submit, choose Submit.

Step 4. Grant permissions to IAM (single sign-on) roles

A data lake administrator has the broad ability to grant a principal (including themselves) permissions on Data Catalog resources. This includes the ability to manage access controls and permissions for the data lake. When you grant Lake Formation permissions on a specific Data Catalog table, you can also include data filtering specifications. This allows you to further restrict access to certain data within the table, limiting what users can see in their query results based on those filtering rules.

To grant permissions to IAM roles:

In the Lake Formation console, under Permissions in the navigation pane, select Data Lake permissions, and then choose Grant.

To grant Database permissions to IAM roles:

  1. Under Principals, select the IAM role name (for example, Sensitive-IAM-Role).
  2. Under Named Data Catalog resources, go to Databases and select a database (for example, demo).

    Figure 5: Select an IAM role and database

    Figure 5: Select an IAM role and database

  3. Under Database permissions, select Describe and then choose Grant.

    Figure 6: Grant database permissions to an IAM role

    Figure 6: Grant database permissions to an IAM role

To grant tables permissions to IAM roles:

  1. Repeat steps 1 and 2 of the preceding procedure.
  2. Under Tables – optional, select a table name (for example, demo2).

    Figure 7: Select tables within a database to grant access

    Figure 7: Select tables within a database to grant access

  3. Select the desired Table Permissions (for example, select and describe), and then choose Grant.

    Figure 8: Grant access to tables within the database

    Figure 8: Grant access to tables within the database

  4. Repeat steps 1 through 4 to grant access for the respective database and tables for the non-sensitive IAM role.

Step 5: Client-side setup using JDBC

You can use a JDBC connection to connect Athena and SQL client applications (for example, PyCharm or SQL Workbench) to enable analytics and reporting on the data that Athena returns from Amazon S3 databases. To use the Athena JDBC driver, you must specify the driver class from the JAR file. Additionally, you must pass in some parameters to change the authentication mechanism so the athena-sts-auth libraries are used:

  • S3 output location – Where in S3 the Athena service can write its output. For example, s3://path/to/query/bucket/.
  • The IAM Identity Center administrator can configure the session duration for the AWS access portal. The session duration can be set from a minimum of 15 minutes to a maximum of 90 days.

To set up PyCharm

  1. Install Athena JDBC 3.x driver from Athena JDBC 3.x driver.
    1. In the left navigation pane, select JDBC 3.x and then Getting started. Select Uber jar to download a .jar file, which contains the driver and its dependencies.

      Figure 9: Download Athena JDBC jar

      Figure 9: Download Athena JDBC jar

  2. Open PyCharm and create a new project.
    1. Enter a Name for your project
    2. Select the desired project Location
    3. Choose Create

    Figure 10: Create a new project in PyCharm

    Figure 10: Create a new project in PyCharm

  3. Configure Data Source and drivers. Select Data Source, and then choose the plus sign or New to configure new data sources and drivers.

    Figure 11: Add database source properties

    Figure 11: Add database source properties

  4. Configure the Athena driver by selecting the Drivers tab, and then choose the plus sign to add a new driver.

    Figure 12: Add database drivers

    Figure 12: Add database drivers

  5. Under Driver Files, upload the custom JAR file that you downloaded in the Step 1. Select the Athena class dropdown. Enter the driver’s name (for example Athena JDBC Driver). Then choose Apply.

    Figure 13: Add database driver files

    Figure 13: Add database driver files

  6. Configure a new data source. Choose the plus sign and select your driver’s name from the driver dropdown.
  7. Enter the data source name (for example, Athena Demo). For the authentication method, select User & Password. Then choose Apply.

    Figure 14: Create a project data source profile

    Figure 14: Create a project data source profile

  8. Select the SSH/SSL tab and select Use SSL. Verify that the Use truststore options for IDE, JAVA, and system are all selected. Then choose Apply.

    Figure 15: Enable data source profile SSL

    Figure 15: Enable data source profile SSL

  9. Select the Options tab and then select Single Session Mode. Then choose Apply.

    Figure 16: Configure single session mode in PyCharm

    Figure 16: Configure single session mode in PyCharm

  10. Select the General tab and enter the JDBC and single sign-on URL. The following is a sample JDBC URL based on the SAML application:
    jdbc:athena://;CredentialsProvider= ProfileCredentials; ProfileName=<name-of-the-profile>;WorkGroup=<name-of-the-WorkGroup>; 
    1. Choose Apply.
    2. Choose Test Connection. If the profile has expired, refresh the single sign-on session by running aws sso login --profile <profile-name> with the corresponding profile.

    Figure 17: Test the data source connection

    Figure 17: Test the data source connection

  11. After the connection is successful, select the Schemas tab and select All databases and All schemas.

    Figure 18: Select data source databases and schemas

    Figure 18: Select data source databases and schemas

  12. Run a sample test query: SELECT <table-names> FROM <database-name> limit 10;
  13. Verify that the credentials and permissions are working as expected.

To set up SQL Workbench

  1. Open SQL Workbench.
  2. Configure an Athena driver by selecting File and then Manage Drivers.
  3. Enter the Athena JDBC Driver as the name and set the library to browse the path for the location where you downloaded the driver. Enter amazonaws.athena.jdbc.AthenaDriver as the Classname.
  4. Enter the following URL, replacing <name-of-the-WorkGroup> with your workgroup name.
    jdbc:athena://;CredentialsProvider=ProfileCredentials;ProfileName=<name-of-the-profile>;WorkGroup=<name-of-the-WorkGroup>;
  5. Choose OK.
  6. Run a test query, replacing <table-names> and <database-name> with your table and database names:
    SELECT <table-names> FROM <database-name> limit 10;
  7. Verify that the credentials and permissions are working as expected.

Conclusion

In this post, we covered how to use JDBC drivers to connect to Athena from third-party SQL client tools. You were able to set this up without creating IAM users or any type of long-lived credentials that would need to be stored on your developers’ workstations. You learned how to configure IAM Identity Center users and groups, create permission sets, and assign permission sets to AWS Accounts. You also learned how to grant permissions to single sign-on roles using Lake Formation to create distinct access to different classes of data sets and connect to Athena through an SQL client tool (such as PyCharm). This setup can also work with other supported identity sources such as IAM Identity Centerself-managed or on-premises Active Directory, or an external IdP.

 
If you have feedback about this post, submit comments in the Comments section below. If you have questions about this post, contact AWS Support.
 

Ajay Rawat Ajay Rawat
Ajay is a Senior Security Consultant, focusing on AWS Identity and Access Management (IAM), data protection, incident response, and operationalizing AWS security services to increase security effectiveness and reduce risk. Ajay is a technology enthusiast and enjoys working with customers to solve their technical challenges and to improve their security posture in the cloud.
Mihir Borkar Mihir Borkar
Mihir is an AWS Data Architect who excels at simplifying customer challenges with innovative cloud data solutions. Specializing in AWS Lake Formation and AWS Glue, he designs scalable data lakes and analytics platforms, demonstrating expertise in crafting efficient solutions within the AWS Cloud.