AWS for Industries

Financial Services Spotlight – Amazon Managed Service for Apache Flink

In this edition of the Financial Services Industry (FSI) Services Spotlight monthly blog series, we highlight five key considerations for customers who process and analyze streaming data on Amazon Managed Service for Apache Flink: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. Across each area, we will examine specific guidance, suggested reference architectures, and code examples to help streamline service approval of Amazon Managed Service for Apache Flink.

Apache Flink is an open-source, distributed processing framework for real-time stream processing. Real-time stream processing means taking action on data as it is generated, rather than processing it in batches. While Apache Flink is a powerful tool, managing it can be operationally challenging due to its complex distributed architecture.

Amazon Managed Service for Apache Flink is AWS’s fully managed service that provides the underlying infrastructure for running Flink applications. It handles core capabilities such as provisioning Flink clusters, parallel computation, automatic scaling, and application backups. Customers get access to the full range of Apache Flink’s industry-leading capabilities, with applications running across multiple Availability Zones (AZs) for a highly resilient stream processing architecture.

In addition, you can interactively query data streams or launch streaming applications with a few clicks using Amazon Managed Service for Apache Flink Studio.

Application development is easier with Amazon Managed Service for Apache Flink thanks to support for Flink’s flexible APIs in Java, Scala, Python, and SQL. The service also integrates with hundreds of data sources and destinations, such as Amazon Managed Streaming for Apache Kafka, Amazon Kinesis, Amazon Kinesis Data Firehose, Amazon Simple Storage Service (Amazon S3), Amazon DynamoDB, JDBC connectors, and custom connectors.

Financial institutions use Amazon Managed Service for Apache Flink to proactively detect and act on meaningful events as they occur. Capital One, a leading consumer and commercial banking institution, observed that its data pipelines became less efficient as data volumes increased: batch processes took significant time to complete, reducing throughput. They partnered with the AWS team to address those challenges by building scalable real-time streaming applications using Amazon Managed Service for Apache Flink.

In capital markets, it is no longer adequate to complete computations as an end-of-day batch process. Organizations use streaming analytics to process real-time market data together with the associated reference data. An industry leader in financial services uses Amazon S3 as its data lake, with Apache Iceberg as the table format, along with a streaming ingestion pipeline built on Amazon Managed Service for Apache Flink that ingests 150 million records in under 5 minutes, enabling the customer to meet an end-to-end SLA of 15 minutes from ingestion to reporting.

Achieving Compliance

Amazon Managed Service for Apache Flink is a managed service, and third-party auditors regularly assess its security and compliance as part of multiple AWS compliance programs. Under the AWS shared responsibility model, Amazon Managed Service for Apache Flink is in scope for the following compliance programs. You can obtain the corresponding compliance reports under an AWS non-disclosure agreement (NDA) through AWS Artifact.

  • PCI
  • ISO/IEC 27001:2013, 27017:2015, 27018:2019, 27701:2019, 22301:2019, 9001:2015, and CSA STAR CCM v4.0
  • ISMAP
  • FedRAMP Moderate (East/West) and FedRAMP High (GovCloud)
  • DoD CC SRG IL2 (East/West), DoD CC SRG IL2 (GovCloud), DoD CC SRG IL4 (GovCloud), DoD CC SRG IL5 (GovCloud)
  • HIPAA
  • IRAP
  • MTCS
  • C5
  • K-ISMS
  • ENS High
  • OSPAR
  • HITRUST CSF
  • FINMA
  • GSMA
  • PiTuKri
  • CCCS MEDIUM (formerly PBMM)
  • SOC 1, 2, and 3

Data Protection with Amazon Managed Service for Apache Flink

Data encryption within Amazon Managed Service for Apache Flink itself uses service-managed keys; customer-managed keys (CMKs) are not supported for the service’s own storage. You can, however, encrypt data in the sources and destinations your application uses with a CMK, as described in the following section.

Encryption at rest in Amazon Managed Service for Apache Flink

If you choose to encrypt data on the incoming Kinesis data stream using StartStreamEncryption, you can leverage server-side encryption with an AWS Key Management Service (AWS KMS) customer master key (CMK) that you specify. Amazon Managed Service for Apache Flink can also read from other streaming sources, such as Amazon Managed Streaming for Apache Kafka (Amazon MSK) or customer-managed Kafka, and write to any streaming or database destination. Ensure that the source and destination you choose encrypt all data at rest and in transit. If you’re using Amazon MSK as a source or destination, you can start with encryption in Amazon MSK as described on its Data Protection documentation page.
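As a sketch of what enabling source-stream encryption looks like, the StartStreamEncryption request names the stream, an encryption type of KMS, and the key to use. The stream name below is a hypothetical placeholder; the key alias shown is the AWS-managed key for Kinesis, and a customer-managed CMK ARN works the same way.

```python
import json

# Hypothetical source stream; replace with your own.
start_stream_encryption_request = {
    "StreamName": "my-input-stream",
    "EncryptionType": "KMS",
    # AWS-managed key for Kinesis; a customer-managed CMK ARN also works.
    "KeyId": "alias/aws/kinesis",
}

# With boto3, this request maps to:
#   boto3.client("kinesis").start_stream_encryption(**start_stream_encryption_request)
print(json.dumps(start_stream_encryption_request, indent=4))
```

The AWS CLI equivalent is `aws kinesis start-stream-encryption` with the same parameters.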

Output data can be encrypted at rest by using Amazon Kinesis Data Firehose to store data in an encrypted Amazon S3 bucket, and you can specify the encryption key used. When using Amazon Managed Service for Apache Flink, your application’s code, durable application storage, and running application storage are encrypted at rest.
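To illustrate the destination side, the sketch below shows the S3 destination settings of a Firehose delivery stream referencing a KMS key through its EncryptionConfiguration. The bucket, role, and key ARNs are hypothetical placeholders to replace with your own resources.

```python
import json

# Hypothetical ARNs -- substitute your own role, bucket, and KMS key.
extended_s3_destination = {
    "RoleARN": "arn:aws:iam::123456789012:role/firehose-delivery-role",
    "BucketARN": "arn:aws:s3:::my-encrypted-output-bucket",
    # Encrypt delivered objects with the specified KMS key.
    "EncryptionConfiguration": {
        "KMSEncryptionConfig": {
            "AWSKMSKeyARN": "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"
        }
    },
}

# With boto3, this fragment is passed as ExtendedS3DestinationConfiguration to:
#   boto3.client("firehose").create_delivery_stream(...)
print(json.dumps(extended_s3_destination, indent=4))
```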

Encryption in Transit in Amazon Managed Service for Apache Flink

Amazon Managed Service for Apache Flink encrypts all data in transit; encryption in transit is enabled for all Amazon Managed Service for Apache Flink applications and cannot be disabled. Amazon Managed Service for Apache Flink encrypts data in transit in the following scenarios:

  • Data in transit from Kinesis Data Streams to Amazon Managed Service for Apache Flink.
  • Data in transit between internal components within Amazon Managed Service for Apache Flink.
  • Data in transit between Amazon Managed Service for Apache Flink and Kinesis Data Firehose.

Isolation of computing environments

Amazon Managed Service for Apache Flink is a managed service with no computing resources in the customer’s Amazon Virtual Private Cloud (VPC). As a managed service, its network security is protected according to the AWS global network security procedures described in the AWS Architecture Center: Security, Identity, & Compliance.

Amazon Managed Service for Apache Flink provides the underlying infrastructure and provisions capacity as Kinesis Processing Units (KPUs). One KPU represents one vCPU, 4 GiB of memory, and 50 GiB of running application storage. Each Amazon Managed Service for Apache Flink application runs in a single-tenant Apache Flink cluster on infrastructure hosted in an AWS-managed VPC on Amazon Elastic Kubernetes Service (Amazon EKS). The Apache Flink cluster runs with the JobManager in high-availability mode using ZooKeeper across multiple Availability Zones. Amazon EKS runs multiple Kubernetes pods in each AWS Region across Availability Zones. The Flink cluster and its nodes are AWS-managed and hosted in an Amazon VPC protected by the AWS global network security procedures described in the AWS Security whitepaper.

Customers can configure their Flink application to connect via private subnets in their VPC to access resources during execution. To integrate an application into the VPC, customers provide their VPC configuration, along with the permissions needed to access resources within the VPC, as part of the CreateApplication and UpdateApplication requests. This creates Elastic Network Interfaces (ENIs) in the customer account and attaches them to the nodes in the Amazon Managed Service for Apache Flink service account.
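A minimal sketch of a CreateApplication request with a VPC configuration might look as follows. The application name, runtime version, role ARN, subnet IDs, and security group ID are all hypothetical placeholders for your own resources.

```python
import json

# All names, IDs, and ARNs below are illustrative placeholders.
create_application_request = {
    "ApplicationName": "my-flink-app",
    "RuntimeEnvironment": "FLINK-1_18",
    "ServiceExecutionRole": "arn:aws:iam::123456789012:role/flink-service-role",
    "ApplicationConfiguration": {
        # Private subnets and security groups in which the service creates
        # ENIs in your account so the application can reach VPC resources.
        "VpcConfigurations": [
            {
                "SubnetIds": ["subnet-0aaa1111", "subnet-0bbb2222"],
                "SecurityGroupIds": ["sg-0ccc3333"],
            }
        ]
    },
}

# With boto3:
#   boto3.client("kinesisanalyticsv2").create_application(**create_application_request)
print(json.dumps(create_application_request, indent=4))
```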

Amazon Managed Service for Apache Flink needs permission to read records from the streaming data sources you specify in your application. Amazon Managed Service for Apache Flink also requires permission to write your application output to selected destinations in your application output configuration. You can grant these permissions by creating AWS Identity and Access Management (IAM) roles and granting Amazon Managed Service for Apache Flink access to assume them.
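As a sketch, the role behind this setup combines a trust policy that lets the service assume the role with a permissions policy scoped to the source and destination. The ARNs and the exact action lists below are illustrative placeholders to adapt to your own streams; the service principal matches the trust-policy example shown later in this post.

```python
import json

# Trust policy: allows the service principal to assume this role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"Service": "managed-flink.amazonaws.com"},
        "Action": "sts:AssumeRole",
    }],
}

# Permissions policy: read from a hypothetical source stream and
# write to a hypothetical destination delivery stream.
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["kinesis:DescribeStream", "kinesis:GetShardIterator",
                       "kinesis:GetRecords", "kinesis:ListShards"],
            "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-input-stream",
        },
        {
            "Effect": "Allow",
            "Action": ["firehose:PutRecord", "firehose:PutRecordBatch"],
            "Resource": "arn:aws:firehose:us-east-1:123456789012:deliverystream/my-output-stream",
        },
    ],
}

print(json.dumps(trust_policy, indent=4))
print(json.dumps(permissions_policy, indent=4))
```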

Automating audits with APIs

Customers need services and capabilities that assess the compliance status of their Amazon Managed Service for Apache Flink resources. AWS Config rules monitor resource configurations and provide out-of-the-box rules that alert when resources fall into a non-compliant state. Customers can enable AWS Config in their account using the AWS Config console or the AWS Command Line Interface (AWS CLI). AWS Config supports both managed rules and custom rules, enabling customers to build complex audits for their specific business needs. For more details, see the AWS Config managed rules documentation.
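A custom rule ultimately reduces to a compliance decision over a resource's recorded configuration. The sketch below shows only that decision logic, using a simplified, hypothetical configuration-item shape; a real custom rule runs in AWS Lambda and reports its verdicts back to AWS Config via the PutEvaluations API.

```python
# Sketch of the evaluation logic a custom AWS Config rule might apply.
# The configuration-item shape here is simplified for illustration.

def evaluate_flink_logging(configuration_item: dict) -> str:
    """Return COMPLIANT if the application has at least one
    CloudWatch logging option attached, NON_COMPLIANT otherwise."""
    options = configuration_item.get("CloudWatchLoggingOptionDescriptions", [])
    return "COMPLIANT" if options else "NON_COMPLIANT"

# An application with a logging option attached passes the check.
item = {"CloudWatchLoggingOptionDescriptions": [
    {"CloudWatchLoggingOptionId": "2.1"}]}
print(evaluate_flink_logging(item))  # COMPLIANT
print(evaluate_flink_logging({}))    # NON_COMPLIANT
```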

Besides managed rules in AWS Config, customers can build custom Config rules using API calls related to Managed Service for Apache Flink recorded by AWS CloudTrail. Managed Service for Apache Flink integrates with AWS CloudTrail to record user, role, and service actions. CloudTrail captures all API calls as events and continuously delivers them to an Amazon S3 bucket. Activity in Managed Service for Apache Flink is logged as CloudTrail events that capture details such as the request, source IP address, user identity, and timestamp. For example, calls to the CreateApplication and UpdateApplication actions generate entries in the CloudTrail log files that contain information about who generated the request. The identity information helps you determine the following:

  • Whether the request was made with root or AWS Identity and Access Management (IAM) user credentials.
  • Whether the request was made with temporary security credentials for a role or federated user.
  • Whether the request was made by another AWS service.

The Amazon Managed Service for Apache Flink API reference contains details regarding the CloudTrail entry types. Here’s an example of the AddApplicationCloudWatchLoggingOption action.

{
            "eventVersion": "1.05",
            "userIdentity": {
                "type": "IAMUser",
                "principalId": "EX_PRINCIPAL_ID",
                "arn": "arn:aws:iam::012345678910:user/Alice",
                "accountId": "012345678910",
                "accessKeyId": "EXAMPLE_KEY_ID",
                "userName": "Alice"
            },
            "eventTime": "2019-03-07T01:19:47Z",
            "eventSource": "managed-flink.amazonaws.com",
            "eventName": "AddApplicationCloudWatchLoggingOption",
            "awsRegion": "us-east-1",
            "sourceIPAddress": "127.0.0.1",
            "userAgent": "aws-sdk-java/unknown-version Linux/x.xx",
            "requestParameters": {
                "applicationName": "cloudtrail-test",
                "currentApplicationVersionId": 1,
                "cloudWatchLoggingOption": {
                    "logStreamARN": "arn:aws:logs:us-east-1:012345678910:log-group:cloudtrail-test:log-stream:flink-cloudwatch"
                }
            },
            "responseElements": {
                "cloudWatchLoggingOptionDescriptions": [
                    {
                        "cloudWatchLoggingOptionId": "2.1",
                        "logStreamARN": "arn:aws:logs:us-east-1:012345678910:log-group:cloudtrail-test:log-stream:flink-cloudwatch"
                    }
                ],
                "applicationVersionId": 2,
                "applicationARN": "arn:aws:managed-flink:us-east-1:012345678910:application/cloudtrail-test"
            },
            "requestID": "18dfb315-4077-11e9-afd3-67f7af21e34f",
            "eventID": "d3c9e467-db1d-4cab-a628-c21258385124",
            "eventType": "AwsApiCall",
            "apiVersion": "2018-05-23",
            "recipientAccountId": "012345678910"
        }

The following request code for the AddApplicationCloudWatchLoggingOption action from the above log entry adds an Amazon CloudWatch logging option to a Managed Service for Apache Flink application:

{
    "applicationName": "cloudtrail-test",
    "currentApplicationVersionId": 1,
    "cloudWatchLoggingOption": {
         "logStreamARN": "arn:aws:logs:us-east-1:012345678910:log-group:cloudtrail-test:log-stream:flink-cloudwatch"
    }
}

To use JSON as the input for an action with the AWS Command Line Interface (AWS CLI), save the request in a JSON file. Then, pass the file name into the action using the --cli-input-json parameter.

$ aws kinesisanalyticsv2 add-application-cloud-watch-logging-option --cli-input-json file://addcwlog.json

Additionally, FSI customers can use AWS Audit Manager to continuously audit their AWS usage and simplify how they assess risk and compliance with regulations and industry standards. AWS Audit Manager automates evidence collection and organizes the evidence as defined by the control set in the framework selected, such as PCI-DSS, SOC 2, and GDPR. Audit Manager collects data from sources, including AWS CloudTrail, to compare the environment’s configurations against the compliance controls. Because all Managed Service for Apache Flink calls are logged in CloudTrail, Audit Manager’s integration with CloudTrail is advantageous when verifying that controls have been met. Consider the encryption requirement in SOC 2, for example. Rather than querying across all CloudTrail logs to ensure the S3 bucket for Managed Service for Apache Flink’s output is encrypted, customers can centrally see whether the requirement is being met in Audit Manager. Audit Manager saves time with automated collection of evidence and provides audit-ready reports for customers to review. The Audit Manager assessment report uses cryptographic verification to help you ensure the integrity of the assessment report. The following screenshot illustrates the configuration of a custom control for a data source for the Managed Service for Apache Flink action of interest.

Configuration of a custom control for a data source for the Managed Service for Apache Flink action of interest.

Operational Access and Security

Amazon Managed Service for Apache Flink requires permission to read records from specified streaming data sources and write outputs to designated destinations to operate effectively. These permissions are granted through IAM roles assumed by Managed Service for Apache Flink, and the scope of its capabilities depends on the permissions assigned to the role.

By default, Managed Service for Apache Flink applications and Studio notebooks configured to access resources within a specific VPC do not have internet access. Internet connectivity is only available if the underlying VPC configuration allows for it.

The following table lists the IAM features supported by Managed Service for Apache Flink. For a comprehensive understanding of how IAM features are supported across various AWS services, see AWS services that work with IAM.

  • Actions: Yes
  • Resource-level permissions: Yes
  • Resource-based policies: No
  • Attribute-based access control (ABAC): Yes
  • Temporary credentials: Yes
  • Service-linked roles: No

Table 1: IAM support for Amazon Managed Service for Apache Flink

Identity-based policies are JSON permissions policy documents that you can attach to an identity, such as an IAM user, group of users, or role. These policies control what actions users and roles can perform, on which resources, and under what conditions.

By default, users and roles don’t have permission to create or modify Amazon Managed Service for Apache Flink resources. To work with the Managed Service for Apache Flink console, a user must be granted the necessary permissions; for example, permissions to view the streaming sources that will be configured as an application’s input and output. AWS managed policies, such as AmazonKinesisAnalyticsReadOnly or AmazonKinesisAnalyticsFullAccess, as well as custom policies, can grant these permissions. Administrators use policies to specify which principals can perform which actions on which resources. The Action element of a JSON policy describes the actions the policy allows or denies.

The following shows an example of a permissions policy that grants permissions to perform operations on Managed Service for Apache Flink resources.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "Stmt1473028104000",
            "Effect": "Allow",
            "Action": [
                "kinesisanalyticsv2:CreateApplication"
            ],
            "Resource": [
                "*"
            ]
        }
    ]
}

To mitigate the risk of unauthorized access to resources through cross-service impersonation, known as the confused deputy problem, limit Managed Service for Apache Flink’s access to trusted resources by using the aws:SourceArn and aws:SourceAccount global condition context keys in the role’s trust policy. This ensures the role can be assumed only on behalf of the expected application in your account.

Below is an example policy that uses both the aws:SourceArn and aws:SourceAccount global condition context keys to protect against the confused deputy problem.

{
   "Version":"2012-10-17",
   "Statement":[
      {
         "Effect":"Allow",
         "Principal":{
            "Service":"managed-flink.amazonaws.com"
         },
         "Action":"sts:AssumeRole",
         "Condition":{
            "StringEquals":{
               "aws:SourceAccount":"123456789012"
            },
            "ArnEquals":{
               "aws:SourceArn":"arn:aws:managed-flink:us-east-1:123456789012:application/my-app"
            }
         }
      }
   ]
}

Conclusion

In this post, we reviewed Amazon Managed Service for Apache Flink. We highlighted key information that can help FSI customers accelerate the approval of the service within these five categories: achieving compliance, data protection, isolation of compute environments, automating audits with APIs, and operational access and security. While not a one-size-fits-all approach, the guidance can be adapted to meet your organization’s security and compliance requirements and provide a consolidated list of critical areas for Amazon Managed Service for Apache Flink.

In the meantime, visit our AWS Financial Services Industry blog channel and stay tuned for more financial services news and best practices.

Mohan CV

Mohan is a Principal Solutions Architect at AWS, based in Northern Virginia. He has an extensive background in large-scale enterprise migrations and modernization, with a specialty in Data Analytics. Mohan is passionate about working with new technologies and enjoys assisting customers in adapting them to meet their business needs.

Diego Colombatto

Diego Colombatto is a Principal Partner Solutions Architect at AWS. He brings more than 15 years of experience in designing and delivering digital transformation projects for enterprises. At AWS, Diego works with partners and customers, advising on how to leverage AWS technologies to translate business needs into solutions. IT architectures, algorithmic trading, and cooking are some of his passions, and he’s always open to starting a conversation on these topics.

Muthuvelan Swaminathan

Muthuvelan Swaminathan is an Enterprise Solutions Architect based out of New York. He works with enterprise customers providing architectural guidance in building resilient, cost-effective and innovative solutions that address business needs.