AWS Big Data Blog

Analyzing AWS WAF logs with Amazon OpenSearch, Amazon Athena, and Amazon QuickSight

May 2024: This post was reviewed and updated with the latest features.

AWS WAF now includes the ability to log all web requests inspected by the service. AWS WAF can store these logs in an Amazon Simple Storage Service (Amazon S3) bucket in the same AWS Region, but most customers deploy AWS WAF across multiple Regions and accounts—wherever they deploy applications. When analyzing web application security, organizations need the ability to gain a holistic view across all their deployed AWS WAF Regions and accounts.

This post presents a simple approach to aggregate AWS WAF logs into a central data lake repository, which lets teams better analyze and understand their organization’s security posture. We walk through the steps to aggregate Regional AWS WAF logs into a dedicated S3 bucket in the logging account. We follow that up by demonstrating how you can use Amazon OpenSearch Service to visualize the log data. We also present an option to offload and process historical data using AWS Glue. With the data collected in one place, we finally show you how you can use Amazon Athena and Amazon QuickSight to query and visualize historical data and extract business insights.

Solution overview

The case we highlight in this post is the forensic use of AWS WAF access logs to identify distributed denial of service (DDoS) attacks by a client IP address. This solution provides your security teams with a view of all incoming requests impacting every AWS WAF application in your infrastructure.

We investigate what the IP access patterns look like over time and assess which IP addresses access the site multiple times in a short period of time. This pattern suggests that the IP address could be an attacker. With this solution, you can identify DDoS attackers for an application and detect DDoS patterns across your entire global IT infrastructure.

The following diagram illustrates the solution architecture.

This solution involves two separate sets of tasks: architecture setup, which lets you begin receiving log files in a centralized repository, and analytics, which processes your log data into useful results.

Prerequisites

To follow along, you must have the following resources:

  • Two AWS accounts – Following AWS multi-account best practices, create two accounts:
    • A logging account.
    • A resource account that hosts the web applications using AWS WAF. For more information about multi-account setup, see AWS Control Tower. Using multiple accounts isolates your logs from your resource environments. This helps maintain the integrity of your log files and provides a central access point for auditing all application, network, and security logs.
  • The ability to launch new resources into your account – The resources might not be eligible for Free Tier usage and so might incur costs.
  • An application running with an Application Load Balancer or Amazon CloudFront, preferably in multiple Regions – If you don’t already have one, you can launch any AWS web application reference architecture to test and implement this solution.

For this walkthrough, we launch the single-page React app example using the react-cors-spa GitHub repo. This pattern provides a step-by-step approach to coding and hosting a single-page application that’s written in React on Amazon S3 and Amazon CloudFront. You can launch this application in two different Regions to simulate a global infrastructure. You ultimately set up a centralized bucket in a logging account that both Regions log to, which your forensic analysis tools then draw from. Deploy the AWS CloudFormation stack to launch the sample application in your Region of choice.
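
If you prefer to script the deployment, the following minimal sketch launches the same CloudFormation template in two Regions with boto3. The template location, stack name, and Regions are placeholders, not values from the react-cors-spa repo.

import boto3

# Hypothetical location of the sample application's CloudFormation template
TEMPLATE_URL = "https://<YOUR-TEMPLATE-BUCKET>.s3.amazonaws.com/react-cors-spa.yaml"

# Example Regions used to simulate a global footprint
for region in ["us-east-1", "eu-west-1"]:
    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(
        StackName="react-cors-spa",
        TemplateURL=TEMPLATE_URL,
        Capabilities=["CAPABILITY_IAM"],  # needed if the template creates IAM resources
    )
    print(f"Stack creation started in {region}")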

Add AWS WAF to the CloudFront distribution

Complete the following steps to enable AWS WAF on the CloudFront distribution and send its logs to Amazon S3 in every Region where the application is deployed. For simplicity, we have deployed in a single Region.

  1. On the Amazon S3 console, create an S3 bucket to receive AWS WAF logs from the CloudFront distribution and note the name of the bucket.
  2. In your CloudFront distribution, navigate to Security, Web Application Firewall (WAF).
  3. Choose Edit and choose Enable AWS WAF protections.
  4. Choose Save changes.
  5. In the Security section of your distribution, under Request logs for the specified time range, choose Manage logging.
  6. Set Logging destination to S3 bucket and choose the S3 bucket you created.

Repeat these steps for every Region where you launched the application that you intend to monitor with AWS WAF.

To handle logging data from multiple AWS WAF deployments across Regions and more than one account, consider your partitioning strategy for the data. Creating a partition for the AWS WAF logs of each Region and account gives your security teams a comprehensive view of the network while still allowing them to filter the logs by Region and account.
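
As a concrete illustration, the following sketch parses the account, Region, and web ACL out of an object key. The key layout shown assumes the default prefix AWS WAF uses when logging to Amazon S3 (AWSLogs/<account-id>/WAFLogs/<Region>/<web-acl-name>/...); verify it against the keys in your own bucket before relying on it.

def parse_waf_log_key(key: str) -> dict:
    # Key layout assumed: AWSLogs/<account-id>/WAFLogs/<Region>/<web-acl-name>/...
    parts = key.split("/")
    return {"account_id": parts[1], "region": parts[3], "web_acl": parts[4]}


example_key = (
    "AWSLogs/111122223333/WAFLogs/us-east-1/my-web-acl/2024/05/01/12/00/sample.log.gz"
)
print(parse_waf_log_key(example_key))
# {'account_id': '111122223333', 'region': 'us-east-1', 'web_acl': 'my-web-acl'}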

Set up Amazon S3 cross-account replication

You begin this process by providing appropriate permissions for one account to access resources in another. Your resource account needs cross-account permission to access the bucket in the logging account. Complete the following steps:

  1. Create your central logging S3 bucket in the logging account and attach the following bucket policy to it under Permissions (provide your resource account ID and central logging bucket values). Make a note of the bucket’s ARN. You need this information for future steps.
    // JSON document
    {
       "Version":"2012-10-17",
       "Id":"",
       "Statement":[
          {
             "Sid":"Set-permissions-for-objects",
             "Effect":"Allow",
             "Principal":{
                "AWS":"arn:aws:iam::<RESOURCE-ACCOUNT-ID>:role/service-role/resource-acct-IAM-role"
             },
             "Action":["s3:ReplicateObject", "s3:ReplicateDelete"],
             "Resource":"arn:aws:s3:::<CENTRAL-LOGGING-BUCKET>/"
          },
          {
             "Sid":"Set permissions on bucket",
             "Effect":"Allow",
             "Principal":{
                "AWS":"arn:aws:iam::<RESOURCE-ACCOUNT-ID>:role/service-role/resource-acct-IAM-role"
             },
             "Action":["s3:List", "s3:GetBucketVersioning", "s3:PutBucketVersioning"],
             "Resource":"arn:aws:s3:::<CENTRAL-LOGGING-BUCKET>"
          }
       ]
    }

For more information on how to set up permissions for Amazon S3 cross-account access, see Configuring replication when source and destination buckets are owned by different accounts.

  2. Make sure to enable Bucket versioning for both the source and destination S3 buckets.
  3. From your resource account, choose the S3 bucket that receives AWS WAF logs to configure it as your source for replication.
  4. On the Management tab of the replication source bucket, in the Replication rules section, choose Create replication rule.
  5. For Replication rule name, enter a name.
  6. Enable the replication rule by selecting Enabled for Status.
  7. Choose to replicate the entire bucket.

For more information on filtering objects for replication, see Specifying a filter. We have chosen all objects for simplicity.

  8. Choose Browse S3 and choose the destination bucket in your logging account.

When creating new replication rules from the same source bucket, make sure that the AWS Identity and Access Management (IAM) role associated with this configuration has sufficient permissions to write new objects in the new destination bucket.

  9. Provide the resource bucket ARN, central logging bucket ARN, and AWS Key Management Service (AWS KMS) key ARN for your resources in the following IAM role:
    // JSON document
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Action": [
                    "s3:ListBucket",
                    "s3:GetReplicationConfiguration",
                    "s3:GetObjectVersionForReplication",
                    "s3:GetObjectVersionAcl",
                    "s3:GetObjectVersionTagging",
                    "s3:GetObjectRetention",
                    "s3:GetObjectLegalHold"
                ],
                "Effect": "Allow",
                "Resource": [
                    "<RESOURCE-S3-BUCKET-ARN>",
                    "<RESOURCE-S3-BUCKET-ARN>/"
                ]
            },
            {
                "Action": [
                    "s3:ReplicateObject",
                    "s3:ReplicateDelete",
                    "s3:ReplicateTags",
                    "s3:GetObjectVersionTagging",
                    "s3:ObjectOwnerOverrideToBucketOwner"
                ],
                "Effect": "Allow",
                "Condition": {
                    "StringLikeIfExists": {
                        "s3:x-amz-server-side-encryption": [
                            "aws:kms",
                            "aws:kms:dsse",
                            "AES256"
                        ]
                    }
                },
                "Resource": [
                    "<CENTRAL-LOGGING-S3-BUCKET-ARN>/"
                ]
            },
            {
                "Action": [
                    "kms:Decrypt"
                ],
                "Effect": "Allow",
                "Condition": {
                    "StringLike": {
                        "kms:ViaService": "s3.us-east-1.amazonaws.com",
                        "kms:EncryptionContext:aws:s3:arn": [
                            "<RESOURCE-S3-BUCKET-ARN>/"
                        ]
                    }
                },
                "Resource": [
                    "<KMS-KEY-ARN>"
                ]
            },
            {
                "Action": [
                    "kms:Encrypt"
                ],
                "Effect": "Allow",
                "Condition": {
                    "StringLike": {
                        "kms:ViaService": [
                            "s3.us-east-1.amazonaws.com"
                        ],
                        "kms:EncryptionContext:aws:s3:arn": [
                            "<CENTRAL-LOGGING-S3-BUCKET-ARN>/"
                        ]
                    }
                },
                "Resource": [
                    "<KMS-KEY-ARN>"
                ]
            }
        ]
    }
  10. Enable encryption.
  11. Review the replication configuration and choose Save.
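
If you manage this configuration as code, the console steps above correspond roughly to the following boto3 sketch. The bucket names, account IDs, role ARN, and KMS key ARN are placeholders you would replace with your own values.

import boto3

s3 = boto3.client("s3")
s3.put_bucket_replication(
    Bucket="<RESOURCE-ACCOUNT-WAF-LOG-BUCKET>",
    ReplicationConfiguration={
        "Role": "arn:aws:iam::<RESOURCE-ACCOUNT-ID>:role/service-role/resource-acct-IAM-role",
        "Rules": [
            {
                "ID": "replicate-waf-logs",
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {},  # replicate the entire bucket
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "SourceSelectionCriteria": {
                    "SseKmsEncryptedObjects": {"Status": "Enabled"}
                },
                "Destination": {
                    "Bucket": "arn:aws:s3:::<CENTRAL-LOGGING-BUCKET>",
                    "Account": "<LOGGING-ACCOUNT-ID>",
                    "AccessControlTranslation": {"Owner": "Destination"},
                    "EncryptionConfiguration": {
                        "ReplicaKmsKeyID": "<KMS-KEY-ARN-IN-LOGGING-ACCOUNT>"
                    },
                },
            }
        ],
    },
)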

Amazon S3 should now begin writing files to your central S3 logging bucket with the correct partitioning. To generate logs, access your web application.
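
One quick, informal way to generate traffic (and therefore log entries) is a short Python loop against the application's CloudFront URL; the URL below is a placeholder.

import requests

# Placeholder: the CloudFront URL of the sample application in one of your Regions
APP_URL = "https://<YOUR-CLOUDFRONT-DISTRIBUTION-DOMAIN>/"

# Send a handful of requests so AWS WAF has something to log
for i in range(20):
    response = requests.get(APP_URL, timeout=5)
    print(i, response.status_code)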

Analytics with OpenSearch Dashboards

AWS WAF provides multiple logging destinations to help you monitor and analyze your web traffic. You can send AWS WAF logs to three different destinations: an Amazon S3 bucket, an Amazon CloudWatch Logs log group, and Amazon Data Firehose.

Each of these destinations offers unique advantages and allows you to further process the log data. For example, you can create patterns to index the data into OpenSearch Service.

The following are the three ways you can integrate AWS WAF logs with OpenSearch Service:

  • Streaming CloudWatch Logs data to OpenSearch Service – When you send AWS WAF logs to CloudWatch, you can create subscription filters that automatically forward the log data to OpenSearch Service. For more details, see Streaming CloudWatch Logs data to Amazon OpenSearch Service.
  • Using Amazon Data Firehose – You can configure Amazon Data Firehose to receive AWS WAF logs and then set OpenSearch Service as the destination for the Firehose stream. This allows you to ingest and index the log data into your OpenSearch cluster. For more details, see Amazon Data Firehose delivery stream.
  • Indexing S3 log files to OpenSearch Service – If you choose to store your AWS WAF logs in an S3 bucket, you can create an AWS Lambda function that gets invoked by S3 event notifications. This function can then extract the log data and index it into your OpenSearch Service instance.

Choosing the right logging destination and integration method depends on your specific requirements, such as the volume of data, real-time analysis needs, and the level of customization you require for your log data processing.

For this post, we use Amazon S3 to create a centralized data lake for logs. Now that you have configured replication to collect logs in the central logging S3 bucket, you create an OpenSearch Service domain in the logging account in the same Region as the central logging bucket. You then create a Lambda function that handles S3 event notifications as the central logging bucket receives new log files and loads the data into your OpenSearch Service cluster. This creates a connection between your central log files and OpenSearch Service, which gives you the ability to query your logs quickly to look for potential security threats and to create dashboards to visualize the stored data.

Create an OpenSearch Service domain

Complete the following steps to create an OpenSearch Service domain in your central logging account. For detailed steps, refer to Creating and managing Amazon OpenSearch Service domains.

  1. On the OpenSearch Service console, choose Domains in the navigation pane.
  2. Choose Create domain.
  3. Enter a name for the domain and choose Standard create.
  4. In this demo, choose Public access and disable Fine-grained access control. For any real-world production tasks, keep your OpenSearch Service domain within your VPC.
  5. Leave the remaining settings as default and choose Create.

We later update the access policy to allow the Lambda function access to the OpenSearch Service domain. For this demo, we use the default index configuration, but it is important to consider your shard strategy for production environments.
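
For example, rather than letting the first document create the index with default settings, you could create it up front with explicit shard and replica counts. The following sketch assumes an index named waf-logs, a placeholder domain endpoint, and illustrative settings only; size shards for your own data volume.

import boto3
import requests
from requests_aws4auth import AWS4Auth

# Region of the OpenSearch Service domain and a placeholder domain endpoint
region = "us-east-1"
host = "https://<YOUR-OPENSEARCH-DOMAIN-ENDPOINT>"

# Sign the request with the caller's IAM credentials, as the Lambda function does later
credentials = boto3.Session().get_credentials()
awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, "es",
                   session_token=credentials.token)

# Illustrative settings only: choose shard and replica counts for your own data volume
index_settings = {"settings": {"number_of_shards": 3, "number_of_replicas": 1}}

r = requests.put(f"{host}/waf-logs", auth=awsauth, json=index_settings,
                 headers={"Content-Type": "application/json"})
print(r.status_code, r.text)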

Create a Lambda function to index logs to the OpenSearch Service domain

Next, you create a Lambda function that gets invoked by S3 event notifications when log files are sent to the centralized logging bucket. This Lambda function reads the log file from the centralized logging bucket and stores each log, line by line, into the OpenSearch Service cluster.

This Lambda function depends on additional packages, so you need to create a .zip deployment package with dependencies.

Complete the following steps:

  1. Create a project directory in your local system with the name indexing-logs-to-opensearch.
  2. Navigate to the project directory and create a file called lambda_function.py to serve as the source code file.
  3. Enter the following code in the lambda_function.py file:
    import json
    import gzip
    import os

    import boto3
    import requests
    from requests_aws4auth import AWS4Auth

    s3 = boto3.client('s3')

    # Sign requests to the OpenSearch Service domain with the function's IAM credentials
    region = os.environ['region']  # e.g. us-west-1
    service = 'es'
    credentials = boto3.Session().get_credentials()
    awsauth = AWS4Auth(credentials.access_key, credentials.secret_key, region, service, session_token=credentials.token)

    host = os.environ['host']    # OpenSearch Service domain endpoint URL
    index = os.environ['index']  # index to write the log documents to
    datatype = '_doc'
    url = host + '/' + index + '/' + datatype

    headers = {"Content-Type": "application/json"}


    def unzip_file(gz_file_path, output_file_path):
        # Decompress the gzipped AWS WAF log file downloaded from Amazon S3
        try:
            with gzip.open(gz_file_path, 'rb') as gz_file:
                with open(output_file_path, 'wb') as output_file:
                    output_file.write(gz_file.read())
            print(f"File unzipped successfully: {output_file_path}")
        except Exception as e:
            print(f"Error during unzip: {e}")


    def lambda_handler(event, context):
        print(event)

        # The S3 event notification identifies the bucket and key of the new log file
        bucket = event['Records'][0]['s3']['bucket']['name']
        key = event['Records'][0]['s3']['object']['key']
        print("Bucket:", bucket)
        print("Key:", key)

        zippedFilename = key.split('/')[-1]
        zippedFilePath = '/tmp/' + zippedFilename
        unzippedFilename = zippedFilename.replace('.gz', '')
        unzippedFilePath = '/tmp/' + unzippedFilename

        # Download and decompress the log file into the function's /tmp storage
        s3.download_file(bucket, key, zippedFilePath)
        unzip_file(zippedFilePath, unzippedFilePath)

        # Each line of the file is a JSON-formatted AWS WAF log record; index them one by one
        with open(unzippedFilePath, 'r') as file:
            for line in file:
                logObject = json.loads(line.strip())
                print("Log Line ", logObject)
                try:
                    r = requests.post(url, auth=awsauth, json=logObject, headers=headers)
                    print("Request:", r)
                except Exception as e:
                    print(f"Error during indexing: {e}")

        return {
            'statusCode': 200,
            'body': json.dumps('Success')
        }
  4. Run the following commands in the root directory of the project:
    pip install --target ./package boto3
    pip install --target ./package requests
    pip install --target ./package requests-aws4auth
    cd package
    zip -r ../my_deployment_package.zip .
    cd ..
    zip my_deployment_package.zip lambda_function.py
  5. In your logging account, create a Lambda function with Python 3.12 for Runtime and upload the .zip file you created.
  6. Update the IAM role of the function to include the following IAM policy along with the existing policies assigned to it (provide the OpenSearch Service domain ARN and central logging bucket ARN):
    // JSON document
    {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "Statement1",
                "Effect": "Allow",
                "Action": [
                    "es:*"
                ],
                "Resource": [
                    "<OPENSEARCH-SERVICE-DOMAIN-ARN>"
                ]
            },
             {
                "Effect": "Allow",
                "Action": [
                    "s3:Get*",
                    "s3:List*",
                    "s3:Describe*",
                    "s3-object-lambda:Get*",
                    "s3-object-lambda:List*"
                ],
                "Resource": "arn:aws:s3:::<CENTRAL-LOGGING-BUCKET>/"
            }
        ]
    }
  7. Set environment variables for your Lambda function so it knows where to send the data:
    1. Add host and set it to the domain endpoint URL you created.
    2. Add index and enter your index name.
    3. Add region and specify the Region of your OpenSearch Service domain.
  8. Update the access policy of the OpenSearch Service domain with the following code (provide the Lambda function's execution role ARN, OpenSearch Service domain ARN, and CIDR range of the source IP):
    {
      "Version": "2012-10-17",
      "Statement": [
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "<LAMBDA-FUNCTION-ARN>"
          },
          "Action": "es:*",
          "Resource": "<OPENSEARCH-SERVICE-DOMAIN-ARN>"
        },
        {
          "Effect": "Allow",
          "Principal": {
            "AWS": "*"
          },
          "Action": "es:ESHttp*",
          "Resource": "<OPENSEARCH-SERVICE-DOMAIN-ARN>",
          "Condition": {
            "IpAddress": {
              "aws:SourceIp": "<CIDR-RANGE-OF-SOURCE-IP>"
            }
          }
        }
      ]
    }
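
Before creating the S3 trigger in the next section, you can optionally sanity-check the function with a hand-made event shaped like an S3 notification. The bucket name and object key below are placeholders; point them at a real gzipped AWS WAF log file in your central logging bucket, then paste the printed JSON into the Lambda console's test feature or invoke the handler locally.

import json

# A minimal event shaped like the S3 put notification the function will receive
test_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "<CENTRAL-LOGGING-BUCKET>"},
                "object": {"key": "AWSLogs/111122223333/WAFLogs/us-east-1/my-web-acl/2024/05/01/12/00/sample.log.gz"},
            }
        }
    ]
}

# Print the JSON form so you can paste it into the Lambda console's test feature
print(json.dumps(test_event, indent=2))

# Or, with the deployment package available locally and AWS credentials configured:
# from lambda_function import lambda_handler
# lambda_handler(test_event, None)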

Create an S3 trigger

After you create the Lambda function, create the event trigger on your S3 bucket to invoke that function. This completes your log delivery pipeline to OpenSearch Service.

  1. On the Amazon S3 console, open your bucket.
  2. On the Properties tab, in the Event notifications section, choose Create event notification.
  3. Provide a name for your event.
  4. For the type, select PUT.
  5. Leave the Prefix and Suffix settings empty.
  6. For the destination, choose the Lambda function you created.
  7. Choose Save changes.

With this setup, Lambda should start sending data to your OpenSearch Service index whenever a new log file arrives in the central logging bucket.
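
If your workstation's IP address falls within the CIDR range you allowed in the domain access policy (and fine-grained access control is disabled, as in this demo), a quick unsigned request can confirm that documents are arriving. The endpoint and index name below are placeholders.

import requests

# Domain endpoint and index name are placeholders; use the values from your setup
endpoint = "https://<YOUR-OPENSEARCH-DOMAIN-ENDPOINT>"
index = "<YOUR-INDEX-NAME>"

# _count returns the number of documents currently in the index
r = requests.get(f"{endpoint}/{index}/_count", timeout=10)
print(r.json())  # expect the count to grow as new log files replicate in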

Visualize data with OpenSearch Dashboards

Your pipeline now automatically adds data to your OpenSearch Service cluster. Next, use OpenSearch Dashboards to visualize the AWS WAF logs. This is the final step in assembling your forensic investigation architecture.

Using the log data, you can filter by IP address to see how many times an IP address has hit your firewall each month. This helps you track usage anomalies and isolate potentially malicious IP addresses. You can use this information to add web ACL rules to your firewall, which adds extra protection against those IP addresses. The following screenshot shows a pie chart separating the count of requests from specific IP addresses.

In addition to the count of IP addresses, you can create a visualization for the number of IPs over time, and correlate the IP address to its country of origin. Correlation provides even more precise filtering for potential web ACL rules to protect against attackers.
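
Under the hood, visuals like these are backed by aggregations. The following sketch shows the kind of query involved: a terms aggregation that counts requests per client IP. The field name assumes the AWS WAF log structure (httpRequest.clientIp) with OpenSearch's default dynamic mapping, which adds a .keyword sub-field for strings; adjust it to your index mapping. You can run the body from OpenSearch Dashboards Dev Tools or post it with a signed request, as in the Lambda function.

# Aggregation body: top 20 client IPs by request count (field name is an assumption
# based on the AWS WAF log structure and default dynamic mapping)
top_talkers_query = {
    "size": 0,  # we only want the aggregation buckets, not individual documents
    "aggs": {
        "top_client_ips": {
            "terms": {"field": "httpRequest.clientIp.keyword", "size": 20}
        }
    }
}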

Analytics with Athena and QuickSight

OpenSearch Service is an excellent tool for forensic work because it provides high-performance search capability for large datasets. It is recommended when you have large volumes of semi-structured or unstructured data (logs, documents, and metrics) and need real-time search, analytics, and data exploration capabilities with complex queries and aggregations.

However, you can also use another approach in which you take advantage of AWS serverless technologies like AWS Glue, Athena, and QuickSight. This approach is useful when you have structured or semi-structured data stored in Amazon S3 and need to perform analyses, generate reports, and create visualizations for business intelligence (BI) and reporting purposes.

To learn more about this option, see How to extract, transform, and load data for analytic processing using AWS Glue and Work with partitioned data in AWS Glue.

Query the data with Athena

With your forensic tools now in place, you can use Athena to query your data and analyze the results, and then load them directly into QuickSight for additional visualization. Use the Athena console to experiment until you have the best query for your visual needs. Having the database in your AWS Glue Data Catalog means you can make one-time queries in Athena to inspect your data.

On the Athena console, open the query editor and enter the following query to fetch the client IP addresses over time (provide your database and table name):

SELECT date_format(from_unixtime("timestamp"/1000), '%Y-%m-%d %h:%i:%s') as event_date, httpRequest.clientIp as client_ip FROM "<YOUR-DATABASE-NAME>"."<YOUR-TABLE-NAME>" limit 10;

It should return the following results.

You can also see which IP addresses impacted your environment the most over any period of time:

SELECT COUNT(httpRequest.clientIp) as count, httpRequest.clientIp FROM "<YOUR-DATABASE-NAME>"."<YOUR-TABLE-NAME>"
GROUP BY httpRequest.clientIp ORDER BY count DESC
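
You can also run these queries programmatically. The following sketch uses the Athena API through boto3; the database, table, Region, and output location are placeholders.

import boto3

athena = boto3.client("athena", region_name="us-east-1")  # Region is a placeholder

# Same style of query as above; database and table names are placeholders
query = """
SELECT httpRequest.clientIp AS client_ip, COUNT(*) AS request_count
FROM "<YOUR-DATABASE-NAME>"."<YOUR-TABLE-NAME>"
GROUP BY httpRequest.clientIp
ORDER BY request_count DESC
LIMIT 20
"""

response = athena.start_query_execution(
    QueryString=query,
    QueryExecutionContext={"Database": "<YOUR-DATABASE-NAME>"},
    ResultConfiguration={
        # Placeholder: any S3 location Athena can write results to
        "OutputLocation": "s3://aws-athena-query-results-<ACCOUNT-ID>-<REGION>/"
    },
)
print(response["QueryExecutionId"])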

Visualize data with QuickSight

Now that you can query your data in Athena, you can visualize the results using QuickSight. First, grant QuickSight access to the S3 bucket where your Athena query results live.

  1. On the QuickSight console, choose the user name menu and choose Manage QuickSight.
  2. Choose Account settings, then Security & permissions.
  3. Under QuickSight access to AWS services, choose Add or remove.
  4. Select Amazon S3, then choose Select S3 buckets.
  5. Choose the output bucket for your central AWS WAF logs.
  6. Choose your Athena query results bucket. The query results bucket begins with aws-athena-query-results-*.

QuickSight can now access the data sources. To set up your visualizations, continue with the following steps:

  1. On the QuickSight console, choose Datasets in the navigation pane.
  2. Choose New dataset.
  3. For Source, choose Athena.
  4. Give your new data source a name and choose Validate connection.
  5. After you validate the connection, choose Create data source.
  6. Select Use custom SQL and give your SQL query a name.
  7. Input the same query that you used earlier in Athena, and choose Confirm query.
  8. Choose Import to SPICE for quicker analytics, then choose Visualize.

After several minutes, QuickSight alerts you that the import is complete. Now you can apply a visualization.

  1. In the navigation pane, choose Analyses.
  2. Choose New analysis.
  3. Select the last dataset that you created earlier and choose Create analysis.
  4. For the visual type, choose Line chart.
  5. Drag and drop event_date to X-Axis.
  6. Drag and drop client_ip to Value.

This should create a visualization.

  7. Choose the right arrow at the top left of the visualization and choose Hide “other” categories.

This should modify your visualization to look like the following screenshot.

You can map the countries from which the requests originate, allowing you to track global access anomalies. You can do this in QuickSight by selecting the Points on map visualization type and choosing the country as the data point to visualize. You can create the dataset using the following query:

SELECT COUNT(httpRequest.country) as count, httpRequest.country FROM "<YOUR-DATABASE-NAME>"."<YOUR-TABLE-NAME>"
GROUP BY httpRequest.country ORDER BY count DESC


You can also add a count of requests per IP address to see whether you have any unusual access patterns originating from specific IP addresses, using the dataset generated by the following query:

SELECT httpRequest.clientIp, COUNT(*) AS request_count FROM "<YOUR-DATABASE-NAME>"."<YOUR-TABLE-NAME>"
GROUP BY httpRequest.clientIp ORDER BY request_count DESC 

Conclusion

In this post, we walked through two approaches for building operational dashboards that track key metrics over time. Although OpenSearch Service and QuickSight deliver similar end results, the technical approaches they employ have trade-offs worth considering. If your use case involves large volumes of semi-structured or unstructured data (logs, documents, or metrics) and requires real-time search, analytics, and data exploration capabilities with complex queries and aggregations, OpenSearch Service and OpenSearch Dashboards are better suited to meet your needs. On the other hand, the Athena and QuickSight approach is useful when you have structured or semi-structured data stored in Amazon S3 and need to perform one-time analyses, generate reports, and create visualizations for BI and reporting purposes.

The key takeaway is the adaptability of these solutions: combining different AWS services allows you to tailor solutions that precisely match your unique requirements.

We hope you have found this post informative and the proposed solutions intriguing. As always, AWS welcomes all feedback or comments.


About the Authors

Aaron Franco is a solutions architect at Amazon Web Services.

Urvi Sharma is a Solutions Architect at AWS who is passionate about working on edge services. She works with customers in the early stages of cloud adoption to help them migrate and modernize, and build resilient and secure architectures and incorporate AI/ML services with modern technologies like generative AI. Outside of work, Urvi enjoys painting and exploring new places.

Sameer Shrivastava is a Solutions Architect at AWS, where he specializes in serverless technologies. He is enthusiastic about guiding organizations in their cloud transformation journeys to drive efficiency and scalability. When not immersed in the world of cloud architecture, Sameer can be found exploring the great outdoors or indulging in his love for music.

Nishant Dhiman is a Senior Solutions Architect at AWS with an extensive background in serverless, security, and mobile platform offerings. He is a voracious reader and a passionate technologist. He loves to interact with customers and always relishes giving talks or presenting on public forums. Outside of work, he likes to keep himself engaged with podcasts, calligraphy, and music.


Audit History

Last reviewed and updated in May 2024 by Urvi Sharma, Sameer Shrivastava and Nishant Dhiman