AWS Security Blog

Tracking Federated User Access to Amazon S3 and Best Practices for Protecting Log Data

Auditing by using logs is an important capability of any cloud platform. Several third-party solution providers offer auditing and analysis of AWS logs, and last November AWS announced its own logging and analysis service, AWS CloudTrail. While logging is important, understanding how to interpret logs and alerts is crucial. In this blog post, Aaron Wilson, an AWS Professional Services Consultant, explains in detail how to interpret S3 logs in a federated access control context.

Introduction

Amazon S3 provides an optional feature named Server Access Logging, which records information about requests to your objects, including details such as the requester, bucket name, request action, response status, and more. Access logs are useful for troubleshooting applications and complement the logs provided by AWS CloudTrail.

I’ve divided this post into two parts. The first part describes the S3 logs that are created when a federated user accesses S3 buckets by using the AWS Management Console. These S3 logs can contain sensitive information and should be protected from unauthorized viewing. As a best practice, you should also limit who can modify or delete the logs, to prevent a user from covering their tracks. Therefore, in the second part, I recommend an approach for protecting these logs.

I should also mention that throughout the post I use the example of a SAML-based federated user accessing S3 through the AWS Management Console, because this is a popular use case for many customers. You can obtain similar results with other AWS-supported federation techniques and access methods, such as API federation using temporary security credentials. Before reading further, I suggest that you read a recent AWS Security Blog post in which we meet “Bob”, a user federated into AWS via IAM’s SAML 2.0 support. That context will help a lot further down in this post.

Ready to do some log diving? Let’s dig in.

Anatomy of the S3 Access Logs

Let’s take a look at a sample S3 access log generated when Bob downloads an object through the AWS Management Console from his laptop.

2dee2a767bc4b3c186ac1423c785b932f07357c659d315eeb3d8a8bcd037ca76 mybucket
[26/Nov/2013:22:01:39 +0000] 10.1.2.3 arn:aws:sts::123456789012:assumed-
role/ADFS-Dev/bob@example.com 0F81ECBB712F2F79 REST.GET.OBJECT
files/secretformula.pdf "GET
/mybucket/files/secretformula.pdf?AWSAccessKeyId=ASIAEXAMPLECDYXRAUMQ&Expires=1
385503597&Signature=hcxmv/mLM6UiE2aFPa1KQJM6Q2Q%3D&x-amz-security-
token=AQoDYXdzEGca0AMOxX9sQBY13KpRt6%2Bcb30YEEF1BZ8p4FHigLX9GYWoBnNaJ7XVXaPSYyv
mhpxntLPjgM964DnYve5mp8ScTM0tSGIRrE/PxMAXvp6MiNgAbiRZktyolt7kU7/7Vu8e7WcnGjMJkE
FKY7q8vm4mo198MVF/04Q1/vNlm6Rr5jAbjvgXcKrBv%2BnTsEBSytyVDQl31P2Yx74ZAgcEp7CMs35
1u5Juc%2BeMznVvvzapVP9%2BjX6fuM8NYzuylax/CPZq3zH6JJVnZrD08SnwYng4sir6tUAYyJosyi
Q2QkmWniojRukRyOl8efgJfD0iYRLfqgMndj7i/TwOH8lf3PJ2P2WF7RmVa3DZyZt30N4Oj0dVT/7PL
qNqxM2hhl0fOfnic8219aduQBRuiD3mkkZMQtA1NbQ5RkzvRQhA7v8naNEhSrfNBdDWwXM6V1PG6kUo
lpTaxxLitipQaG4GYpPHmE5Wn4h4bbM4ykOZe/7yVIp9ebG4X8FAK8jOaJFcwBmegiBwYHT0X9BKKN/
vHDbNHV5OMXKxBIxCkC9Ui84v4uQxLem2uRaVACiEUhM2cjdc3LHcRmds2rQ6SppPsyFLT/NsNo6RwT
MDx/EKQUSVwOk15iCTtNSUBQ%3D%3D HTTP/1.1" 200 - 2523 2523 275 274 "https://s3-
console-us-standard.console.aws.amazon.com/GetResource/Console.html?region=us-
east-1&pageLoadStartTime=1385503264134&locale=en" "Mozilla/5.0 (Macintosh;
Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko)
Chrome/31.0.1650.57 Safari/537.36" -

Using the documentation for S3 Server Access Log Format, I will now decode the fields in this log entry:

Bucket Owner 2dee2a767bc4b3c186ac1423c785b932f07357c659d315eeb3d8a8bcd037ca76
Bucket mybucket
Time [26/Nov/2013:22:01:39 +0000]
Remote IP 10.1.2.3
Requester arn:aws:sts::123456789012:assumed-role/ADFS-Dev/bob@example.com
Request ID 0F81ECBB712F2F79
Operation REST.GET.OBJECT
Key files/secretformula.pdf
Request-URI “GET /mybucket/files/secretformula.pdf?AWSAccessKeyId=ASIAEXAMPLECDYXRAUMQ&Expires=
1385503597&Signature=hcxmv/mLM6UiE2aFPa1KQJM6Q2Q%3D&x-amz-security-token=AQoDYXdzEGca0AMOxX9sQBY13KpRt6%2Bcb30YEEF1BZ8p4FHigLX9GYWoBnNaJ7XVX
aPSYyvmhpxntLPjgM964DnYve5mp8ScTM0tSGIRrE/PxMAXvp6MiNgAbiRZktyolt7kU7/7Vu8e7Wcn
GjMJkEFKY7q8vm4mo198MVF/04Q1/vNlm6Rr5jAbjvgXcKrBv%2BnTsEBSytyVDQl31P2Yx74ZAgcE
p7CMs351u5Juc%2BeMznVvvzapVP9%2BjX6fuM8NYzuylax/CPZq3zH6JJVnZrD08SnwYng4sir6tU
AYyJosyiQ2QkmWniojRukRyOl8efgJfD0iYRLfqgMndj7i/TwOH8lf3PJ2P2WF7RmVa3DZyZt30N4Oj0
dVT/7PLqNqxM2hhl0fOfnic8219aduQBRuiD3mkkZMQtA1NbQ5RkzvRQhA7v8naNEhSrfNBdDWwX
M6V1PG6kUolpTaxxLitipQaG4GYpPHmE5Wn4h4bbM4ykOZe/7yVIp9ebG4X8FAK8jOaJF
cwBmegiBwYHT0X9BKKN/vHDbNHV5OMXKxBIxCkC9Ui84v4uQxLem2uRaVACiEUhM2cjdc3LHc
Rmds2rQ6SppPsyFLT/NsNo6RwTMDx/EKQUSVwOk15iCTtNSUBQ%3D%3D HTTP/1.1”
HTTP status 200
Error Code  —
Bytes Sent 2523
Object Size 2523
Total Time (ms) 275
Turn-Around Time (ms) 274
Referrer “https://s3-console-us-standard.console.aws.amazon.com/GetResource/Console.html?region=us-east-1&pageLoadStartTime=1385503264134&locale=en”
User-Agent “Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/31.0.1650.57 Safari/537.36”
Version ID  —

Interpretation of the S3 Access Logs

Now we’ll take a deeper look at some of the specific fields above:

 Bucket mybucket
 Time [26/Nov/2013:22:01:39 +0000]
 Remote IP 10.1.2.3
 Requester arn:aws:sts::123456789012:assumed-role/ADFS-Dev/bob@example.com
 Key files/secretformula.pdf

We have the name of the Bucket, the Time the object was accessed (in UTC, as indicated by the +0000 time zone offset), and the Remote IP of the user. The Key field is a concatenation of the S3 prefix (files/) and the name of the object accessed (secretformula.pdf).
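To make this kind of log diving repeatable, you can pull the leading fields out of a record programmatically. Below is a minimal Python sketch; the field names and the regular expression are my own, and they cover only the fields discussed above, not the complete server access log format:

```python
import re

# Matches the leading fields of an S3 server access log record:
# bucket owner, bucket, time, remote IP, requester, request ID,
# operation, key, quoted Request-URI, and HTTP status.
LOG_PATTERN = re.compile(
    r'(?P<bucket_owner>\S+) (?P<bucket>\S+) \[(?P<time>[^\]]+)\] '
    r'(?P<remote_ip>\S+) (?P<requester>\S+) (?P<request_id>\S+) '
    r'(?P<operation>\S+) (?P<key>\S+) "(?P<request_uri>[^"]*)" '
    r'(?P<http_status>\S+)'
)

def parse_access_log_record(line):
    """Return a dict of the leading fields of one access log record."""
    match = LOG_PATTERN.match(line)
    if match is None:
        raise ValueError("line is not a recognizable S3 access log record")
    return match.groupdict()
```

Run against the sample record above, this yields the bucket (mybucket), the operation (REST.GET.OBJECT), and the full assumed-role ARN in the requester field.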

The Requester field has a lot of information relevant to our use case.  Let’s break it down further to see what can be learned:

assumed-role  Indicates that the user has assumed an IAM role
ADFS-Dev  The name of the IAM role that the user assumed
bob@example.com  The name of the user who is requesting access to the resource

In the blog post I referred to earlier, we chose to map the user’s email address to the mandatory RoleSessionName attribute. Technically, this could be any unique string 2-32 characters in length. In this example log we see bob@example.com, the email address of a user in a Windows Active Directory domain who is accessing AWS resources through SAML 2.0 federation by assuming the ADFS-Dev IAM role.

Using the email address is convenient because auditors and security operations analysts don’t have to correlate the ADFS or Active Directory logs with the S3 logs to determine the original user. In general, you should choose an identifier that you can easily correlate to an end user. Because SAML was configured with a uniquely identifiable record (Bob’s email address), we can track this request back to a real person without correlating any other logging sources. In addition, the log provides contextual information about which role was assumed: we can determine Bob’s permissions to AWS resources simply by examining the permissions defined in the ADFS-Dev IAM role.
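Because the role name and the RoleSessionName are embedded directly in the Requester ARN, they can be split out mechanically. Here is a small sketch (the function name is mine; it assumes the arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION form shown in the log above):

```python
def split_assumed_role_arn(requester_arn):
    """Split an STS assumed-role ARN into (account_id, role_name, session_name).

    Assumes the form arn:aws:sts::ACCOUNT:assumed-role/ROLE/SESSION,
    as seen in the Requester field of the access log.
    """
    prefix, sep, resource = requester_arn.partition(":assumed-role/")
    if not sep:
        raise ValueError("not an assumed-role ARN")
    account_id = prefix.rsplit(":", 1)[-1]  # trailing field of arn:aws:sts::ACCOUNT
    role_name, _, session_name = resource.partition("/")
    return account_id, role_name, session_name
```

For Bob’s request this returns ('123456789012', 'ADFS-Dev', 'bob@example.com'), which is exactly the triple an auditor wants to report.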

In summary, the salient points for an auditor are:  At 22:01:39 UTC on 26/Nov/2013, bob@example.com accessed secretformula.pdf in the mybucket bucket from IP 10.1.2.3.  This access was permitted by the IAM role ADFS-Dev.

Protecting Log Data

We’ve now learned how to track user access to objects.  Since we’re on the topic of log data, let’s review how to protect the logs from unauthorized viewing and modification.

AWS provides a variety of controls you can use to protect the logs, as explained in our previous blog post, IAM policies and Bucket Policies and ACLs! Oh My! (Controlling Access to S3 Resources).  You can help protect the confidentiality and integrity of the logs by:

  • Limiting read access to the access logs to authorized personnel only
  • Using a condition to require all clients to use HTTPS when transferring data from the bucket
  • Using a condition to limit client IP addresses to specific CIDR blocks

Below is an example IAM policy that you could attach to a group of users who have a valid reason to view the S3 access logs.  This policy supports read-only access via the AWS Management Console, the AWS CLI, or API calls.  You could add more conditions as your information security policy requires, such as restricting the source IP subnet, limiting the time of day during which logs can be pulled, or specifying a user agent to require an approved retrieval method.  This policy could also be attached to an IAM role if all of your users are federated.  Finally, note that the example bucket name and prefix, “myloggingbucket/logs/*”, need to be changed to suit your environment.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowGetLogs",
      "Action": [
        "s3:GetObject"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::myloggingbucket/logs/*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "true"
        }
      }
    },
    {
      "Sid": "ConsoleAccess",
      "Action": [
        "s3:GetBucketLocation",
        "s3:ListAllMyBuckets"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::*",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "true"
        }
      }
    },
    {
      "Sid": "AllowCliListBucket",
      "Action": [
        "s3:ListBucket"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:s3:::myloggingbucket",
      "Condition": {
        "Bool": {
          "aws:SecureTransport": "true"
        }
      }
    }
  ]
}

For more details on creating access policies, see Overview of AWS IAM Policies.

You can also help preserve the integrity of the original machine-generated logs by creating an access policy that requires the use of a multi-factor authentication (MFA) device to delete log files, perhaps assigned to different users for separation of duties.

This scenario is discussed in the blog series “Securing access to AWS using MFA”.
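As a sketch of that idea, the following bucket policy statement (built here as a Python dict so it can be serialized with json.dumps; the bucket name, prefix, and Sid are placeholders carried over from the earlier example) denies object deletion in the logging bucket unless the request was authenticated with MFA. Treat this as a starting point to adapt, not a drop-in policy:

```python
import json

# Illustrative bucket policy: deny s3:DeleteObject on the log prefix unless
# the request was authenticated with MFA. BoolIfExists also denies requests
# where aws:MultiFactorAuthPresent is absent (e.g., long-term access keys).
mfa_delete_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyLogDeleteWithoutMFA",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:DeleteObject",
            "Resource": "arn:aws:s3:::myloggingbucket/logs/*",
            "Condition": {
                "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
            },
        }
    ],
}

# Serialize to the JSON document you would supply as the bucket policy.
policy_document = json.dumps(mfa_delete_policy, indent=2)
```

Note the explicit Deny: because an explicit deny overrides any allow, even a user whose IAM policy grants s3:DeleteObject cannot remove log files without MFA.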

Further, you can use lifecycle rules to archive the logs to Amazon Glacier as dictated by your log data retention policy.  For details, see “Object Lifecycle Management”.
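As a rough sketch of such a rule (the rule ID, prefix, and day counts are illustrative, and you should check the exact request shape against the current S3 lifecycle documentation before using it), a configuration that archives the logs/ prefix to Glacier and later expires it might look like this:

```python
# Illustrative lifecycle configuration for the logging bucket: transition
# access logs to Glacier after 90 days, then expire them after ~7 years.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "ArchiveThenExpireAccessLogs",
            "Prefix": "logs/",
            "Status": "Enabled",
            "Transition": {"Days": 90, "StorageClass": "GLACIER"},
            "Expiration": {"Days": 2555},  # roughly seven years
        }
    ]
}
```

The key design point is that the transition happens well before expiration, so logs remain retrievable (at Glacier latency and cost) for the bulk of the retention window.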

Conclusion

You have learned how S3 access logs can be used to track user access to sensitive files in Amazon S3.  When you use federation, the email address of the federated user can persist into the Amazon S3 access logs, so no additional steps are required to track user access across the federated environment.  You also learned how to protect the logs from unauthorized viewing and modification.