AWS Cloud Operations Blog
Use Amazon Athena and AWS CloudTrail to estimate billing for AWS Config rule evaluations
AWS Config is a service that enables you to audit your AWS resources for compliance to a desired configuration state. You are billed based on the number of Configuration Items (a point-in-time snapshot of an AWS resource) recorded and the number of AWS Config rules (a function that reports resource compliancy) evaluated per resource per month.
In this post, I’ll show you how to use Amazon Athena to query AWS CloudTrail logs for a detailed billing breakdown of AWS Config rule evaluations. This breakdown can help determine which rules are contributing the most to monthly costs so you can optimize accordingly (for example, by reducing the frequency of the running of certain rules). I don’t dive into configuration items in this post. For more information about that, see the Identifying resources with the most configuration changes using AWS Config blog post.
Step 1: Create an Athena table to query CloudTrail logs
Create an Athena table and configure it with the location and schema of the CloudTrail logs. The easiest way to do this is to use the auto-generated statement available in the CloudTrail console.
From the left navigation pane, choose Event history, and then choose Create Athena table.
Figure 1: Event history page in the CloudTrail console
On the Create a table in Amazon Athena page, choose your CloudTrail log. For Storage location, choose the S3 bucket that contains the CloudTrail log files. The table name and table comment placeholders (indicated by the square brackets) will be auto-populated for you.
Figure 2: Create a table in Amazon Athena
You can either choose Create table or copy this statement and execute it in the Athena console, if you need to make modifications to the query.
Your CREATE TABLE
statement should look similar to the following, which has a modified table name (LINE 1) and table comment (LINE 46):
Step 2: Create a simple query of CloudTrail data
Use the following query to ensure that the Athena table was created correctly. Replace [your_cloudtrail_table]
(LINE 2) with the chosen table name from the previous step. In the example above, the table name was cloudtrail_management_logs
.
If the table was correctly created, you will see 10 results. This is the beginning of the full query, which you will expand on later in the post. If you read through the query, you can see it’s filtering on two attribute values: config.amazonaws.com
(events that originated from AWS Config) and PutEvaluations
(events that are recording a compliance state of a resource).
Notice two columns retrieved from the query: requestparameters
and additionaleventdata
. We will extract data from these JSON objects in the next steps. For a detailed explanation of the properties in these objects, please reference the PutEvaluations API and ConfigRule API documentation respectively.
Step 3: Query the number of AWS Config rule evaluations by rule name
Use the following query to retrieve the number of AWS Config rule evaluations, sorted by highest to lowest number of evaluations, grouped by rule name.
Replace the table name placeholder (LINE 6) with the name of your Athena table.
If successfully executed, you will receive results similar to the following:
Let’s break down the query:
LINE 3: Uses the json_extract
function to extract configRuleName
from additionaleventdata
using a JSONPath expression ('$.configRuleName'
).
LINE 4: Similar to LINE 3, extracts evaluations from requestparameters
using json_extract
and casts it as an ARRAY<MAP>
.
LINE 16: Uses the UNNEST
operator to expand the evaluations array, extracted in LINE 4, into individual elements.
LINE 18: Filters out evaluations that are NOT_APPLICABLE
, because they do not contribute to costs.
LINE 22: Counts the number of evaluations of each rule.
The query also contains a LIMIT
clause (LINE 9) because, by default, Athena will query all CloudTrail log files in the specified S3 bucket. Depending on how large the bucket is, it might take a long time to execute this query. Feel free to experiment by removing the LIMIT
clause to see if the query times are acceptable to you. (It still might take more than one minute to execute the query.)
It is fine to scan all files in a S3 bucket for experimenting with Athena queries. However, for long-term use, it’s more performant and cost-effective to partition your data. When you use Amazon Athena, you are charged by the amount of data that you scan. For more information, see the Amazon Athena pricing page.
Steps 4-6 will describe how to create and query partitions. If you don’t want to do this, you can skip ahead to Step 7.
Step 4: Create a new table with partition keys
Partitioning Athena tables means creating logical groups that restrict the amount of data scanned by each query. A common best practice is to partition by date or time. If you look at the CREATE TABLE
statement in Step 1 of this post, the LOCATION
clause looks like the following:
If you browse the objects in that S3 bucket, you’ll see they are organized into AWS Regions, years, months, and days. They are already physically partitioned. See Figure 3. The Regions in your account might be different.
However, because these physical partition locations are not registered in Athena, by default every query will go through all folders.
Figure 3: CloudTrail S3 bucket Region folders
To partition your table, you will first need to define partition keys using an additional clause in the create table statement. Use the same CREATE TABLE
statement from Step 1, but add a PARTITIONED BY
clause as seen in the example below (LINE 47). You’ll see that I also added a _partitioned
suffix to the table name to differentiate it from the unpartitioned table (LINE 1).
LINE 47: There are three partitions: region
, year
, and month
. You can create more partition keys (for example, account ID and day) as appropriate for your use case.
Note: Where the PARTITIONED BY
clause is entered (after COMMENT
), an incorrect clause order might cause a syntax error.
Once you’ve entered your partition keys, you can run the statement within the Athena console to create your partitioned table.
Step 5: Register partition locations
After creating the partitioned Athena table, you can register each partition and its respective S3 bucket location, using an ALTER TABLE
statement.
The following example statement will register a partition. Replace the placeholders with your values.
In this statement, I logically registered 'us-east-1'
, '2021'
, '01'
to a physical S3 location. Using this statement as a template, you can register as many partitions as you need.
Step 6: Query using partitions
Now you can modify the query in Step 3, updating the WHERE
clause to limit the search to specific partitions. Your query will look like the following example. Modify the table name (LINE 6) and partitions (LINES 9 to 11) with your own values.
When you execute this query, you’ll see that it runs much faster and scans less data overall.
Step 7: Estimate cost
With the query successfully returning the number of rule evaluations per AWS Config rule, you can expand the SELECT
statement to include a cost column:
count(compliance_type)* 0.001 as cost
The 0.001
($0.001 per evaluation) is based on the lowest pricing tier from current AWS Config pricing. Your total number of evaluations may put you on a different pricing tier. You should update this value accordingly. The entire SELECT
statement (including cost) will look like this:
Note: Due to factors such as tiered pricing, these are estimated costs. Also remember that Custom rules are executed as AWS Lambda functions in your account and standard Lambda execution costs also apply. These additional costs are not considered within the query above.
Enhancements and next steps
Here are some ideas to get started:
Visualize the queries using Amazon QuicksightUsing the same query, you can create an Amazon QuickSight dashboard to visualize this data over time. In this example, I visualized the month-over-month change in rule evaluations, the estimated cost for the last three months, and the number of evaluations per day in the last month.
|
Figure 4: QuickSight dashboard graphs |
Build queries for recorded configuration itemsYou can build similar queries to determine the number of recorded Configuration Items (and to estimate their cost). Configuration items are stored in a different S3 bucket. For more information, see the Identifying resources with the most configuration changes using AWS Config blog post. In this example, I visualized the number of recorded Configuration Items, grouped by resource type, in the last month.
|
Figure 5: QuickSight dashboard pie chart |
Automate partition creation
In this post, you registered partitions manually (Step 5). You can automate the registration of partitions by creating a custom Lambda function and scheduling it to execute the ALTER TABLE
statement every month.
Conclusion
In this blog post, I’ve demonstrated how you can use Amazon Athena to query AWS CloudTrail logs and obtain a count of AWS Config rule evaluations by rule name. Now go ahead and experiment! Athena is a powerful tool that can help you gather all sorts of insights on your data without the need to transform it.