AWS Storage Blog
Simplify your data lifecycle by using object tags with Amazon S3 Lifecycle
Managing your storage cost effectively at scale can become complex when multiple applications or users access the data with different patterns and frequencies. S3 Lifecycle can help you optimize your storage cost: you create lifecycle configurations that move your data to more cost-effective storage classes, or expire it, based on object age. With large-scale workloads, multi-tenant buckets, and growing numbers of objects, it can become a management burden to create and manage many rules in S3 Lifecycle configurations. In this blog post, we cover how to simplify your data lifecycle management by using object tagging to reduce the number of rules in an S3 Lifecycle configuration.
How can I simplify my S3 Lifecycle rules in my configuration?
An S3 Lifecycle configuration is a set of rules that define the actions Amazon S3 applies to a group of objects. There are two primary types of actions: transition actions that move objects to another storage class, and expiration actions that delete objects. Customers can define an entire bucket or a subset of objects to transition or expire with rules in the lifecycle configuration.
S3 Lifecycle rules contain filters such as prefixes and object tags to specify the objects eligible for the specific lifecycle action. Each rule can contain one prefix, a set of object tags, or both. Many workloads use multiple prefixes within an S3 bucket. As the number of distinct prefixes and use cases in your bucket grows, the number of rules you need grows along with it.
Before we learn how to simplify lifecycle rules, let’s first look at the components of an S3 Lifecycle configuration.
What are the components of an S3 Lifecycle configuration?
An S3 Lifecycle configuration has the following elements: an ID element, a status element, a filter element, and elements that describe lifecycle actions. S3 Lifecycle configurations can be specified as XML consisting of one or more lifecycle rules.
<LifecycleConfiguration>
<Rule>
<ID>Transition and Expiration Rule</ID>
<Filter>
<Prefix>tax/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>3650</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
Each rule consists of the following:
- Rule metadata that includes a rule ID, and status indicating whether the rule is enabled or disabled. If a rule is disabled, Amazon S3 doesn’t perform any actions specified in the rule.
- Filter identifying objects to which the rule applies. You can specify a filter by using an object key prefix, one or more object tags, or a conjunction of both.
- One or more transition or expiration actions with a date or a time period in the object’s lifetime when you want Amazon S3 to perform the specified action.
For example configurations, see the documentation with examples of lifecycle configurations.
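To make the rule components concrete, here is a minimal Python sketch that parses the sample configuration above and lists each rule's ID, status, filter, and actions. It assumes the XML has no namespace (as in the examples in this post) and that actions use Days rather than Date; the function name summarize_rules is hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample configuration from this section, indented for readability.
SAMPLE_CONFIG = """\
<LifecycleConfiguration>
  <Rule>
    <ID>Transition and Expiration Rule</ID>
    <Filter>
      <Prefix>tax/</Prefix>
    </Filter>
    <Status>Enabled</Status>
    <Transition>
      <Days>365</Days>
      <StorageClass>GLACIER</StorageClass>
    </Transition>
    <Expiration>
      <Days>3650</Days>
    </Expiration>
  </Rule>
</LifecycleConfiguration>
"""


def summarize_rules(config_xml):
    """Return one dict per <Rule> with its ID, status, filter, and actions."""
    root = ET.fromstring(config_xml)
    summaries = []
    for rule in root.findall("Rule"):
        rule_filter = rule.find("Filter")
        prefix, tags = None, {}
        if rule_filter is not None:
            # Prefix and Tag may sit directly under <Filter> or inside <And>.
            prefix = rule_filter.findtext(".//Prefix")
            tags = {t.findtext("Key"): t.findtext("Value")
                    for t in rule_filter.iter("Tag")}
        expiration = rule.findtext("Expiration/Days")
        summaries.append({
            "id": rule.findtext("ID"),
            "status": rule.findtext("Status"),
            "prefix": prefix,
            "tags": tags,
            # Assumption: every transition is expressed in Days, not Date.
            "transitions": [(int(t.findtext("Days")), t.findtext("StorageClass"))
                            for t in rule.findall("Transition")],
            "expiration_days": int(expiration) if expiration is not None else None,
        })
    return summaries


for summary in summarize_rules(SAMPLE_CONFIG):
    print(summary)
```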
What does an S3 Lifecycle configuration look like with multiple prefixes?
You can specify multiple rules if you want different lifecycle actions for different objects. The following lifecycle configuration has two rules:
- Rule 1 applies to objects with the key name prefix classA/. It directs Amazon S3 to transition objects to the S3 Glacier storage class one year after creation and expire these objects 10 years after creation.
- Rule 2 applies to objects with the key name prefix classB/. It directs Amazon S3 to transition objects to the S3 Standard-IA storage class 90 days after creation and delete them one year after creation.
<LifecycleConfiguration>
<Rule>
<ID>ClassADocRule</ID>
<Filter>
<Prefix>classA/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>365</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
<Expiration>
<Days>3650</Days>
</Expiration>
</Rule>
<Rule>
<ID>ClassBDocRule</ID>
<Filter>
<Prefix>classB/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>90</Days>
<StorageClass>STANDARD_IA</StorageClass>
</Transition>
<Expiration>
<Days>365</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
For each prefix in your bucket, a new lifecycle rule is required for transition and expiration actions for objects within that prefix. Buckets with hundreds of prefixes, as a result, need many rules to set up the appropriate lifecycle actions. In order to limit the overall number of lifecycle rules needed for all of your prefixes, we recommend using object tags.
Specifying a filter based on object tags
In the following example, the lifecycle rule specifies a filter based on a tag key (tag1) and value (value1). The rule then applies only to the subset of objects with that specific tag.
<LifecycleConfiguration>
<Rule>
<ID>Rule 1</ID>
<Filter>
<Tag>
<Key>tag1</Key>
<Value>value1</Value>
</Tag>
</Filter>
<Status>Enabled</Status>
<Transition>
<StorageClass>GLACIER</StorageClass>
<Days>365</Days>
</Transition>
</Rule>
</LifecycleConfiguration>
You can specify a filter based on multiple tags. Wrap the tags in the <And> element, as shown in the following example. The rule directs Amazon S3 to perform lifecycle actions on objects with two tags (with these specific tag keys and values).
<LifecycleConfiguration>
<Rule>
<Filter>
<And>
<Tag>
<Key>key1</Key>
<Value>value1</Value>
</Tag>
<Tag>
<Key>key2</Key>
<Value>value2</Value>
</Tag>
...
</And>
</Filter>
<Status>Enabled</Status>
<Transition>
<StorageClass>GLACIER</StorageClass>
<Days>365</Days>
</Transition>
</Rule>
</LifecycleConfiguration>
The lifecycle rule applies to objects that have both of the tags specified. Amazon S3 performs a logical AND operation. Note the following:
- Each tag must match both key and value exactly.
- The rule applies to a subset of objects that has all the tags specified in the rule. If an object has additional tags specified, the rule still applies.
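The matching behavior described above can be sketched in a few lines of Python. This is only an illustration of the semantics (every tag in the filter must be present with an identical key and value, and extra tags on the object are ignored), not how Amazon S3 implements it; the function name rule_applies is hypothetical.

```python
def rule_applies(object_tags, rule_tags):
    """Return True when the object carries every tag listed in the rule's filter.

    Both arguments are dicts of tag key -> tag value. The comparison is exact
    and case-sensitive, and tags on the object that the rule does not mention
    are ignored.
    """
    return all(object_tags.get(key) == value for key, value in rule_tags.items())


rule = {"key1": "value1", "key2": "value2"}

# Both tags match, and the extra "owner" tag does not matter.
print(rule_applies({"key1": "value1", "key2": "value2", "owner": "teamA"}, rule))  # True
# One of the two required tags is missing.
print(rule_applies({"key1": "value1"}, rule))  # False
# The value differs in case, so it is not an exact match.
print(rule_applies({"key1": "value1", "key2": "Value2"}, rule))  # False
```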
Why should I use object tags?
You can associate multiple key-value pairs (tags) with each of your S3 objects, with the ability to change them at any time. The tags can be used to manage and control access, set up lifecycle rules, customize S3 Storage Class Analysis, and filter CloudWatch metrics. You can think of the bucket as a data lake, and use tags to create a taxonomy of the objects within the lake. This is more flexible than using the bucket and prefixes, and allows you to make semantic-style changes without renaming, moving, or copying objects.
Simplifying your S3 Lifecycle configurations using object tags will be most helpful if you currently have tens or hundreds of rules in your lifecycle configuration filtered by prefix. We recommend consolidating those rules by using object tags. To demonstrate the effectiveness of using object tags in your lifecycle configurations, consider a bucket whose key name prefixes have the lifecycle actions shown in the following table:
Rule | Filter (Prefix) | Transition action 1 | Transition action 2 | Expiration action |
Rule 1 | Prefix 1 | S3 Standard-IA after 45 days | S3 Glacier after 90 days | After 200 days |
Rule 2 | Prefix 2 | S3 Glacier after 90 days | - | - |
Rule 3 | Prefix 3 | S3 Intelligent-Tiering after 30 days | - | - |
Rule 4 | Prefix 4 | S3 Standard-IA after 90 days | - | After 200 days |
Rule 5 | Prefix 5 | S3 Intelligent-Tiering after 90 days | - | After 200 days |
Rule 6 | Prefix 6 | S3 Standard-IA after 45 days | - | After 200 days |
Rule 7 | Prefix 7 | S3 Glacier after 90 days | - | - |
Rule 8 | Prefix 8 | S3 Intelligent-Tiering after 30 days | - | - |
Rule 9 | Prefix 9 | S3 Standard-IA after 90 days | - | After 200 days |
Rule 10 | Prefix 10 | S3 Intelligent-Tiering after 90 days | - | After 200 days |
Rule 11 | Prefix 11 | S3 Standard-IA after 45 days | - | After 200 days |
Rule 12 | Prefix 12 | S3 Glacier after 90 days | S3 Glacier Deep Archive after 200 days | - |
Rule 13 | Prefix 13 | S3 Intelligent-Tiering after 30 days | - | - |
Rule 14 | Prefix 14 | S3 Standard-IA after 90 days | - | After 200 days |
Rule 15 | Prefix 15 | S3 Intelligent-Tiering after 90 days | - | After 200 days |
Rule 16 | Prefix 16 | S3 Standard-IA after 45 days | - | After 200 days |
Rule 17 | Prefix 17 | S3 Glacier after 90 days | - | - |
Rule 18 | Prefix 18 | S3 Intelligent-Tiering after 30 days | - | - |
Rule 19 | Prefix 19 | S3 Standard-IA after 90 days | - | After 200 days |
Rule 20 | Prefix 20 | S3 Intelligent-Tiering after 90 days | - | After 200 days |
Notice that there are 20 different prefixes with lifecycle actions, and as a result, the lifecycle configuration will need 20 different rules if the only filter element is a prefix. We can reduce the number of rules significantly by using object tags, with one tag defined for each unique lifecycle action.
Analyzing this specific example, we recommend creating seven different object tags, one for each unique lifecycle action:
Tag key | Tag value | Lifecycle action for the tag |
TransitionInfrequent | 45 | Transition tagged objects to S3 Standard-IA after 45 days |
TransitionInfrequent | 90 | Transition tagged objects to S3 Standard-IA after 90 days |
TransitionArchive | 90 | Transition tagged objects to S3 Glacier after 90 days |
TransitionIntelligent | 30 | Transition tagged objects to S3 Intelligent-Tiering after 30 days |
TransitionIntelligent | 90 | Transition tagged objects to S3 Intelligent-Tiering after 90 days |
TransitionDeepArchive | 200 | Transition tagged objects to S3 Glacier Deep Archive after 200 days |
Expiration | 200 | Expire tagged objects after 200 days |
We create one tag for each unique transition element and one tag for each unique expiration element. Objects that only need to be transitioned OR expired need only one of the tags. Objects that expire after transition should be tagged with both transition and expiration element tags.
For example, objects in Prefix 3, which only transition to S3 Intelligent-Tiering after 30 days, need only one tag. However, objects in Prefix 6, which have one transition action and one expiration action, need both of the corresponding tags.
As a result, our new and improved lifecycle configuration has the following structure:
Rule | Filter (tag key = value) | Transition action | Expiration action |
Rule 1 | TransitionInfrequent = 45 AND Expiration = 200 | S3 Standard-IA after 45 days | After 200 days |
Rule 2 | TransitionArchive = 90 | S3 Glacier after 90 days | - |
Rule 3 | TransitionIntelligent = 30 | S3 Intelligent-Tiering after 30 days | - |
Rule 4 | TransitionInfrequent = 45 AND TransitionArchive = 90 AND Expiration = 200 | S3 Standard-IA after 45 days, then S3 Glacier after 90 days | After 200 days |
Rule 5 | TransitionArchive = 90 AND TransitionDeepArchive = 200 | S3 Glacier after 90 days, then S3 Glacier Deep Archive after 200 days | - |
Rule 6 | TransitionInfrequent = 90 AND Expiration = 200 | S3 Standard-IA after 90 days | After 200 days |
Rule 7 | TransitionIntelligent = 90 AND Expiration = 200 | S3 Intelligent-Tiering after 90 days | After 200 days |
We have simplified the lifecycle configuration by reducing the number of rules. As you add new datasets that need similar transition and expiration policies, you can tag them based on their retention periods. As there is a limit of 1000 rules per bucket, finding ways to reduce your lifecycle rules will help when managing large shared datasets.
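As a quick sanity check on the consolidation, the following Python sketch groups the 20 prefixes from the first table by their combination of lifecycle actions. Each distinct combination corresponds to one tag-based rule. The data is transcribed from the table; the constant and function names are hypothetical.

```python
from collections import defaultdict

# Lifecycle actions used in the first table.
SIA_45 = "S3 Standard-IA after 45 days"
SIA_90 = "S3 Standard-IA after 90 days"
GLACIER_90 = "S3 Glacier after 90 days"
IT_30 = "S3 Intelligent-Tiering after 30 days"
IT_90 = "S3 Intelligent-Tiering after 90 days"
DEEP_200 = "S3 Glacier Deep Archive after 200 days"
EXPIRE_200 = "Expire after 200 days"

# Prefix number -> lifecycle actions, transcribed from the first table.
PREFIX_ACTIONS = {
    1: (SIA_45, GLACIER_90, EXPIRE_200),
    2: (GLACIER_90,),
    3: (IT_30,),
    4: (SIA_90, EXPIRE_200),
    5: (IT_90, EXPIRE_200),
    6: (SIA_45, EXPIRE_200),
    7: (GLACIER_90,),
    8: (IT_30,),
    9: (SIA_90, EXPIRE_200),
    10: (IT_90, EXPIRE_200),
    11: (SIA_45, EXPIRE_200),
    12: (GLACIER_90, DEEP_200),
    13: (IT_30,),
    14: (SIA_90, EXPIRE_200),
    15: (IT_90, EXPIRE_200),
    16: (SIA_45, EXPIRE_200),
    17: (GLACIER_90,),
    18: (IT_30,),
    19: (SIA_90, EXPIRE_200),
    20: (IT_90, EXPIRE_200),
}


def consolidate(prefix_actions):
    """Group prefixes that share exactly the same set of lifecycle actions."""
    groups = defaultdict(list)
    for prefix, actions in prefix_actions.items():
        groups[frozenset(actions)].append(prefix)
    return groups


groups = consolidate(PREFIX_ACTIONS)
for actions, prefixes in groups.items():
    print(sorted(actions), "->", prefixes)
print(len(PREFIX_ACTIONS), "prefix rules ->", len(groups), "tag rules")  # 20 prefix rules -> 7 tag rules
```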
Great, so how do I get started?
To start replacing your prefix-based lifecycle rules with tag-based ones, we recommend three steps: automate adding object tags to new objects in your application, add object tags to your existing objects based on their lifecycle, and finally change the lifecycle configuration to use the new rule filters.
Step 1: Automating object tags for future objects
Object tagging works with many Amazon S3 API operations. For example, you can specify tags when you create objects, and the tagging action itself is free of charge when added as part of the PutObject request. You specify tags using the x-amz-tagging request header.
Alternatively, you could add an AWS Lambda trigger that adds the tags to the object when uploaded. Adding tags via Lambda would incur additional Lambda and S3 request fees.
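As a rough illustration of Step 1, the sketch below builds the value for the x-amz-tagging header from a dictionary of tags. It assumes the header expects tags encoded as URL query parameters (key1=value1&key2=value2); the helper name build_tagging_header is hypothetical, and the tag names are the ones from the tag table in this post. If you upload with an SDK, check how it expects tags to be passed; boto3's put_object, for example, takes a Tagging parameter with a string in this same form.

```python
from urllib.parse import urlencode


def build_tagging_header(tags):
    """Encode a dict of tag key -> value as URL query parameters."""
    # urlencode percent-encodes special characters and joins pairs with "&".
    return urlencode(tags)


# Tags for an object that moves to S3 Standard-IA after 45 days
# and expires after 200 days.
headers = {
    "x-amz-tagging": build_tagging_header(
        {"TransitionInfrequent": "45", "Expiration": "200"}
    )
}
print(headers["x-amz-tagging"])  # TransitionInfrequent=45&Expiration=200
```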
Step 2: Applying object tags to existing objects
You can add object tags straight from the console on individual objects or use S3 Batch Operations to add or replace object tags on millions of objects. For example, using S3 Inventory reports for multiple prefixes, you can generate prefix-level manifests and then use S3 Batch Operations to add the appropriate tags to the objects in each prefix. In the preceding example, the S3 Inventory report manifest for Prefix 1 can be used as input for an S3 Batch Operations job that adds the tag TransitionInfrequent with the value 45, which the lifecycle configuration can then use to transition objects to the S3 Standard-IA storage class 45 days after they were created.
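One way to prepare prefix-level manifests is sketched below in Python. It is a simplified illustration that assumes you already have a list of object keys (for example, read from an inventory report) and that the Batch Operations CSV manifest takes one bucket,key row per object with URL-encoded keys; verify the exact format against the S3 Batch Operations documentation. The bucket name examplebucket, the object keys, and the function name build_manifest are hypothetical.

```python
import csv
import io
from urllib.parse import quote


def build_manifest(bucket, keys, prefix):
    """Return CSV text with one 'bucket,key' row for each key under the prefix."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    for key in keys:
        if key.startswith(prefix):
            # Assumption: keys are URL-encoded in the manifest; "/" is kept as-is.
            writer.writerow([bucket, quote(key, safe="/")])
    return buffer.getvalue()


keys = ["Prefix1/a.csv", "Prefix2/b.csv", "Prefix1/my file.txt"]
print(build_manifest("examplebucket", keys, "Prefix1/"))
```

Each manifest can then feed one tagging job, so that all objects listed in it receive the tag set that matches their prefix's lifecycle actions.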
Step 3: Changing your S3 Lifecycle configuration to include object tags as filters
Using the prefix structure from the first table, the XML for a lifecycle configuration that uses only prefixes as the filter element looks like this:
<LifecycleConfiguration>
<Rule>
<ID>Rule1</ID>
<Filter>
<Prefix>Prefix1/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>45</Days>
<StorageClass>STANDARD_IA</StorageClass>
</Transition>
<Expiration>
<Days>200</Days>
</Expiration>
</Rule>
<Rule>
<ID>Rule2</ID>
<Filter>
<Prefix>Prefix2/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>90</Days>
<StorageClass>GLACIER</StorageClass>
</Transition>
</Rule>
...
...
...
...
<Rule>
<ID>Rule20</ID>
<Filter>
<Prefix>Prefix20/</Prefix>
</Filter>
<Status>Enabled</Status>
<Transition>
<Days>90</Days>
<StorageClass>INTELLIGENT_TIERING</StorageClass>
</Transition>
<Expiration>
<Days>200</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
After tagging all the objects in these prefixes using step 1 and step 2, the new lifecycle configuration looks like this:
<LifecycleConfiguration>
<Rule>
<ID>Rule 1</ID>
<Filter>
<And>
<Tag>
<Key>TransitionInfrequent</Key>
<Value>45</Value>
</Tag>
<Tag>
<Key>Expiration</Key>
<Value>200</Value>
</Tag>
</And>
</Filter>
<Status>Enabled</Status>
<Transition>
<StorageClass>STANDARD_IA</StorageClass>
<Days>45</Days>
</Transition>
<Expiration>
<Days>200</Days>
</Expiration>
</Rule>
...
...
...
...
<Rule>
<ID>Rule 7</ID>
<Filter>
<And>
<Tag>
<Key>TransitionIntelligent</Key>
<Value>90</Value>
</Tag>
<Tag>
<Key>Expiration</Key>
<Value>200</Value>
</Tag>
</And>
</Filter>
<Status>Enabled</Status>
<Transition>
<StorageClass>INTELLIGENT_TIERING</StorageClass>
<Days>90</Days>
</Transition>
<Expiration>
<Days>200</Days>
</Expiration>
</Rule>
</LifecycleConfiguration>
As a result of the consolidation, we have successfully reduced the number of rules in the lifecycle configuration from 20 to just 7.
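If you generate your lifecycle configuration rather than writing it by hand, a small helper can keep the tag filters well-formed. The following Python sketch builds tag-based rules with the standard library's ElementTree, placing a single tag directly under Filter and wrapping multiple tags in And, as in the examples above. The function name tag_rule and its parameters are hypothetical, and the output is only the XML text; applying it to a bucket is out of scope here.

```python
import xml.etree.ElementTree as ET


def tag_rule(rule_id, tags, transitions=(), expiration_days=None):
    """Build a <Rule> element filtered by one or more object tags.

    tags: dict of tag key -> value (at least one entry).
    transitions: iterable of (days, storage_class) pairs.
    """
    if not tags:
        raise ValueError("at least one tag is required for a tag-based rule")
    rule = ET.Element("Rule")
    ET.SubElement(rule, "ID").text = rule_id
    rule_filter = ET.SubElement(rule, "Filter")
    # One tag sits directly under <Filter>; several tags need an <And> wrapper.
    parent = rule_filter if len(tags) == 1 else ET.SubElement(rule_filter, "And")
    for key, value in tags.items():
        tag = ET.SubElement(parent, "Tag")
        ET.SubElement(tag, "Key").text = key
        ET.SubElement(tag, "Value").text = value
    ET.SubElement(rule, "Status").text = "Enabled"
    for days, storage_class in transitions:
        transition = ET.SubElement(rule, "Transition")
        ET.SubElement(transition, "Days").text = str(days)
        ET.SubElement(transition, "StorageClass").text = storage_class
    if expiration_days is not None:
        expiration = ET.SubElement(rule, "Expiration")
        ET.SubElement(expiration, "Days").text = str(expiration_days)
    return rule


config = ET.Element("LifecycleConfiguration")
config.append(tag_rule("Rule 1", {"TransitionInfrequent": "45", "Expiration": "200"},
                       [(45, "STANDARD_IA")], expiration_days=200))
config.append(tag_rule("Rule 2", {"TransitionArchive": "90"}, [(90, "GLACIER")]))
print(ET.tostring(config, encoding="unicode"))
```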
Is there anything else I should know?
There are a couple of things to be careful of while consolidating your lifecycle rules with object tags.
- Adjusting your applications to tag objects during PUT operations helps you create the tags without a charge. Replacing or adding new tags to your existing objects will incur standard costs for tagging. Tags cost $0.01 per 10,000 tags per month. Requests that add or update tags (PUT and GET, respectively) are charged at the Tier 1 request rates. For more information, see the Amazon S3 pricing page.
- When tagging multiple objects from a manifest using Batch Operations, changes are made to the full set of tags rather than individually. As a result, Batch Operations replaces any existing tags on the objects. For more information on replacing all tags, please refer to the documentation.
- When you have multiple rules in an S3 Lifecycle configuration, an object can become eligible for multiple lifecycle actions. In such cases, Amazon S3 follows these general rules: permanent deletion takes precedence over transition, and transition takes precedence over creation of delete markers. For example, when an object is eligible for both an S3 Glacier and an S3 Standard-IA (or S3 One Zone-IA) transition, Amazon S3 chooses the S3 Glacier transition. For examples, see the documentation on overlapping filters, conflicting lifecycle actions, and what Amazon S3 does.
- When specifying the AbortIncompleteMultipartUpload or ExpiredObjectDeleteMarker lifecycle actions, the rule cannot specify a tag-based filter. We recommend turning these on at the bucket level to optimize your storage further and improve performance.
- You can associate up to 10 tags with an object. Tags that are associated with an object must have unique tag keys. A tag key can be up to 128 Unicode characters in length, and tag values can be up to 256 Unicode characters in length. The key and values are case-sensitive. For more information about tag restrictions, see the documentation on user-defined tag restrictions.
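The tag limits in the last item can be checked in your application before an upload. The Python sketch below covers only the three limits stated above (10 tags per object, 128-character keys, 256-character values) and does not check which characters are allowed; the function name validate_tags is hypothetical. Because a Python dict cannot hold duplicate keys and compares keys case-sensitively, key uniqueness is handled by the data structure itself.

```python
MAX_TAGS_PER_OBJECT = 10
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256


def validate_tags(tags):
    """Return a list of problems; an empty list means the tag set is within limits."""
    problems = []
    if len(tags) > MAX_TAGS_PER_OBJECT:
        problems.append(f"{len(tags)} tags exceeds the limit of {MAX_TAGS_PER_OBJECT}")
    for key, value in tags.items():
        # len() counts Unicode code points, which is used here as an
        # approximation of "Unicode characters".
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"tag key {key[:20]!r}... is longer than {MAX_KEY_LENGTH} characters")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value for tag {key[:20]!r} is longer than {MAX_VALUE_LENGTH} characters")
    return problems


print(validate_tags({"TransitionInfrequent": "45", "Expiration": "200"}))  # []
```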
Conclusion
In this post, we demonstrated how you can use object tags to reduce and consolidate your S3 Lifecycle rules. In particular, this helps you simplify how you manage your data lifecycle: analyze your current S3 Lifecycle configuration, identify lifecycle actions that are common to multiple prefixes, and use object tags to mark all objects across those prefixes that share the same lifecycle actions. As you scale your applications, your datasets increase. When objects are tagged based on their retention needs, S3 Lifecycle can automatically transition or expire them based on your configuration. We hope you can use the examples covered in this blog post to reduce the number of rules in your S3 Lifecycle configurations across your accounts and buckets, optimize your storage costs, and simplify your data management.
Thanks for reading this post and using S3 Lifecycle to manage your objects in Amazon S3. If you have any comments, questions, or feedback, please leave a comment in the comments section.