AWS Database Blog
How to use DynamoDB global secondary indexes to improve query performance and reduce costs
Whether you’re an experienced Amazon DynamoDB user or just starting out, you’re probably familiar with the basics of the DynamoDB primary key model. This key model provides consistent single-digit millisecond performance at any scale. If your application needs efficient access to data using attributes other than the primary key, that’s where global secondary indexes come in handy. As an example, take querying an employee table that has a primary key of employee_id
to find all employees in a particular department.
However, global secondary indexes are capable of more than multiple attribute queries—they’re actually one of the most versatile tools in DynamoDB. Recently, there’s been a limit increase on the maximum number of global secondary indexes per table from 5 to 20. Given that, now is a great time to learn how to use them to optimize your DynamoDB usage.
Think of a global secondary index as a separate DynamoDB table that contains a subset of attributes from the source table. It also contains an alternate primary key to support query operations. Changes made to the source table are propagated to the global secondary index with eventual consistency (similar to a materialized view in a relational database). This happens without affecting the performance or availability of the source table. For more information, see Global Secondary Indexes.
In this post, I demonstrate several ways to use global secondary indexes to query your data, accelerate your application’s performance, and reduce your monthly DynamoDB bill.
Usage patterns for global secondary indexes
Some common usage patterns for global secondary indexes follow.
Querying and sorting data by multiple attributes
Querying and sorting data by multiple attributes is the most common usage pattern when creating global secondary indexes. With this usage pattern, you create a global secondary index on an attribute that is not part of the table’s primary keys to support queries on that attribute.
Global secondary indexes support three types of queries:
- Filtered data retrieval: Query data by the partition key and an optional sort key. You can create global secondary indexes on any table attributes with the filtering attribute as its primary key. Creating a global secondary index avoids scanning the full table and then filtering data.
- Sorted data retrieval: Sort data on the sort key for global secondary indexes with both a partition key and a sort key. Global secondary index query results are always sorted by the sort key value. Any attribute requiring sorted data retrieval can be made the sort key of the global secondary index. The order of query results can be controlled using the ScanIndexForward parameter in the query.
- Range-based data retrieval: Include queries such as
BEGINSWITH
,BETWEEN
,GREATER THAN
, andLESS THAN
on the sort key of the global secondary index. Global secondary index queries provide support for KeyConditionExpressions that enable range-based data retrieval.
Let’s say an online music streaming service has a Songs
table storing information about all the songs the service offers. The information stored about a song includes SingerId
, SingerName
, Genre
, LaunchYear
, Album
, SongDuration
, NumberOfViews
, and the NumberOfLikes
of the song. The table’s partition key is SongId
, which is a random string. The SongId
uniquely identifies each song. The table doesn’t have a sort key because there is exactly one record per SongId
. The following table shows the schema of the database table.
The following table contains a list of queries needed for the music streaming service’s application. For each, it includes the candidate primary key attribute for the global secondary index and the rationale for using it.
Query | Candidate Primary Key Attributes for the Global Secondary Index |
Rationale | |
Partition Key | Sort Key | ||
Find all songs for a SingerId |
SingerId |
You don’t need a sort key because the partition key is sufficient to answer the query. | Global secondary indexes are optimized for uniform key distribution of items across its partitions. Keeping |
Find all songs of a genre ordered by number of likes | Genre |
NumberOfLikes |
Because global secondary indexes store data in the order of the sort key, keeping NumberOfLikes the sort key lets you retrieve data sorted by NumberOfLikes . |
Find most-viewed songs of a SingerId launched in 2010 |
SingerId |
LaunchYear:Views |
Keeping Concatenating multiple attributes in the sort key allows querying data on multiple attributes. It also keeps the data sorted in the order of the concatenated sort key. In this case, using |
Find all songs of a singer 2010-2014 | SingerId |
LaunchYear |
SingerId as the partition key provides support for querying on it. Having LaunchYear as the sort key lets you do range queries on it using BETWEEN KeyConditionExpression. |
Data filtering
Sometimes you need to filter the data you queried from your table or an existing global secondary index before using it in your application. In such cases you will incur the cost for reading all the items that you read before the filtering is applied.
The data filtering pattern is useful when the read cost of querying your table or using an existing global secondary index is higher than the write cost of maintaining an already-filtered view of the data as a new global secondary index.
Let’s say a food delivery service stores pending orders in a DynamoDB table called Orders
. The table’s partition key is RestaurantId
and its sort key is OrderId
. The status of orders can be FoodInPreparation
, ReadyForPickup
, or OutForDelivery
. The following table shows the schema of this DynamoDB table.
The delivery service needs to assign orders with the OrderStatus
of ReadyForPickup
so that delivery drivers know when orders are ready to be picked up. The table already supports querying all open orders for a restaurant by RestaurantId
, which is the partition key with the filter clause of OrderStatus
as ReadyForPickup
.
Suppose that the delivery service queries all open orders every 30 seconds to minimize customer wait time. In this case, querying the table leads to significantly more items being read as opposed to maintaining a filtered view of orders that are in ReadyForPickup
status. This is because a number of items will be in the FoodInPreparation
and OutForDelivery
states, which are being read from the table unnecessarily.
For this food delivery service, the following table includes possible global secondary index schemas the food delivery service can use to store a filtered view in the global secondary index along with the recommended schema.
Query
|
Candidate Primary Key Attributes for Global Secondary Index |
Rationale | Recommendation | |
Partition Key | Sort Key | |||
Find orders for a RestaurantId with OrderStatus of ReadyForPickup |
RestaurantId |
ReadyForPickup |
Keeping Maintaining a separate flag for |
This is the recommended schema for the global secondary index. It results in well-distributed data across global secondary index partitions and also reduces read and write costs by storing filtered data. |
RestaurantId |
OrderStatus |
This schema provides the same benefit of data distribution as the previous global secondary index partitioning approach. Keeping the sort key as the
|
Not recommended because of higher write capacity units consumption and storage cost. | |
OrderStatus |
RestaurantId |
Keeping OrderStatus as the partition key stores data across restaurants with the same order status collocated in a partition. This approach results in hot spots while accessing data for different restaurants simultaneously. Having heavily requested hot spots leads to inefficient use of provisioned read and write capacity, ultimately resulting in throttling. Similar to the previous approach, this partitioning scheme also requires more write capacity units and storage cost because all order statuses are being maintained in the global secondary index. |
Not recommended because of poor data distribution across global secondary index partitions. |
Reduce read capacity unit (RCU) consumption for large items
In some cases, the items in your table might have a large number of attributes but the application query requires reading only a small subset of these attributes. In these cases, querying data from the table consumes more read capacity units because of the large item size. This pattern is useful with read-heavy workloads of a particular type. Such read-heavy workloads occur when the price of querying data from the table is much higher than the price of maintaining only required attributes in a separate global secondary index.
This pattern is particularly useful with count queries. You use such queries to find the total number of sort keys under a partition key. In these queries, you aren’t interested in the data of each item, but only need a count of the total number of items. Maintaining a separate keys-only global secondary index in such cases significantly reduces the per-item size being read.
Let’s say online shopping platform stores product information in a DynamoDB table called Products
. The shopping platform displays these products on its website by querying this table. The partition key of the table is ProductId
. The ProductId
uniquely identifies a product, so the table doesn’t have a sort key. Each item in the table is written once but read a hundred times. This makes it a read-heavy workload.
The following table shows the schema of the DynamoDB table.
The size of each item in the table is 100 KB, but to display the preview of the product on the website, the company needs only a few attributes — ProductName
, ProductDescription
, and Price
. These add up to less than 4 KB. The Products
table already supports accessing the required attributes. However, it consumes several times the read capacity units per item compared to maintaining a global secondary index with only the fields to be displayed on the website. Having a separate global secondary index in this case helps deliver a significant reduction in read costs.
Isolating read workloads
In this usage pattern, you create the global secondary index on the same primary keys as the table. However, the read capacity units are provisioned separately to cater to the specific reader. Use this pattern when the data in your table has multiple access patterns and isolation is required to prevent the access patterns from affecting each other.
One common scenario here includes data shared between high-priority user requests and low-priority offline processing. To avoid low-priority offline requests from throttling high-priority user requests on the table, you can offload offline requests to scan and query the global secondary index instead of the table.
Commons use cases and scenarios
You can find some common use cases and scenarios for global secondary indexes following.
Internet of Things (IoT) applications
AWS customers deploy IoT applications across a range of use cases, including connected homes, healthcare, connected vehicles, industrial machines, and retail. Most of these applications share a common pattern of data ingestion, and they require real-time analytics and monitoring capabilities. By using global secondary indexes, you can monitor IoT applications in real time with minimal management.
As an example, consider an industrial application that runs in a number of buildings, each with hundreds of IoT sensors. Each sensor senses data across various dimensions and updates the data in a DynamoDB table. The table stores the latest value of each metric.
This IoT application needs real-time monitoring and alarming if any of the metrics breaches its configured threshold. To achieve this, create a global secondary index on each of these dimensions and periodically query the index for values greater than the threshold for each metric. You can create a global secondary index with BuildingId
as the partition key for each global secondary index and the metric name as the sort key.
In some cases, the IoT application might need to start measuring other metrics such as Speed
and need an alarm if the machines are running too fast. In these cases, you can add the metric as a table attribute and create a global secondary index on it.
Advertising technology
You can use global secondary indexes in a number of advertising technology use cases such as ad targeting, real-time bidding (RTB), and ad personalization. DynamoDB provides the performance and availability required for these use cases.
Let’s look at an RTB use case to see how you can solve it by using global secondary indexes. RTB requires making optimum decisions to determine the value of each ad before bidding on it. Typically, the time between calculating bids and bidding is 100 milliseconds. You can use DynamoDB to store user profile information that includes the categories to which a specific user belongs as well as the times the user was assigned a given segment.
You can use this user information as part of bidding-decision logic. By using a global secondary index with User Id
as the partition key and Category
as the sort key, you can query user data in real time by the category of the ad.
Gaming
Speed and scalability are crucial aspects of gaming applications. These applications typically require single-digit millisecond reads and writes to avoid latency during game play. You can use global secondary indexes to handle such scenarios.
Global secondary indexes can help to create a highest-score leaderboard, social graphs for players, and in chat messages. The ease of creating and removing a global secondary index when a new query arises allows rapid development and reduces time to market.
Summary
Global secondary indexes enhance the querying capability of DynamoDB. This post shows how you can use global secondary indexes along with patterns such as data filtering and data ordering to achieve read isolation and reduce query costs. The recent limit increase of the maximum number of global secondary indexes per DynamoDB table from 5 to 20 can help you apply these usage patterns without worrying about hitting limits.
If you have questions about how to get the most out of global secondary indexes, comment below or tweet us at @DynamoDB.
About the Author
Shubham Sethi is a software development engineer at Amazon Web Services.