AWS Big Data Blog
Amazon EMR 6.2.0 adds persistent HFile tracking to improve performance with HBase on Amazon S3
Apache HBase is an open-source NoSQL database that you can use to achieve low-latency random access to billions of rows. Starting with Amazon EMR 5.2.0, you can enable HBase on Amazon Simple Storage Service (Amazon S3). With HBase on Amazon S3, the HBase data files (HFiles) are written to Amazon S3, enabling data lake architecture benefits such as the ability to scale storage and compute requirements separately. Amazon S3 also provides higher durability and availability than the default HDFS storage layer. When using Amazon EMR 5.7.0 or later, you can set up a read replica cluster to achieve high-availability, multi-Availability Zone reads across multiple HBase clusters without duplicating your data. Amazon EMR continues to be committed to providing the best HBase on Amazon S3 experience by focusing on improving performance, availability, and reliability.
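For reference, the following is a minimal sketch of launching an HBase on Amazon S3 cluster with boto3. The hbase.emr.storageMode and hbase.rootdir settings are the ones used for S3 storage mode; the cluster name, bucket path, instance sizing, and IAM roles are placeholders that you would replace for your own environment.

```python
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# Hypothetical S3 location for the HBase root directory; replace with your own bucket.
HBASE_ROOT_DIR = "s3://my-hbase-bucket/hbase-root/"

response = emr.run_job_flow(
    Name="hbase-on-s3-demo",
    ReleaseLabel="emr-6.2.0",  # persistent HFile tracking is on by default from 6.2.0
    Applications=[{"Name": "HBase"}],
    Configurations=[
        # Store HFiles on Amazon S3 instead of HDFS
        {"Classification": "hbase", "Properties": {"hbase.emr.storageMode": "s3"}},
        {"Classification": "hbase-site", "Properties": {"hbase.rootdir": HBASE_ROOT_DIR}},
    ],
    Instances={
        "InstanceGroups": [
            {"InstanceRole": "MASTER", "InstanceType": "i3en.2xlarge", "InstanceCount": 1},
            {"InstanceRole": "CORE", "InstanceType": "i3en.2xlarge", "InstanceCount": 5},
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        # "Ec2SubnetId": "subnet-...",  # add if your account requires launching into a specific subnet
    },
    ServiceRole="EMR_DefaultRole",
    JobFlowRole="EMR_EC2_DefaultRole",
)
print(response["JobFlowId"])
```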
With Amazon EMR 6.2.0 (HBase 2.2.6), we have added persistent HFile tracking within HBase on Amazon S3 to improve latency for critical write path operations (HFile flush and compaction). With persistent HFile tracking, we see an increase of up to 7% in overall performance for write-heavy YCSB workloads and an average 37% increase for workloads that are constrained by HFile flush operations. The feature is enabled by default and doesn’t require any manual setup or migration.
This post details the improvements with persistent HFile tracking, the performance benefits that can be achieved, and operational considerations for utilizing persistent HFile tracking within Amazon EMR.
Persistent HFile tracking
Persistent HFile tracking is intended to improve performance of write-heavy HBase on Amazon S3 workloads. Because Amazon S3 is an object store, it doesn’t have native support for some core file system operations such as rename. Applications built on Amazon S3 implement the rename operation as an object COPY followed by an object DELETE. As a result, the rename is no longer a fast and atomic operation. HBase relies heavily on using rename operations as a commit mechanism. With persistent HFile tracking, we wanted to reduce the reliance on rename operations when writing data in HBase.
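To make that limitation concrete, the following sketch shows roughly what a rename looks like when implemented on top of the Amazon S3 API with boto3. The bucket and key names are hypothetical, and a real client would also handle large objects (multipart copy) and retries.

```python
import boto3

s3 = boto3.client("s3")

def s3_rename(bucket: str, src_key: str, dst_key: str) -> None:
    """Emulate a rename on Amazon S3: because there is no native rename, the
    object is copied to the new key and then deleted from the old key. The
    operation is neither fast nor atomic; a reader can observe both keys, or
    neither, between the two calls."""
    s3.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=bucket, Key=src_key)

# Hypothetical example: "committing" a file by moving it out of a temporary prefix.
# s3_rename("my-hbase-bucket", ".tmp/hfile-0001", "data/default/usertable/cf/hfile-0001")
```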
Persistent HFile tracking utilizes a new HBase system table called hbase:storefile to directly track HFile paths. New HFile paths are committed to the table as additional data is written to HBase. These paths are subsequently used during read operations. This removes rename operations as a commit mechanism in critical write path HBase operations, and reduces the number of Amazon S3 calls required when opening an HBase region, because the HFile paths can be read from the new system table instead of requiring a file system directory listing.
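If you're curious, you can peek at the new system table from the cluster with the HBase shell. The following is a small sketch that scans a few rows non-interactively; it assumes the hbase CLI is on the PATH (as it is on EMR cluster nodes), and the table's column layout is an internal implementation detail, so treat the output as informational only.

```python
import subprocess

# Scan a handful of rows from the hbase:storefile system table using the
# non-interactive HBase shell (-n exits after running the supplied commands).
scan_cmd = "scan 'hbase:storefile', {LIMIT => 5}\n"

result = subprocess.run(
    ["hbase", "shell", "-n"],
    input=scan_cmd,
    capture_output=True,
    text=True,
)
print(result.stdout)
```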
The following figure highlights the commit process for the flush and compaction operations before and after the persistent HFile tracking feature is enabled.
Persistent HFile tracking introduces two key differences:
- The data is written directly to the final destination, removing the move from the temporary location to the final location
- The current list of committed HFiles (in the Store File Manager) is persisted to the new hbase:storefile table
Removing the rename as a commit mechanism improves the write throughput that write-heavy workloads can achieve on HBase on Amazon S3. When HBase is under a heavy write load, the throughput is bounded by the flush and compaction operations. With persistent HFile tracking, the performance of these critical operations is improved, allowing the overall throughput to increase. In the next section, we analyze the performance of the persistent HFile tracking feature.
Performance results
In this section, we walk through the persistent HFile tracking performance results for the YCSB load workload. YCSB is one of the most popular benchmarks for performance analysis with HBase. We used 10 parallel YCSB clients to simulate a write-heavy workload.
Experiment setup
For our EMR cluster, we used one i3en.2xlarge primary node and five i3en.2xlarge worker nodes.
The following table summarizes the configuration key-values; a sketch of how they map to EMR configuration classifications follows the table.
| Configuration Property Key | Configuration Value |
| --- | --- |
| hbase.emr.storageMode | s3 |
| hbase.rs.cacheblocksonwrite | true |
| hfile.block.index.cacheonwrite | true |
| hbase.rs.prefetchblocksonopen | true |
| hbase.rs.cachecompactedblocksonwrite | true |
| hfile.block.bloom.cacheonwrite | true |
| hbase.hregion.memstore.flush.size | 402653184 |
| hbase.bucketcache.size | 2097152 |
| hbase.regionserver.thread.compaction.large | 2 |
| hbase.regionserver.thread.compaction.small | 2 |
| hbase.hstore.flusher.count | 4 |
| hbase.regionserver.thread.split | 2 |
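As an illustration, here is how a subset of these values could be supplied at cluster launch as EMR configuration classifications (passed as the Configurations argument to run_job_flow, as in the earlier launch sketch). The values are copied from the table; placing hbase.emr.storageMode in the hbase classification and the remaining properties in hbase-site follows the usual EMR convention.

```python
# A subset of the benchmark settings expressed as EMR configuration classifications.
benchmark_configurations = [
    {"Classification": "hbase", "Properties": {"hbase.emr.storageMode": "s3"}},
    {
        "Classification": "hbase-site",
        "Properties": {
            "hbase.rs.cacheblocksonwrite": "true",
            "hfile.block.index.cacheonwrite": "true",
            "hbase.hregion.memstore.flush.size": "402653184",
            "hbase.bucketcache.size": "2097152",
            "hbase.hstore.flusher.count": "4",
        },
    },
]
```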
Our YCSB benchmark had the following configuration (a sample load invocation is sketched after this list):
- YCSB workload A with the load action (100% INSERT), inserting 1 billion records
- 10 separate YCSB clients, each with 64 threads
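The following sketch shows roughly how one of the 10 load clients could be invoked. The YCSB install directory, the HBase binding name, and the table and column family names are assumptions; adjust them for your YCSB version and schema.

```python
import subprocess

# One of the 10 parallel YCSB load clients (client_index runs from 0 to 9).
TOTAL_RECORDS = 1_000_000_000
CLIENTS = 10
client_index = 0
per_client = TOTAL_RECORDS // CLIENTS

cmd = [
    "bin/ycsb", "load", "hbase2",             # HBase 2.x binding name may differ by YCSB version
    "-P", "workloads/workloada",
    "-p", "table=usertable",
    "-p", "columnfamily=family",
    "-p", f"recordcount={TOTAL_RECORDS}",     # total records across all clients
    "-p", f"insertstart={client_index * per_client}",
    "-p", f"insertcount={per_client}",        # this client's share of the load
    "-threads", "64",
]
subprocess.run(cmd, check=True)
```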
Overall YCSB results
With persistent HFile tracking, we saw an average of 4% and up to 7% improvement in runtime for the YCSB load workload. This corresponds to an equivalent improvement in average throughput achieved during the workload. The following graph shows that the minimum throughput achieved with the persistent HFile tracking feature was comparable to the maximum throughput achieved without the feature enabled, showing a net positive improvement.
In the following sections, we break down the performance of the critical compaction and flush operations.
Compaction operations
Compaction refers to the process of merging multiple immutable HFiles into a single new HFile to reduce duplicated data being stored on disk and reduce the number of HFiles that a read operation is required to open to find the requested data.
The following graph shows that, when isolated, compaction operations were on average 4% faster with persistent HFile tracking. Because this closely mirrors what we saw in the overall runtime improvement, it indicates that our workload was constrained by compaction operations.
Flush operations
The flush operation refers to the process of writing the data that HBase has buffered in memory to HFiles.
The following graph shows that when flush operations were isolated, persistent HFile tracking performed 37% faster on average. The performance of flush operations depends primarily on the performance of file system operations, so removing the rename operation from the commit mechanism had a higher impact. Flush operations in HBase are typically asynchronous operations that don't block new PUTs, which decreases the impact of flush on total throughput. However, if a write workload is constrained by flush operations, the total throughput improvement seen when utilizing persistent HFile tracking increases.
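If you want to observe these operations in isolation on your own cluster, you can trigger them manually from the HBase shell and watch the corresponding RegionServer metrics or logs. The following is a small sketch; it assumes the hbase CLI is available on a cluster node and uses a hypothetical usertable.

```python
import subprocess

# Trigger a memstore flush and then a major compaction on a table so the two
# operations can be observed in isolation (for example, in RegionServer metrics).
commands = "flush 'usertable'\nmajor_compact 'usertable'\n"

subprocess.run(["hbase", "shell", "-n"], input=commands, text=True, check=True)
```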
Using persistent HFile tracking and operational considerations
The persistent HFile tracking feature is enabled by default starting from Amazon EMR 6.2.0, and doesn't require any manual migration steps to utilize it. To use this feature, upgrade to Amazon EMR 6.2.0 or a later release.
Additionally, with the launch of Amazon S3 strong consistency, you no longer need to enable EMRFS consistent view when launching an HBase on Amazon S3 cluster.
Persistent HFile tracking does not require any manual operation. However, for operational considerations, see Persistent HFile Tracking.
Summary
Persistent HFile tracking simplifies the commit process on HBase on Amazon S3 to deliver up to a 7% improvement for write-heavy workloads and a 37% average increase for flush-heavy workloads. If you use HBase with data in Amazon S3, we recommend trying out Amazon EMR 6.2.0 or a later release to improve the write performance of your HBase applications.
About the Authors
Abhishek Khanna is a Software Engineer at AWS. He likes working on hard engineering problems in distributed systems at massive scale.
Stephen Wu is a software development engineer for EMR at Amazon Web Services. He is an Apache HBase committer and is interested in performance optimization for distributed systems.
Zach York is a Software Engineer for EMR at Amazon Web Services and is an Apache HBase committer and PMC member. He is interested in building highly available distributed systems.
Peng Zhang is a Software Engineer at Amazon Web Services. Peng has a diverse background in computer science. He is passionate about serverless and edge computing.