AWS Storage Blog
Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight
Object storage provides virtually unlimited scalability, but managing billions, or even trillions, of objects can pose significant challenges. How do you know what data you have? How can you find the right datasets at the right time? By implementing a robust metadata management strategy, you can answer these questions, gain better control over massive data […]
Build a managed transactional data lake with Amazon S3 Tables
UPDATE (12/19/2024): Added guidance for Amazon EMR setup. Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it […]
Uncover new performance insights using Amazon EBS detailed performance statistics
As businesses increasingly rely on latency-sensitive applications for mission-critical workloads, understanding performance across the entire technology stack is essential to swiftly resolving performance bottlenecks that could affect application efficiency. Given that storage performance and stability directly impact application efficiency, reliability, scalability, and user experience, it is paramount for organizations to have the […]
How Amazon S3 Tables use compaction to improve query performance by up to 3 times
Today, businesses managing petabytes of data must optimize storage and processing to drive timely insights while remaining cost-effective. Customers often choose Apache Parquet for improved storage and query performance. Additionally, customers use Apache Iceberg to organize Parquet datasets to take advantage of its database-like features such as schema evolution, time travel, and ACID transactions. Customers […]
Manage costs for replicated delete markers in a disaster recovery setup on Amazon S3
Many businesses recognize the critical importance of safeguarding their essential data from potential disasters such as fires, floods, or ransomware events. Designing an effective disaster recovery (DR) strategy includes thoughtfully evaluating and selecting cost-effective solutions that fulfill compliance requirements. By using Amazon S3 features such as S3 object tags, S3 Versioning, and S3 Lifecycle, you can […]
Migrating data access and Microsoft Active Directory with Amazon FSx for NetApp ONTAP
In today’s digital era, enterprises face significant challenges in data center modernization during their digital transformation journey. Traditional on-premises solutions struggle with high costs, complex management, and data growth. Organizations with intricate file-sharing systems and user permissions face difficulties in preserving user experiences and security. The tight integration of enterprise IDCs with complex Microsoft Active […]
Fundrise uses Amazon S3 Express One Zone to accelerate investment data processing
Fundrise is a financial technology company that brings alternative investments directly to individual investors. With more than 2 million users, Fundrise is one of the leading platforms of its kind in the United States. The challenge of providing a smooth, secure, and transparent experience for millions of users is largely unprecedented in the alternative investment […]
How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3
In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]
Enhance resource selection in AWS Backup Policies in AWS Organizations
In today’s digital landscape, businesses rely on consistent and secure backups for data protection and disaster recovery (DR). A centralized backup policy enables organizations to enforce uniform data protection standards across departments and workloads, helping to maintain compliance and minimize risks. In the cloud, organizations use backup policies to manage data protection from a central […]
AWS Snow device updates
Since its launch in 2015, customers have used AWS Snow devices to move data to the AWS Cloud or run compute and processing workloads at the edge. Our innovations since then have made moving data to AWS and running workloads at the edge faster, more efficient, and more cost-effective. During the same time, network bandwidth […]