AWS Storage Blog
Category: Analytics
Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight
Object storage provides virtually unlimited scalability, but managing billions, or even trillions, of objects can pose significant challenges. How do you know what data you have? How can you find the right datasets at the right time? By implementing a robust metadata management strategy, you can answer these questions, gain better control over massive data […]
Build a managed transactional data lake with Amazon S3 Tables
Customers commonly use Apache Iceberg today to manage ever-growing volumes of data. Apache Iceberg’s relational database transaction capabilities (ACID transactions) help customers deal with frequent updates, deletions, and the need for transactional consistency across datasets. However, getting the most out of Apache Iceberg tables and running it efficiently at scale requires some maintenance and management […]
How Amazon Ads uses Iceberg optimizations to accelerate their Spark workload on Amazon S3
In today’s data-driven business landscape, organizations are increasingly relying on massive data lakes to store, process, and analyze vast amounts of information. However, as these data repositories grow to petabyte scale, a key challenge for businesses is implementing transactional capabilities on their data lakes efficiently. The sheer volume of data requires immense computational power and […]
How Delhivery migrated 500 TB of data across AWS Regions using Amazon S3 Replication
Delhivery is one of the largest third-party logistics providers in India. It fulfills millions of packages every day, servicing over 18,000 pin codes in India and powered by more than 20 automated sort centers, 90 warehouses, with over 2800 delivery centers. Data is at the core of the Delhivery’s business. In anticipating of potential regulatory […]
Derive insights from AWS DataSync task reports using AWS Glue, Amazon Athena, and Amazon QuickSight
Update (10/30/2024): On October 30, 2024, AWS DataSync launched Enhanced mode tasks, prompting updates to this blog. Updates include a new step in the “Step 2: Populate Glue catalog with task reports data using a Glue crawler” section and detailed information on the new capabilities in “Updated steps for working with task reports of new […]
Access a point in time with Amazon S3 Object Lambda
Point-in-time ‘snapshots’ enable administrators, developers, testers, and end users to quickly access a storage volume or share how it was at an earlier point-in-time. They are a longstanding approach to data protection and recovery, tracking changes within a storage system to reduce both Recovery Point Objective (RTO) and Recovery Time Objective (RTO). However, traditional snapshots […]
How Vivian Health is using Amazon S3 Express One Zone to accelerate healthcare hiring
Vivian Health connects travel nurses with job opportunities across the country. To do that, the platform has innovated not just the job search itself, but also the tooling used by recruiters and hiring managers to get qualified candidates matched to the right job and placed as quickly and as seamlessly as possible. However, the process […]
Use generative AI to query your Amazon S3 data lake for insights
Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]
Streamline and automate compliance monitoring and reporting with AWS Backup Audit Manager
Organizations meet business and regulatory requirements by having visibility and control over backup environments. You want a streamlined solution to continuously monitor, detect, and track policy drifts across your backup deployments at scale. This need is driven by the growing complexity of AWS environments, the proliferation of data across diverse AWS services and regions, and […]
Siemens builds Datalake2Go on AWS to analyze disparate data globally
Siemens is a technology company focused on industry, infrastructure, transport, and healthcare. From resource-efficient factories, resilient supply chains, and smart buildings and grids, to cleaner and more comfortable transportation and advanced healthcare, the company creates technology with purpose, adding real value for its customers. Siemens technology is everywhere, supporting the critical infrastructure and vital industries […]