AWS Storage Blog
Tag: Amazon Athena
Analyzing Amazon S3 Metadata with Amazon Athena and Amazon QuickSight
Object storage provides virtually unlimited scalability, but managing billions, or even trillions, of objects can pose significant challenges. How do you know what data you have? How can you find the right datasets at the right time? By implementing a robust metadata management strategy, you can answer these questions, gain better control over massive data […]
How Delhivery migrated 500 TB of data across AWS Regions using Amazon S3 Replication
Delhivery is one of the largest third-party logistics providers in India. It fulfills millions of packages every day, servicing over 18,000 pin codes in India, powered by more than 20 automated sort centers, 90 warehouses, and over 2,800 delivery centers. Data is at the core of Delhivery’s business. In anticipation of potential regulatory […]
Derive insights from AWS DataSync task reports using AWS Glue, Amazon Athena, and Amazon QuickSight
Update (10/30/2024): On October 30, 2024, AWS DataSync launched Enhanced mode tasks, prompting updates to this blog. Updates include a new step in the “Step 2: Populate Glue catalog with task reports data using a Glue crawler” section and detailed information on the new capabilities in “Updated steps for working with task reports of new […]
Access a point in time with Amazon S3 Object Lambda
Point-in-time ‘snapshots’ enable administrators, developers, testers, and end users to quickly access a storage volume or share as it was at an earlier point in time. They are a longstanding approach to data protection and recovery, tracking changes within a storage system to reduce both Recovery Point Objective (RPO) and Recovery Time Objective (RTO). However, traditional snapshots […]
How to optimize querying your data in Amazon S3
After careful consideration, we have made the decision to close new customer access to Amazon S3 Select and Amazon S3 Glacier Select, effective July 25, 2024. Amazon S3 Select and Amazon S3 Glacier Select existing customers can continue to use the service as usual. AWS continues to invest in security and availability improvements for Amazon […]
Use generative AI to query your Amazon S3 data lake for insights
Businesses store large volumes of data in their data lakes and rely on this data to extract insights and make important business decisions. However, business stakeholders sometimes lack the technical skills required to run complex queries against their data lakes. Instead, they rely on data scientists or analysts to build reports and dashboards or to […]
Streamline and automate compliance monitoring and reporting with AWS Backup Audit Manager
Organizations meet business and regulatory requirements by having visibility and control over backup environments. You want a streamlined solution to continuously monitor, detect, and track policy drifts across your backup deployments at scale. This need is driven by the growing complexity of AWS environments, the proliferation of data across diverse AWS services and regions, and […]
Maintaining object immutability by automatically extending Amazon S3 Object Lock retention periods
Protecting against accidental or malicious deletion is a key element of data protection. Immutability protects data in-place, preventing unintended changes or deletions. However, sometimes it isn’t clear for how long data should be made immutable. Users in this situation are looking for a solution that maintains short-term immutability, indefinitely. They want to make sure their […]
Understand Amazon S3 data transfer costs by classifying requests with Amazon Athena
Cost is top of mind for many enterprises, and building awareness of different cost contributors is the first step toward managing costs and improving efficiency. Data transfer costs may fall into two groups: common but low cost, and less frequent but higher cost. Data about these two groups is mixed together, and separating them enables […]
Managing duplicate objects in Amazon S3
When managing a large volume of data in a storage system, it is common for data duplication to happen. Data duplication in data management refers to the presence of multiple copies of the same data within your system, leading to additional storage usage as well as extra overhead when handling multiple copies of the same […]