AWS Storage Blog

Category: Analytics

Consolidate and query Amazon S3 Inventory reports for Region-wide object-level visibility

Organizations around the world store billions of objects and files representing terabytes to petabytes of data. Data is often owned by different teams, departments, or business units, spanning multiple locations. As the number of datastores, locations, and owners grows, you need a way to cost-effectively maintain visibility on important characteristics of your data, including based […]

Identify cold objects for archiving to Amazon S3 Glacier storage classes

Update (02/13/2024): Consider Amazon S3 Lifecycle transition fees that are charged based on the total number of objects being transitioned, the destination storage class (listed on the Amazon S3 pricing page), as well as the additional metadata charges applied. You can use the S3 pricing calculator to estimate the total upfront and monthly costs by […]

Derive insights from AWS DataSync task reports using AWS Glue, Amazon Athena, and Amazon QuickSight

Update (9/22/2023): Step 6b updated to automatically detect and update the Amazon Athena table schema when the crawler detects large data transfer values, reported in bytes, that would exceed the table’s maximum integer value while storing data. As customers scale their migration of large datasets with millions of files across multiple data transfers, they are faced […]

Migrate on-premises data to AWS for insightful visualizations

When migrating data from on premises, customers seek a data store that is scalable, durable, and cost-effective. Equally important, business intelligence (BI) must support modern, interactive, and fast dashboards that can scale to tens of thousands of users seamlessly while providing the ability to create meaningful data visualizations for analysis. Visualization of on-premises business analytics […]

Disabling ACLs for existing Amazon S3 workloads with information in S3 server access logs and AWS CloudTrail

Access control lists (ACLs) are permission sets that define user access, and the operations users can take on specific resources. Amazon S3 was launched in 2006 with ACLs as its first authorization mechanism. Since 2011, Amazon S3 has also supported AWS Identity and Access Management (IAM) policies for managing access to S3 buckets, and recommends using […]

Maximizing price performance for big data workloads using Amazon EBS

Since the emergence of big data over a decade ago, Hadoop, an open-source framework that is used to efficiently store and process large datasets, has been crucial in storing, analyzing, and reducing that data to provide value for enterprises. Hadoop lets you store structured, partially structured, or unstructured data of any kind across […]

Simplify and scale access management to shared datasets with cross-account Amazon S3 Access Points

In today’s interconnected and data-centric world, businesses must have access to the right data for data-driven decision-making, ultimately driving better business results. Collecting all the relevant data takes time and capital as it requires setting up data ingestion pipelines, hiring analysts to validate and interpret the data, and incorporating data insights that influence important […]

Isima.io optimizes price performance for OLAP workloads using Amazon EBS

Isima.io, a unified analytics startup founded in 2016, aims to accelerate analytics outcomes for organizations. Isima.io does this by combining multiple data management disciplines, including Enterprise Service Bus (ESB), Extract-Transform-Load (ETL), Enterprise-Data-Warehouse (EDW), and Business Intelligence (BI), into one hyper-converged system. IT teams can only win by building differentiated, agile data apps. The […]

Simplify archiving Amazon EBS Snapshots and monitor progress using a live Amazon CloudWatch dashboard

Data protection is top of mind for our customers, and having a data backup strategy is critical to ensure compliance, disaster recovery readiness, and business continuity. As customers experience exponential business growth, their data storage needs grow as well, and data retention can become very costly. To meet compliance requirements for data retention […]

Run queries up to 9x faster using Trino with Amazon S3 Select on Amazon EMR

Customers building data lakes continue to innovate in the ways that they store and access their data. For these customers, performance is critical, particularly when they are accessing large amounts of data. For example, data scientists, data analysts, and data engineers running queries from open-source frameworks like Trino want to accelerate access to their […]