AWS Storage Blog

Category: Advanced (300)

How to move and store your genomics sequencing data with AWS DataSync

Genomics data is expanding at a rate exceeding Moore’s law according to the National Human Genome Research Institute. As more sequencing data is produced and researchers move from genotyping to whole genome sequencing, the amount of data produced is outpacing on-premises capacity. Organizations need cloud solutions that help manage data movement, storage, and analysis. The […]

S3 cost optimization

Optimize storage costs by analyzing API operations on Amazon S3

The demand for data storage has increased with the advent of a fast-paced data environment – creating, sharing, and replicating data at a large scale. Most organizations are looking for the optimal way to store their data cost-effectively, giving them everything they need from their data but without breaking the bank. Cloud storage provides flexible […]

Analytical processing of millions of cell images using Amazon EFS and Amazon S3

Analytical workloads such as batch processing, high-performance computing, or machine learning inference often have high-IOPS and low-latency requirements but operate at irregular intervals on subsets of large datasets. Typically, data is manually copied between storage tiers in preparation for processing, which can be cumbersome and error-prone. Given this, IT teams want to […]

Synchronize your Oracle databases quickly and easily with Amazon FSx for OpenZFS

Update 4/8/2024: You can find a more recent version of this solution in the blog “Accelerate development refresh cycles and optimize cost with Amazon FSx for NetApp ONTAP.” That post presents an alternative solution to the same use case found in this post, but using Amazon FSx for NetApp ONTAP. This post, which uses Amazon […]

Amazon S3

Allowing external users to securely and directly upload files to Amazon S3

Organizations are often required to store files, images, and other digital assets in a repository. In many cases, these files come from partners or individuals who are not connected to internal systems and require corporate authentication in order to upload the files. Customers traditionally use servers to handle file uploads, which can use […]

Using AWS DataSync to move data from Hadoop to Amazon S3

You want to leverage cloud scalability, increase cost efficiency by paying only for utilized storage, decouple big data storage from processing, and increase capabilities for data analytics and machine learning using AWS. But how do you move your Hadoop cluster? To accelerate this transition, AWS DataSync recently launched support for moving data between Hadoop Distributed […]

Amazon EBS

Restoring on-premises applications to AWS from Amazon EBS Snapshots created by EBS direct APIs

Incremental, point-in-time copies of data can be a secure and cost-effective tool anchoring disaster recovery, data migration, and compliance solutions. Amazon EBS Snapshots are how EBS customers leverage point-in-time copies of their data stored on AWS, and you can use Snapshots on premises too. In December 2019, AWS introduced Amazon EBS direct APIs, providing […]

Simplify data migrations using an AWS DataSync agent on Linux KVM Hypervisor

UPDATE (1/19/2023): Some readers who followed the steps in this blog post to deploy an AWS DataSync agent on the KVM platform ran into issues, either because the hypervisor host does not support virtualization or it is not enabled on the platform. Therefore, I have added the steps to verify whether the hypervisor host supports […]

Running WordPress on Amazon EKS with Amazon EFS Intelligent-tiering

A large percentage of websites today rely on Content Management Systems (CMS), which give content creators, who may have little to no experience in web development, the ability to easily publish their content to a website for distribution to their end users. By far, the most popular CMS platform today is WordPress. More developers […]

Amazon S3

Monitor Amazon S3 activity using S3 server access logs and Pandas in Python

Monitoring and controlling access to data is often essential for security, cost optimization, and compliance. For these reasons, customers want to know which of their data is being accessed, when it is being accessed, and who is accessing it. As the volume of data grows, it becomes more challenging to granularly […]