Overview

AWS HealthOmics helps customers accelerate scientific breakthroughs with fully managed bioinformatics and drug discovery infrastructure designed to handle workflows and storage at massive scale. With HealthOmics, you only pay for what you use and there are no HealthOmics licensing costs.

HealthOmics offers two types of workflows. Private workflows are custom, user-defined workflows that enable you to bring your own bioinformatics scripts written in the most commonly used workflow languages. Pricing for private workflows is based on the compute and file system resources requested for each run. Ready2Run workflows are prebuilt bioinformatics pipelines based on common industry analyses, and you pay a fixed cost per run.

HealthOmics offers two types of storage. Reference and sequence stores are data stores for objects that use tiering, compression, and metadata cataloging to enable cost-effective storage and organization of bioinformatics data. Pricing is based on the object size stored and the data tier. Variant and annotation stores are zero-ETL stores that extract key data from bioinformatics data to create a data lake optimized for searching and cohort creation. Pricing is based on the storage size of the information extracted.

You can use workflows and data stores together or separately, as needed. If you are willing to make a usage commitment for three or five years, please contact us for discounted pricing.

Explore Pricing by Type

With AWS HealthOmics, you only pay for what you use. Explore pricing by type below.

Free Tier

As part of the AWS Free Tier, you can get started with AWS HealthOmics for free. Upon sign-up, new AWS customers receive up to 275 omics.m.xlarge (or equivalent) instance hours and 49,000 gigabyte-hours of run storage for running private workflows, 1,500 gigabase-months of active and archive storage in the sequence store, and 200 gigabyte-months of storage in the variant store. Your usage for the Free Tier is calculated each month across all Regions (except the AWS GovCloud (US) Regions) and is automatically applied to your bill; unused monthly usage will not roll over. Restrictions apply; see terms for more details.

 

Free Tier usage per month for the first 2 months

HealthOmics Workflows

Private Workflows: 275 omics.m.xlarge instance hours or equivalent compute instances and 49,000 GB-hours of run storage

HealthOmics Data Stores

Sequence Store: 1,500 gigabase-months in active storage class and 1,500 gigabase-months in archive storage class

Variant Store: 200 gigabyte-months

AWS customers receive 100 GB of data transfer out to the internet free each month, aggregated across all AWS Services and Regions (except China and GovCloud).

Private Workflows Pricing

Private workflows are custom workflows that you define based on your workflow language of choice to run bioinformatics or drug discovery pipelines. There are two components to cost: workflow task instances and run storage.

You are charged for the omics instance used for each task in your workflow. Each task in your workflow is mapped to the smallest available omics instance that satisfies the vCPUs, memory, and/or GPUs requested for the task. For example, a task that is defined to use 8 vCPUs and 60 GiB of RAM will map to the omics.r.2xlarge instance type for execution. HealthOmics always provisions exactly the resources requested. In this example, 8 vCPUs and 60 GiB of RAM will be available to the task. Tasks are billed in increments of 1 second; however, there is a minimum billing threshold of 60 seconds per task. If you do not specify vCPUs or memory for a task, HealthOmics automatically provisions the smallest available instance type, omics.c.large, for that task. You are not charged for compute associated with data staging (i.e., imports and exports), and there are no cross-AZ charges.
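The per-task billing rule can be sketched as follows. This is a minimal illustration, not an official calculator: the function names are hypothetical, rounding a fractional runtime up to the next whole second is an assumption about how the 1-second increment is applied, and the omics.m.xlarge rate is the US East (N. Virginia) rate quoted in the pricing examples on this page.

```python
import math

MIN_BILLED_SECONDS = 60  # minimum billing threshold per task, as stated above


def billed_seconds(runtime_seconds: float) -> int:
    """Seconds billed for one task.

    Assumption: fractional seconds round up to the next 1-second increment,
    and the 60-second minimum is applied afterwards.
    """
    return max(MIN_BILLED_SECONDS, math.ceil(runtime_seconds))


def task_cost(hourly_rate_usd: float, runtime_seconds: float) -> float:
    """Cost of one task at a given hourly instance rate."""
    return hourly_rate_usd * billed_seconds(runtime_seconds) / 3600


# Rate for omics.m.xlarge taken from the pricing examples on this page.
M_XLARGE_RATE = 0.2592

print(f"20 s task:   ${task_cost(M_XLARGE_RATE, 20):.5f}")   # billed as 60 s
print(f"10 min task: ${task_cost(M_XLARGE_RATE, 600):.5f}")
```

A 20-second task and a 60-second task cost the same under this rule, which matters for workflows that fan out into many very short tasks.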

For run storage, you can choose a statically provisioned file system with greater file system throughput or a file system that scales dynamically. Static run storage is available in the following sizes: 1200 GiB, 2400 GiB, and then in increments of 2400 GiB thereafter, with a minimum provisioned size of 1200 GiB. Dynamic run storage scales with usage and does not have a minimum storage provisioning requirement.
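As a rough sketch of how a requested static capacity maps onto the available sizes, assuming a request is rounded up to the next available size (the function name is hypothetical and this is one reading of the size list above, not service code):

```python
import math


def static_run_storage_gib(requested_gib: float) -> int:
    """Smallest available static run storage size that covers the request.

    Available sizes per the text above: 1200, 2400, then multiples of 2400 GiB.
    Assumption: a request is rounded up to the next available size.
    """
    if requested_gib <= 1200:
        return 1200  # minimum provisioned size
    return math.ceil(requested_gib / 2400) * 2400


for request in (500, 1200, 1500, 2400, 5000):
    print(request, "->", static_run_storage_gib(request), "GiB")
```

Because static storage is billed on the provisioned size, a request just above a size boundary is billed at the next step up.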

You are only charged for resources while the run is in the running state. No charges are incurred for runs in the pending, starting, or stopping states. For runs that are cancelled or fail, you are billed for any resources that were used up until the point of cancellation or failure.

You can view your total costs for every run on your AWS bill, making it fast and easy to determine your costs. HealthOmics also provides an open-source run analyzer tool to help you optimize run resources, costs, and performance. If you plan to run production workflows at scale and are willing to make a three- or five-year usage commitment, please contact us for discounted pricing.

 

Ready2Run Workflows Pricing

Ready2Run workflows are preconfigured workflows designed by industry-leading third-party software companies, like NVIDIA, Sentieon, Element Biosciences, and Ultima, along with common open-source pipelines such as Broad Institute’s GATK workflows and AlphaFold for protein structure prediction. You can use Ready2Run workflows to process your data without needing to manage the software tools or workflow scripts. Ready2Run workflows are pay-per-run, and you are charged the same flat fee when runs successfully complete, regardless of run time. If the run is cancelled or unable to successfully complete within the first hour, the cost-per-run fee is prorated based on the first hour of usage. Runs that execute for more than 1 hour are billed for the full price of the run. Sentieon Ready2Run workflows require a separate subscription purchased from Sentieon. A free two-week evaluation subscription is automatically provided by Sentieon at no additional cost to first-time Sentieon Ready2Run users. To view detailed information on available Ready2Run workflows, including input parameters, workflow diagrams, and estimated run times, visit the HealthOmics console.
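The flat-fee and proration rules might be modeled as below. The text does not spell out the proration formula, so the linear proration over the first hour used here is an assumption, and the function name is hypothetical. The $10.00 price is the GATK-BP Germline fq2vcf figure from the pricing examples on this page.

```python
def ready2run_charge(price_per_run: float, elapsed_seconds: float, succeeded: bool) -> float:
    """Estimated charge for one Ready2Run run.

    - Successful runs pay the flat fee regardless of run time.
    - Runs that execute for an hour or more pay the full price.
    - Runs cancelled or failed within the first hour are prorated.
      Assumption: proration is linear in the fraction of the first hour used.
    """
    if succeeded or elapsed_seconds >= 3600:
        return price_per_run
    return price_per_run * elapsed_seconds / 3600


print(ready2run_charge(10.00, 900, succeeded=False))    # cancelled after 15 minutes
print(ready2run_charge(10.00, 7200, succeeded=False))   # failed after 2 hours
print(ready2run_charge(10.00, 1800, succeeded=True))    # completed in 30 minutes
```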

Data Stores Pricing

The HealthOmics data stores are managed, findable, accessible, interoperable, and reusable (FAIR) storage for large-scale sample data with automatic data compression and optimized variant/annotation queryability.

The sequence store delivers cost savings through usage-driven tiering and compression. Stored objects are grouped under read sets for organization and findability. When you store data in the sequence store, you pay per gigabase per month. A gigabase is one billion bases from your imported sequence files (such as FASTQ, BAM, and CRAM). Because billing is per gigabase, you don’t need to worry about optimal file formats or compression techniques; AWS HealthOmics optimizes this for you. Data in the sequence store can be accessed in two ways: (1) Through the HealthOmics read, write, and update APIs. For access through HealthOmics APIs, you pay for GET requests made to your read-set objects; all other HealthOmics request types on read sets are no-charge. (2) Through the S3 LIST and GET APIs, which provide read access. For access through S3 APIs, COPY and LIST requests are billed separately from all other request types. To see how HealthOmics Sequence Store costs compare to alternative storage options, see our blog: https://aws.amazon.com/blogs/industries/store-omics-data-cost-effectively-at-any-scale-with-aws-healthomics/

The variant and annotation stores use zero-ETL to prepare variant and annotation data for querying, cohorting, and analysis with AWS services such as Amazon Athena and Amazon SageMaker. Ingested files are processed by HealthOmics and converted into query-optimized formats. You can store any amount of variant and annotation data, and you only pay for what is stored. The billed data size is defined as the size of the data after ingestion and transformation. Data in the variant and annotation stores is typically accessed through other AWS services. When you query and analyze the data in other services, you pay for the use of those services.

Data stored in AWS HealthOmics data stores is charged for a minimum storage duration of 30 days, and data deleted before 30 days incurs a prorated charge equal to the storage charge for the remaining days. 
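The 30-day minimum can be expressed as a small calculation. This is an illustrative sketch: the helper names are hypothetical, the conversion of days to months using 365/12 days per month is an assumption, and the rate shown is the active storage class rate quoted in the sequence store example on this page.

```python
MIN_STORAGE_DAYS = 30      # minimum storage duration stated above
DAYS_PER_MONTH = 365 / 12  # assumption: average month length used to convert days to months


def billed_storage_days(days_stored: float) -> float:
    """Days billed for an object: the actual duration, but never less than 30."""
    return max(MIN_STORAGE_DAYS, days_stored)


def storage_charge(monthly_rate_usd: float, quantity: float, days_stored: float) -> float:
    """Charge for `quantity` units (gigabases or GB) held for `days_stored` days."""
    return monthly_rate_usd * quantity * billed_storage_days(days_stored) / DAYS_PER_MONTH


ACTIVE_RATE = 0.005769  # $/gigabase-month, from the sequence store example on this page

# Deleting 130 gigabases after 10 days costs the same as keeping them for 30 days.
print(f"${storage_charge(ACTIVE_RATE, 130, 10):.4f}")
print(f"${storage_charge(ACTIVE_RATE, 130, 30):.4f}")
```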

Pricing Examples

  • A bioinformatics scientist wants to run a Nextflow workflow in AWS HealthOmics workflows in the US East (N. Virginia) Region. She has three tasks in the workflow. The first reserves 16 vCPUs and 30 GB memory and takes 3 hours to run. The second requires 32 vCPUs and 160 GB memory and takes 2 hours to run. The third reserves 4 vCPUs and 10 GB memory and takes 10 minutes to run. The customer registers the workflow and calls the StartRun API with the default 1200 GB file system. Her overall costs are:
    Task 1 (omics.c.4xlarge): $0.9180/hr * 3 hrs = $2.754
    Task 2 (omics.r.8xlarge): $2.7216/hr * 2 hrs = $5.4432
    Task 3 (omics.m.xlarge): $0.2592/hr * 1/6 hrs = $0.0432
    Static run storage: $0.0001918/GB-hour * (1200 GB * (3 hr + 2 hr + 1/6 hr)) = $1.18916
    Total: $9.42956

  • A bioinformatics scientist is developing a new WDL workflow in AWS HealthOmics in the US East (N. Virginia) Region. She has two tasks in the workflow. The first reserves 16 vCPUs and 30 GB memory and takes 3.5 hours to run. The second requires 32 vCPUs and 160 GB memory and takes 2.25 hours to run. The customer registers the workflow and calls the StartRun API with the dynamic file system. Over the course of the 5.75-hour workflow run, the file system grows linearly from 0 GB to 1043 GB, totaling approximately 3,000 GB-hr of file storage. Her overall costs are:
    Task 1 (omics.c.4xlarge): $0.9180/hr * 3.5 hrs = $3.213
    Task 2 (omics.r.8xlarge): $2.7216/hr * 2.25 hrs = $6.1236
    Dynamic run storage: $0.0004110/GB-hr * 3,000 GB-hr = $1.233
    Total: $10.5696

  • A computational scientist wants to run the GATK-BP Germline fq2vcf for 30x genome Ready2Run workflow in the US East (N. Virginia) Region for 3 samples. The customer inputs their data and calls the StartRun API for each sample. The cost for the 3 runs is:
    GATK-BP Germline fq2vcf for 30x genome Ready2Run workflow: $10.00/run * 3 = $30.00
    Total: $30.00

  • A population sequencing initiative is starting to sequence individuals from a biobank it has collected, and chooses to do this in the EU West (Ireland) Region. It sequences 100,000 individuals, each at 130 gigabases (50 gigabytes), and stores the raw sequencing data in AWS HealthOmics storage. Over the next five years, the read sets remain in the archive storage class after the first 30 days following import and are accessed twice, on average; each access returns them to the active storage class for 30 days, for 90 days of active storage in total. The initiative uses S3 APIs to access the files. Each genome is downloaded in 500 parts, generating 500 GET API calls. The total cost over five years for a single genome is:
    Active storage class: $0.005769/gigabase-month * 130 gigabases * 90 days (about 2.96 months) = $2.22
    Archive storage class: $0.001154/gigabase-month * 130 gigabases * (1825 - 90) days (about 57.04 months) = $8.56
    S3 GET APIs: $0.0004 / 1000 API calls * (2 * 500 API calls) = $0.0004
    Total cost for 5 years: $2.22 + $8.56 + $0.0004 = $10.78 (or about $2.16/year)

  • A data scientist has 3,202 variant call format (VCF) files that he wants to analyze in Amazon Athena in the US East (N. Virginia) Region. He creates a variant store and ingests these files using the AWS HealthOmics APIs. The ingested data is 1.5 TB in size. Over the course of the next month, he executes 1,000 queries in Athena, calculating allele frequencies for different subpopulations, each consuming 50 GB on average. His overall monthly costs are:
    Variant store: $0.035/GB-month * (1024 GB/TB * 1.5 TB) = $53.76
    Amazon Athena: $5/TB * 1000 queries * 50 GB / (1024 GB/TB) = $244.14
    Total: $297.90
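The arithmetic in the workflow and sequence store examples above can be reproduced with a short script. This is a sketch for checking the figures, not an official estimator: the function names are hypothetical, all rates are copied from the examples, and the 365/12 days-per-month conversion is an assumption chosen because it reproduces the $2.22 and $8.56 storage figures.

```python
# Rates copied from the pricing examples above.
C_4XLARGE = 0.9180           # $/hr, omics.c.4xlarge
R_8XLARGE = 2.7216           # $/hr, omics.r.8xlarge
M_XLARGE = 0.2592            # $/hr, omics.m.xlarge
STATIC_GB_HOUR = 0.0001918   # $/GB-hour, static run storage
DYNAMIC_GB_HOUR = 0.0004110  # $/GB-hour, dynamic run storage
ACTIVE_RATE = 0.005769       # $/gigabase-month, active storage class
ARCHIVE_RATE = 0.001154      # $/gigabase-month, archive storage class
GET_PER_1000 = 0.0004        # $ per 1,000 S3 GET calls
DAYS_PER_MONTH = 365 / 12    # assumption: average month length


def compute_cost(tasks):
    """tasks: list of (hourly_rate, hours) pairs."""
    return sum(rate * hours for rate, hours in tasks)


def static_run_cost(tasks, storage_gb):
    """Static storage is billed on provisioned size for the summed task hours,
    as in the first example."""
    run_hours = sum(hours for _, hours in tasks)
    return compute_cost(tasks) + STATIC_GB_HOUR * storage_gb * run_hours


def dynamic_run_cost(tasks, gb_hours):
    """Dynamic storage is billed on the GB-hours actually consumed."""
    return compute_cost(tasks) + DYNAMIC_GB_HOUR * gb_hours


def sequence_store_cost(gigabases, active_days, archive_days, get_calls):
    active = ACTIVE_RATE * gigabases * active_days / DAYS_PER_MONTH
    archive = ARCHIVE_RATE * gigabases * archive_days / DAYS_PER_MONTH
    requests = GET_PER_1000 * get_calls / 1000
    return active + archive + requests


example_1 = static_run_cost([(C_4XLARGE, 3), (R_8XLARGE, 2), (M_XLARGE, 1 / 6)], 1200)
example_2 = dynamic_run_cost([(C_4XLARGE, 3.5), (R_8XLARGE, 2.25)], 3000)
example_4 = sequence_store_cost(130, active_days=90, archive_days=1825 - 90, get_calls=2 * 500)

print(f"Static run:     ${example_1:.5f}")
print(f"Dynamic run:    ${example_2:.4f}")
print(f"Sequence store: ${example_4:.2f} over five years per genome")
```

Changing the task list or storage parameters gives a quick way to compare static and dynamic run storage for a given workload.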

Data Transfer Pricing

You pay for all bandwidth out of HealthOmics. Data transfer fees do not apply to data transferred to any AWS services within the same AWS Region as the data store. The pricing below is based on data transferred "in" and "out" of AWS HealthOmics (over the public internet)†††. Learn more about AWS Direct Connect pricing. For Data Transfers exceeding 500 TB/Month, please contact us.

Rate tiers take into account your aggregate usage for Data Transfer Out to the Internet across all AWS services.

††† Data Transfer Out may be different from the data received by your application if you prematurely terminate the connection, for example, if you make a request for a 10 GB object and terminate the connection after receiving the first 2 GB of data. AWS HealthOmics attempts to stop the streaming of data, but it does not happen instantaneously. In this example, the Data Transfer Out may be 3 GB (1 GB more than the 2 GB you received). As a result, you will be billed for 3 GB of Data Transfer Out.