AWS Public Sector Blog

Tag: genomics

Downscaled CMIP5, 1950 US Census, and open genomics data for Galaxy: The latest open data on AWS

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). Our full list of publicly available datasets is on the Registry of Open Data on AWS. This quarter, we released 13 new or updated datasets, including CMIP5, the 1950 US Decennial Census, and open genomics data for Galaxy. Read on for some highlights.

Preventing the next pandemic: How researchers analyze millions of genomic datasets with AWS

How do we avoid the next global pandemic? For researchers collaborating with the University of British Columbia Cloud Innovation Center (UBC CIC), the answer to that question lies in a massive library of genetic sequencing data. But there is a problem: the data library is so massive that traditional computing can’t comprehensively analyze or process it. So the UBC CIC team collaborated with computational virologists to create Serratus, an open-science viral discovery platform to transform the field of genomics—built on the massive computational power of the Amazon Web Services (AWS) Cloud.

Solving medical mysteries in the AWS Cloud: Medical data-sharing innovation through the Undiagnosed Diseases Network

It takes a medical village to discover and diagnose rare diseases. The National Institutes of Health’s Undiagnosed Diseases Network (UDN) is made up of a coordinating center, 12 clinical sites, a model organism screening center, a metabolomics core, a sequencing core, and a biorepository. For many years prior to the UDN, the experts at these sites were limited by antiquated data-sharing procedures. The UDN leadership realized that if they wanted to scale up and serve as many patients as possible, they needed to transform how they process, store, and share medical data—which led the UDN to the AWS Cloud.

How to set up Galaxy for research on AWS using Amazon Lightsail

Galaxy is a scientific workflow, data integration, and digital preservation platform that aims to make computational biology accessible to research scientists who do not have computer programming or systems administration experience. Although it was initially developed for genomics research, it is largely domain agnostic and is now used as a general bioinformatics workflow management system, running on everything from academic mainframes to personal computers. But researchers and organizations may worry about capacity and the accessibility of compute power for those with limited or restrictive budgets. In this blog post, we explain how to implement Galaxy on the cloud at a predictable cost within your research or grant budget with Amazon Lightsail.

Top announcements from the AWS Public Sector Partners leadership session at re:Invent 2021

During the 10th anniversary of re:Invent, I was thrilled to share announcements and achievements from AWS Partners and programs for the public sector around the world. Since its launch, AWS’s Public Sector Partner Program participation has increased by an average of 54% year over year, with partners providing solutions in mission areas across healthcare, space, energy, transportation, government, education, and nonprofit. In both the Global Partners Summit keynote at re:Invent 2021 and my public sector leadership session, I highlighted the new and upcoming AWS Partner solutions and accomplishments.

Cloud powers faster, greener, and more collaborative research, according to new IDC report

According to a new IDC report, the cloud is helping researchers conduct research faster than ever before by reducing data analysis and processing times, and is allowing researchers around the world to collaborate on solving universal problems. In addition to the positive impact on research, IDC also forecasts that continued adoption of cloud computing globally could prevent environmental emission of more than 1 billion metric tons of CO2 from 2021 through 2024, almost equivalent to removing the 2020 CO2 emissions of Germany and the U.K. combined.

Climate data, koala genomes, analysis ready radar data, and highly-queryable genomic data: The latest open data on AWS

The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on AWS. We work with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Our full list of publicly available datasets is on the Registry of Open Data on AWS. This quarter, we released 26 new or updated datasets, including datasets on climate, koala genomes, analysis ready radar data, and highly-queryable genomic data. Check out some highlights.

Driving innovation in single-cell analysis on AWS

Computational biology is undergoing a revolution. However, the analysis of single cells is a hard problem to solve. Standard statistical techniques used in genomic analysis fail to capture the complexity present in single-cell datasets. Open Problems in Single-Cell Analysis is a community-driven effort using AWS to drive the development of novel methods that leverage the power of single-cell data.

Accelerating genome assembly with AWS Graviton2

One of the biggest scientific achievements of the twenty-first century was the completion of the Human Genome Project and the publication of a draft human genome. The project took over 13 years to complete and remains one of the largest private-public international collaborations ever. Since then, advances in sequencing technologies, computational hardware, and algorithms have reduced the time it takes to produce a human genome assembly to only a few days, at a fraction of the cost. This has made using the human genome draft for precision and personalized medicine more achievable. In this blog post, we demonstrate how to do a genome assembly in the cloud in a cost-efficient manner using ARM-based AWS Graviton2 instances.

A generalized approach to benchmarking genomics workloads in the cloud: Running the BWA read aligner on Graviton2

The AWS Cloud gives genomics researchers access to a wide variety of instance types and chip architectures, and this elasticity allows us to rethink genomics workflows when running workloads in the cloud. Given the increased performance of the Graviton2 instances, we wanted to explore whether they can be used for cost-effective and performant genomics workloads. Read on to learn about our generalized approach for determining the most effective instance type for running genomics workloads in the cloud.