AWS Public Sector Blog
34 new or updated datasets on the Registry of Open Data: New data for land use, Alzheimer’s Disease, and more
The AWS Open Data Sponsorship Program makes high-value, cloud-optimized datasets publicly available on Amazon Web Services (AWS). AWS works with data providers to democratize access to data by making it available to the public for analysis on AWS; develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and encourage the development of communities that benefit from access to shared datasets. Through this program, customers are making over 100PB of high-value, cloud-optimized data available for public use.
The full list of publicly available datasets is on the Registry of Open Data on AWS, and the datasets are now also discoverable on AWS Data Exchange. This quarter, AWS released 34 new or updated datasets from Impact Observatory, the Allen Institute for Brain Science, Common Screens, and others, which are available now on the Registry of Open Data.
10m Annual Land Use Land Cover (9-class)
10m Annual Land Use Land Cover (9-class) is a global map of land use/land cover (LULC) derived from the European Space Agency’s (ESA) Sentinel-2 satellite imagery at 10m resolution for the years 2017-2021. Each map provides an annual classification of built area, crops, trees, water, rangeland, flooded vegetation, snow/ice, and bare ground, produced by applying a deep learning artificial intelligence (AI) land classification model to over 400,000 Sentinel-2 satellite images of Earth per year. LULC datasets like this let users measure changes over time and inform decision makers in governments, non-governmental organizations (NGOs), finance, and industry who need trustworthy, actionable information about the changing world.
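To make "measuring changes over time" concrete, here is a minimal sketch that compares two classified grids and counts class-to-class transitions. It is illustrative only: the class codes, the `change_counts` and `changed_fraction` helpers, and the toy 3x3 grids are assumptions made for this example, not the dataset's actual legend, tooling, or values. Real maps would first be read from their raster files with a geospatial library.

```python
from collections import Counter

# Hypothetical class codes for illustration; the real dataset defines its own legend.
CLASS_NAMES = {1: "water", 2: "trees", 3: "crops", 4: "built area"}


def change_counts(earlier, later):
    """Count pixels for each (earlier_class, later_class) pair."""
    same_shape = len(earlier) == len(later) and all(
        len(row_a) == len(row_b) for row_a, row_b in zip(earlier, later)
    )
    if not same_shape:
        raise ValueError("grids must have the same shape")
    counts = Counter()
    for row_a, row_b in zip(earlier, later):
        for a, b in zip(row_a, row_b):
            counts[(a, b)] += 1
    return counts


def changed_fraction(counts):
    """Return the share of pixels whose class differs between the two grids."""
    total = sum(counts.values())
    changed = sum(n for (a, b), n in counts.items() if a != b)
    return changed / total if total else 0.0


# Toy 3x3 grids standing in for two annual maps of the same area.
map_2017 = [[2, 2, 3], [2, 3, 3], [1, 1, 3]]
map_2021 = [[2, 4, 3], [4, 4, 3], [1, 1, 3]]

counts = change_counts(map_2017, map_2021)
for (a, b), n in sorted(counts.items()):
    if a != b:
        print(f"{CLASS_NAMES[a]} -> {CLASS_NAMES[b]}: {n} pixel(s)")
print(f"changed fraction: {changed_fraction(counts):.2f}")
```

At 10m resolution each pixel nominally covers 100 square meters, so pixel counts from a transition table like this can be converted to approximate areas.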
The Seattle Alzheimer’s Disease (SEA-AD) Study
Understanding neurodegenerative disease requires data that describes patients on several planes: clinical assessments, imaging, and molecular attributes all contribute. The Seattle Alzheimer’s Disease Study is a rich multimodal dataset for 85 Alzheimer’s Disease patients that provides unprecedented insights into the disease process. Available data includes digital neuropathology, single-cell transcriptomic data, and single-cell chromatin accessibility data, as well as basic clinical and demographic data.
Common Screens
The Common Screens project is an expanding corpus of over 55 million screenshots from over 70 million websites on the Internet. Website screenshots support machine learning (ML) applications such as classification to identify malicious websites, parked domains, specific kinds of content, and design themes. Along with screenshots, the project includes English-language optical character recognition (OCR) text and a collection of ML models.
Here is the full list of datasets released or significantly updated this quarter, joining over 350 datasets already available:
Agriculture:
- Cropland Extent Map (2019) from Digital Earth Africa
Astronomy:
- Europa Controlled Observations from National Aeronautics and Space Administration (NASA)
- Europa Controlled Observation Mosaics from NASA
- SOHO/LASCO2 comet challenge from NASA
Climate and weather:
- A Global Drought and Flood Catalogue from 1950 to 2016 from National University of Singapore
- Open Energy Data Initiative Data Lake from US Department of Energy
- SiPeCaM covering Mexican protected areas from the National Commission for the Knowledge and Use of Biodiversity (CONABIO)
- NOAA 3-D Surge and Tide Operational Forecast System for the Atlantic Basin (STOFS-3D-Atlantic) from the US National Oceanic and Atmospheric Administration (NOAA)
- NOAA – hourly position, current, and sea surface temperature from drifters from NOAA
- CHIRPS Rainfall from Digital Earth Africa
- Updated: NASA Earth Exchange Global Daily Downscaled Projections managed by NASA—now includes cloud-optimized GeoTIFF format
Internet and networking:
- Common Screens from Common Screens
- The MIT Supercloud Dataset from the Massachusetts Institute of Technology (MIT)
Geospatial:
- Coastlines from Digital Earth Africa
- Global Mangrove Watch from Digital Earth Africa
- Fractional Cover from Digital Earth Africa
- 10m Annual Land Use Land Cover (9-class) from Impact Observatory
- PALSAR-2 ScanSAR CARD4L (L2.2) from the Japan Aerospace Exploration Agency (JAXA)
- Indiana Statewide Digital Aerial Imagery Catalog from the Indiana Geographic Information Office
- European Space Agency WorldCover 2021 & 2020 maps managed by Vito
- ASTER L1T Cloud-Optimized GeoTIFFs managed by Descartes Labs
- Updated: Daylight Map Distribution of OpenStreetMap managed by Meta—now includes Earth Tables as parquet files
Life sciences:
- IBL Behavioral Data on AWS from the International Brain Laboratory
- IBL Neuropixels Reproducible Ephys Data on AWS from the International Brain Laboratory
- Synthea synthetic patient generator data in OMOP Common Data Model managed by AWS
- CMS 2008-2010 Data Entrepreneurs’ Synthetic Public Use File (DE-SynPUF) in OMOP Common Data Model managed by AWS
- recount3 from Johns Hopkins University
- OpenCRAVAT from Johns Hopkins University
- Biological and Physical Sciences (BPS) Microscopy Benchmark Training Dataset from NASA
- Biological and Physical Sciences (BPS) RNA Sequencing Benchmark Training Dataset from NASA
- Updated: The Seattle Alzheimer’s Disease Study from the Allen Institute for Brain Science
Machine learning:
- Visual Anomaly (VisA) managed by AWS
- Phrase Clustering Dataset (PCD) from Amazon
Statistical and regulatory:
- Public Utility Data Liberation Project from Catalyst Cooperative
Learn more about AWS for open data
Looking to make your data available? The AWS Open Data Sponsorship Program covers the cost of storage for publicly available high-value, cloud-optimized datasets. We work with data providers who seek to democratize access to data by making it available for analysis on AWS; to develop new cloud-native techniques, formats, and tools that lower the cost of working with data; and to encourage the development of communities that benefit from access to shared datasets. Learn how to propose your dataset to the AWS Open Data Sponsorship Program.
Learn more about open data on AWS.
Read more about AWS for open data:
- Making weather forecasts more accessible using serverless infrastructure and open data on AWS
- Understanding wildfire risk in a changing climate with open data and AWS
- Accelerating and democratizing research with the AWS Cloud
- 22 new or updated open datasets on AWS: New polar satellite data, blockchain data, and more
- OpenFold, OpenAlex catalog of scholarly publications, and Capella Space satellite data: The latest open data on AWS