AWS Public Sector Blog
Largest metastatic cancer dataset now available at no cost to researchers worldwide
Metastasis derives from Greek words meaning removal or migration. Metastatic cancer, in which tumor cells spread to sites far from the tissue of origin, accounts for over 90% of fatalities from cancer, a leading cause of death worldwide.
Metastatic cancer presents a core challenge for modern oncology due to the high degree of variation that it can display on a genetic, molecular, or gross anatomic level compared to primary cancer — as well as the high degree of variation across patients in their disease presentation, progression, and outcome. Treating metastatic cancer can involve surgery, radiation therapy, chemotherapy, immunotherapy, and other treatments. Treatment plans require recurring imaging studies and clinical visits so patients can track their cancer and its response to therapy.
So how do we best record, model, and study this incredibly heterogeneous and lethal disease in order to develop treatment plans that save lives? The NYUMets team, led by Dr. Eric Oermann at NYU Langone Medical Center, is collaborating with Amazon Web Services (AWS) Open Data, NVIDIA, and Medical Open Network for Artificial Intelligence (MONAI) to develop an open science approach that supports researchers in helping as many patients as possible.
NYUMets: Brain dataset now available for metastatic cancer research
With support from the AWS Open Data Sponsorship Program, the NYUMets: Brain dataset is now openly available at no cost to researchers around the world. NYUMets: Brain draws from the Center for Advanced Radiosurgery and constitutes a unique, real-world window into the complexities of metastatic cancer. It consists of data from 1,005 patients, 8,003 multimodal brain MRI studies, tabular clinical data from routine follow-up, and a complete record of prescribed medications. This makes it one of the largest cranial imaging datasets in existence, and the largest dataset of metastatic cancer. In addition, more than 2,300 images have been carefully annotated by physicians with segmentations of metastatic tumors, making NYUMets: Brain a valuable source of segmented medical imaging.
Extending the MONAI framework to longitudinal data for cancer dynamics research
In collaboration with NVIDIA, the NYUMets team is building tools to detect, automatically measure, and classify cancer tumors. The team used MONAI, co-founded by NVIDIA and King’s College London, to build an artificial intelligence (AI) model for segmentation tasks, as well as a longitudinal tracking tool. NYUMets: Brain can now serve as a starting dataset for applying AI to recognize metastatic lesions in imaging studies. Together with NVIDIA, the NYUMets team is extending the MONAI framework for working with metastatic cancer data. This data is most frequently longitudinal in nature, meaning many imaging studies are performed on the same patient to track their disease. Support for longitudinal data facilitates the study of metastatic cancer and cancer dynamics over time, more closely capturing how physicians study and patients experience cancer in the real world.
In addition, the NYUMets team built clinical measurements to augment the MONAI framework’s existing metrics. These cover practical medical use cases of treatment response and progression. With clinical metrics, the team intends to bridge the gap between AI technologies used in research and the application of these technologies in the clinic. One such clinical measurement tracks the change in tumor volume between imaging studies taken at different points in time. This is a crucial measurement for a patient undergoing cancer treatment—and could be applied to any disease where lesions change over time.
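To make the tumor volume measurement concrete, here is a minimal sketch of how a change in volume between two imaging studies could be computed from binary segmentation masks. It is not the NYUMets MONAI Extension's API: the function names, the mask layout, and the voxel spacing values are assumptions chosen for illustration, and the sketch assumes each mask comes with its voxel spacing in millimeters.

```python
import numpy as np


def tumor_volume_mm3(mask, spacing_mm):
    """Volume of a binary segmentation mask: voxel count times voxel volume.

    `mask` is a 3D array where nonzero voxels belong to the tumor.
    `spacing_mm` is the voxel size along each axis, in millimeters.
    Both names are hypothetical, not taken from the NYUMets tooling.
    """
    voxel_volume = float(np.prod(spacing_mm))
    return float(np.count_nonzero(mask)) * voxel_volume


def volume_change(baseline_mask, followup_mask, baseline_spacing_mm, followup_spacing_mm):
    """Absolute (mm^3) and relative change in tumor volume between two studies.

    The relative change is None when the baseline volume is zero,
    because the ratio is undefined in that case.
    """
    v0 = tumor_volume_mm3(baseline_mask, baseline_spacing_mm)
    v1 = tumor_volume_mm3(followup_mask, followup_spacing_mm)
    absolute = v1 - v0
    relative = absolute / v0 if v0 > 0 else None
    return absolute, relative


# Synthetic example: a 4x4x4 lesion that shrinks to 2x4x4 at follow-up,
# both imaged at 1 mm isotropic spacing.
baseline = np.zeros((10, 10, 10), dtype=np.uint8)
baseline[2:6, 2:6, 2:6] = 1
followup = np.zeros((10, 10, 10), dtype=np.uint8)
followup[2:4, 2:6, 2:6] = 1

absolute, relative = volume_change(baseline, followup, (1.0, 1.0, 1.0), (1.0, 1.0, 1.0))
print(absolute, relative)  # -32.0 -0.5
```

Because the volume is computed in physical units, the two studies do not need to share the same voxel spacing. Matching individual lesions across studies is a separate problem that this sketch does not address.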
Get started with no-cost machine learning services to power metastatic cancer research
A preprint for the NYUMets flagship publication can be reviewed here. The NYUMets: Brain dataset is available to access at no cost with support from the AWS Open Data Sponsorship Program. It’s also available on the Registry of Open Data on AWS and on the AWS Data Exchange catalog. Users with AWS accounts can apply for access to the full dataset here. Once approved, you can access the dataset in the Amazon Simple Storage Service (Amazon S3) bucket using an Amazon S3 Access Point. Documentation for bucket structure and naming conventions can be explored at nyumets.org, including the NYUMets MONAI Extension. You can explore the entire MONAI framework here.
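As a rough sketch of what access through an Amazon S3 Access Point can look like once approval is granted, the helper below lists object keys with a boto3 S3 client. boto3 accepts an access point ARN in place of a bucket name in the `Bucket` parameter. The ARN and prefix in the usage comment are placeholders, not the real NYUMets values; those, along with the actual bucket layout, come from the approval process and the documentation at nyumets.org.

```python
def list_keys(s3_client, access_point_arn, prefix=""):
    """Return all object keys under `prefix`, reading through an S3 access point.

    `s3_client` is a boto3 S3 client (or any object with the same
    `get_paginator` interface). `access_point_arn` is passed where a
    bucket name would normally go.
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=access_point_arn, Prefix=prefix):
        # Pages with no matching objects have no "Contents" entry.
        for obj in page.get("Contents", []):
            keys.append(obj["Key"])
    return keys


# Usage (requires boto3 and AWS credentials that have been granted access).
# The ARN and prefix below are hypothetical placeholders:
#
#   import boto3
#   s3 = boto3.client("s3")
#   arn = "arn:aws:s3:<region>:<account-id>:accesspoint/<access-point-name>"
#   for key in list_keys(s3, arn, prefix="<prefix>/"):
#       print(key)
```

Passing the client in as an argument keeps the helper independent of how credentials and regions are configured, which tends to differ between institutions.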