AWS Partner Network (APN) Blog
Interactive Scientific Visualization on AWS with NVIDIA IndeX SDK
By Karthik Raman, Sr. HPC Specialist SA at AWS
By Dragos Tatulea, Sr. Software Engineer at NVIDIA
Scientific visualization is critical to understanding complex phenomena modeled using high performance computing (HPC) simulations.
However, it has been challenging to do this effectively due to the inability to visualize, explore, and analyze large volumes of data and lack of collaborative workflow solutions.
NVIDIA IndeX on Amazon Web Services (AWS) addresses each of these problems by providing a scientific visualization solution for massive datasets, thus opening the doors for discovery.
NVIDIA IndeX is a 3D volumetric interactive visualization solution that enables scientists and researchers to visualize and interact with large compute datasets. It allows users to make modifications and navigate to the most pertinent parts of the data, all in real-time, to gather better insights faster.
In this post, you will learn about key IndeX features and its detailed deployment architecture and solution on AWS.
AWS and NVIDIA have collaborated for over 10 years to deliver powerful, cost-effective, and flexible GPU-based solutions for customers. NVIDIA is an AWS Advanced Technology Partner, and customers around the world are using AWS and NVIDIA solutions for machine learning (ML), virtual workstations, HPC, and Internet of Things (IoT) services.
Amazon Elastic Compute Cloud (Amazon EC2) instances powered by NVIDIA GPUs deliver the scalable performance needed for fast ML training, cost-effective ML inference, flexible remote virtual workstations, and powerful HPC computations.
Solution Overview
NVIDIA IndeX on AWS leverages GPU clusters for scalable, real-time visualization and computing of multi-valued volumetric data together with embedded geometry data.
Users have two main options for using IndeX: the standalone IndeX Software Development Kit (SDK), or a plugin for ParaView, which is one of the most popular visualization tools in the scientific HPC community.
IndeX has a direct plugin to load the compute data sets from Amazon Simple Storage Service (Amazon S3). This allows users to store massive datasets in S3 object storage and load them directly from the Amazon EC2 compute instances without copying them to the local storage.
Users can run IndeX on AWS using a custom Amazon Machine Image (AMI) from AWS Marketplace. Sample datasets from different scientific domains including astronomy, healthcare, and oil and gas are available in an S3 bucket which can be readily loaded to get a feel for the visualization experience provided by IndeX on Amazon EC2 instances.
In addition, users can bring their existing custom datasets used in their volume visualization workflows and render them by specifying the dataset parameters in a scene file.
Benefits of Using NVIDIA IndeX on AWS
Standard visualization workflows are characterized by sequential, time intensive steps that include:
- Running simulation for days, weeks, or months.
- Preparing a subset of output data for visualization.
- Transferring and loading the data into visualization software for analysis.
- Sharing findings and repeating the process with another subset of data, and/or running additional simulations with new inputs.
Such a workflow is not practical in an on-premises environment, however, because of compute and storage resource constraints, collaboration challenges, and the time wasted when the visualization reveals that the simulation data was invalid.
AWS provides an accelerated workflow that allows users to set up a configuration where simulation and visualization run on separate clusters. In this case, the data produced by the simulation running on one cluster is written to Amazon S3 object storage, visualized on another cluster by NVIDIA IndeX (which loads it directly from S3), and shared with any device on which a web browser is installed.
This workflow enables users to interactively explore the data and collaborate in real time after the simulation has started, and to interrupt or restart simulations as needed.
AWS also supports running simulation and visualization in-situ (that is, on the same cluster). Here, the simulation data can be written to local storage (instance store) or network-attached Amazon Elastic Block Store (EBS) and visualized from there, thus enabling users to interactively direct a simulation and test outcomes.
Amazon EC2 GPU Instances
AWS offers multiple GPU-accelerated instances for high performance compute, machine learning, and graphics intensive workloads (for volume visualization and rendering). For example, the Amazon EC2 P3 instance provides up to eight NVIDIA V100 Tensor Core GPUs with up to 32GB of memory each, up to 100 Gbps of networking throughput, and supports NVLink for GPU peer-to-peer communication.
The Amazon EC2 G4 instance offers a cost-effective solution and supports up to four NVIDIA T4 Tensor Core GPUs with 16GB of memory each, up to 50 Gbps of networking throughput. The EC2 G4 metal instance supports eight NVIDIA T4 GPUs and 100 Gbps of networking throughput.
AWS also plans to offer NVIDIA A100 Tensor Core GPUs in the future. To learn more, sign up for our email list to receive updates.
Depending on the size of the data sets, users have the flexibility to use either a single instance or a cluster of instances and scale as needed. The instance can have a single GPU or multiple GPUs; IndeX automatically makes use of all available GPUs in an instance for optimal performance.
The goal is to provide visualization performance that scales with data so users can analyze the entire dataset in its original resolution, aiding in faster scientific discoveries.
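As a rough sizing aid, the sketch below estimates how many instances would be needed if the whole dataset had to fit in aggregate GPU memory. The GPU counts and per-GPU memory come from the figures quoted above; the specific instance size names, the fits-in-GPU-memory rule, and the 0.8 usable fraction are illustrative assumptions, not IndeX requirements.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInstance:
    name: str
    gpus: int
    gpu_mem_gb: int

    @property
    def total_gpu_mem_gb(self) -> int:
        # Aggregate GPU memory across all GPUs in one instance.
        return self.gpus * self.gpu_mem_gb


# GPU counts and memory as quoted in this post; the instance size
# names are added here for illustration.
P3DN_24XLARGE = GpuInstance("p3dn.24xlarge", gpus=8, gpu_mem_gb=32)
G4DN_12XLARGE = GpuInstance("g4dn.12xlarge", gpus=4, gpu_mem_gb=16)
G4DN_METAL = GpuInstance("g4dn.metal", gpus=8, gpu_mem_gb=16)


def instances_needed(dataset_gb: float, instance: GpuInstance,
                     usable_fraction: float = 0.8) -> int:
    """Estimate the instance count, assuming (hypothetically) that the
    dataset must fit in the usable share of aggregate GPU memory."""
    usable_gb = instance.total_gpu_mem_gb * usable_fraction
    return max(1, math.ceil(dataset_gb / usable_gb))


if __name__ == "__main__":
    for inst in (P3DN_24XLARGE, G4DN_12XLARGE, G4DN_METAL):
        print(inst.name, instances_needed(1000, inst))
```

Treat the result as a starting point for experimentation rather than a guarantee; actual memory needs depend on the data format and rendering settings.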
Using IndeX on AWS, users can take advantage of both the performance features that IndeX SDK offers and the elasticity and compute capability of AWS.
Datasets for computational scientists continue to grow from terabytes to petabytes. AWS allows users to store such massive datasets in the cloud using storage services like Amazon S3, and enables multiple users to access the same datasets instead of making several local copies.
NVIDIA IndeX Architecture on AWS
Users have multiple options for 3D volume rendering on AWS using IndeX. Here, we share example architecture diagrams for single instance rendering and cluster rendering.
Single Instance Rendering
Figure 1 – Architecture for single instance rendering on AWS.
In the case of single instance rendering, the user will deploy an Amazon EC2 GPU-based instance in an AWS region using the IndeX AMI from AWS Marketplace.
The custom datasets required for volume rendering should be uploaded to S3. Optionally, the Amazon EC2 instance can be backed by an EBS volume if needed.
When the Amazon EC2 GPU instance is running, the user can start a terminal and enable port forwarding via SSH tunneling to that instance. Once the connection is established, the user will launch another terminal and log in to the Amazon EC2 GPU instance and execute the IndeX SDK viewer script along with the sample or custom dataset for visualization.
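The port-forwarding step can be sketched as a small helper that assembles a standard OpenSSH command line. The port number (8080), the login user (ec2-user), the host name, and the key file name are placeholders we assume for illustration; the values that apply to the IndeX AMI are given in the GitHub repository instructions.

```python
import shlex


def build_tunnel_command(host, key_path, local_port=8080,
                         remote_port=8080, user="ec2-user"):
    """Return an argv list for an SSH local port forward.

    -i selects the private key, -N skips running a remote command,
    and -L forwards local_port to remote_port on the instance.
    Port and user defaults are assumptions, not IndeX settings.
    """
    return [
        "ssh",
        "-i", key_path,
        "-N",
        "-L", f"{local_port}:localhost:{remote_port}",
        f"{user}@{host}",
    ]


if __name__ == "__main__":
    # Hypothetical host name and key file, for illustration only.
    cmd = build_tunnel_command("ec2-203-0-113-10.compute-1.amazonaws.com",
                               "my-key.pem")
    print(shlex.join(cmd))
```

Running the printed command in the first terminal keeps the tunnel open; the browser on the client system would then connect to localhost on the chosen local port.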
In this case, we use IndeX’s HTML5-based client viewer that receives a video stream from the IndeX server running on Amazon EC2, and allows for socket-based transfer of user interactions or commands back to the server to steer visualization.
Users can use a browser on their client system to view and interact with the rendered output. Note that a single instance can have one or more GPUs. IndeX will automatically scale to the maximum number of GPUs on a given instance.
Detailed instructions are provided in the GitHub repository.
Cluster Rendering
Figure 2 – Architecture for cluster rendering on AWS.
In the case of cluster rendering, users can launch a cluster of Amazon EC2 GPU instances using AWS ParallelCluster. This is an AWS-supported open source cluster management tool that makes it easy to deploy and manage HPC clusters on AWS.
With AWS ParallelCluster, the user just needs to create a simple text file (config file) to model and provision the resources needed in terms of compute, storage, and networking in an automated, secure manner. It also supports multiple batch schedulers like AWS Batch and Slurm for easy job submissions.
For cluster rendering on AWS using IndeX, users launch a cluster using the IndeX AMI from AWS Marketplace and the required compute (GPU instances), storage (EBS, S3), networking, and scheduler (Slurm) resources as specified in the ParallelCluster config file. In addition, users can upload their custom datasets for visualization to S3.
Users can either use SSH tunneling (as shown in the single instance rendering section above) to launch the IndeX HTML5 client viewer in their local browser, or use a remote desktop with NICE DCV and launch a browser and the IndeX client viewer there.
Here, we show the remote desktop method, as it’s quick and easy to deploy. Users can easily enable remote visualization using NICE DCV via the AWS ParallelCluster configuration file.
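To give a feel for what such a configuration file might contain, the sketch below generates one with Python's configparser. The section and key names follow the INI layout of AWS ParallelCluster 2.x as best we recall it; the base OS, instance type, queue size, and all IDs are placeholder assumptions. Check every key against the ParallelCluster documentation for your version and the GitHub repository instructions before use.

```python
import configparser
import io


def build_cluster_config(region, key_name, ami_id, vpc_id, subnet_id,
                         gpu_instance_type="g4dn.12xlarge", max_nodes=4):
    """Render an illustrative ParallelCluster 2.x style INI config.

    All values passed in are placeholders; key names are assumed from
    the 2.x format and should be verified against the official docs.
    """
    cfg = configparser.ConfigParser()
    cfg["global"] = {"cluster_template": "default"}
    cfg["aws"] = {"aws_region_name": region}
    cfg["cluster default"] = {
        "key_name": key_name,
        "base_os": "alinux2",          # assumption: OS of the AMI
        "scheduler": "slurm",
        "custom_ami": ami_id,          # the IndeX AMI from AWS Marketplace
        "master_instance_type": gpu_instance_type,
        "compute_instance_type": gpu_instance_type,
        "max_queue_size": str(max_nodes),
        "vpc_settings": "default",
        "dcv_settings": "default",
    }
    cfg["vpc default"] = {"vpc_id": vpc_id, "master_subnet_id": subnet_id}
    # Enables NICE DCV on the head node for remote desktop access.
    cfg["dcv default"] = {"enable": "master"}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    print(build_cluster_config("us-east-1", "my-key", "ami-EXAMPLE",
                               "vpc-EXAMPLE", "subnet-EXAMPLE"))
```

Generating the file from code is a design convenience rather than a requirement; the same content can be written by hand in a text editor.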
Once the cluster is running, users can log in to the head node using the NICE DCV client and launch the IndeX SDK viewer Slurm batch script, along with the sample or custom dataset for visualization.
The Slurm script is a wrapper that adds required parameters for network configuration based on environment variables set by Slurm Workload Manager for your cluster.
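The idea behind such a wrapper can be sketched as follows. The environment variables SLURM_JOB_NUM_NODES, SLURM_NODEID, and SLURM_JOB_NODELIST are standard Slurm variables, but which ones the actual IndeX wrapper reads, and how it maps them to viewer options, is not covered in this post; the function and dictionary keys below are hypothetical.

```python
import os
from typing import Mapping


def slurm_network_settings(env: Mapping[str, str] = os.environ) -> dict:
    """Collect cluster layout information from Slurm's environment.

    Falls back to single-node defaults when run outside a Slurm job.
    The returned keys are illustrative names, not IndeX parameters.
    """
    node_rank = int(env.get("SLURM_NODEID", "0"))
    return {
        "node_count": int(env.get("SLURM_JOB_NUM_NODES", "1")),
        "node_rank": node_rank,
        "node_list": env.get("SLURM_JOB_NODELIST", "localhost"),
        # Treat rank 0 as the node that serves the viewer (assumption).
        "is_head": node_rank == 0,
    }


if __name__ == "__main__":
    print(slurm_network_settings())
```

A real wrapper would translate these values into the network options the viewer expects; see the script in the GitHub repository for the actual parameters.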
Once the job is running and the server IP is printed on the terminal, the user can open a browser (preferably Chrome) in the NICE DCV session and start visualizing their data.
Detailed instructions are provided in the GitHub repository.
Examples of Visualization Using NVIDIA IndeX on AWS
Parihaka – Seismic Data – Slices
The Taranaki Basin is an onshore-offshore rift basin on the West Coast of New Zealand, approximately 400 km west of the Pacific-Australian plate boundary.
The Parihaka seismic survey is located towards the northern part of the basin and was acquired in 2004 by New Zealand Petroleum and Minerals. The survey covers an area of 1,7 sq.km and the seismic data volume consists of 1132 inlines and 2904 crosslines.
NVIDIA IndeX is commonly used in seismic interpretation systems for scalable, high-fidelity, and real-time visualization of seismic data. The visualization of the entire Parihaka seismic volume with embedded slices highlights that NVIDIA IndeX with its XAC technology can boost conventional seismic interpretation workflows.
Figure 3 – Parihaka Seismic Data Slices rendered using IndeX on AWS.
Special thanks to Crown Minerals and the New Zealand Ministry of Economic Development for allowing us to display this Taranaki Basin dataset. Crown Minerals manages the New Zealand Government’s oil, gas, mineral and coal resources.
Big Brain – A Human Brain Dataset
The brain dataset is an ultra-high resolution scan of a human brain at nearly cellular resolution of 20 micrometers, based on the reconstruction of 7,404 histological sections.
This dataset contains the reconstituted sections in the coronal dimension. It shows aligned sections in the axial and sagittal dimension, and histological volumes in histological space.
The NVIDIA IndeX Accelerated Computing technology (XAC API) is a powerful tool that enables researchers to carve out and identify features of particular interest inside the visualizations.
Figure 4 – Human brain dataset rendered using IndeX on AWS.
Special thanks to Prof. Dr. med. Katrin Amunts and the Structural and Functional Organization of the Brain Lab at the Institute of Neuroscience and Medicine, Research Centre Jülich.
Summary
Scientific visualization impacts all research disciplines, from microscopy and archaeology to astronomy. It’s hard to use traditional scientific visualization solutions to visualize the massive data sets produced by complex simulations.
NVIDIA IndeX on AWS addresses these challenges by providing a scalable, distributed, GPU-based architecture for highly parallelized, high performance computation and visualization. Running IndeX on AWS provides the ability to interactively explore and analyze huge visualizations and provides a collaborative workflow solution for multi-disciplinary teams.
The interactive scientific visualization on AWS comes with a per-user EULA. Each user is charged $5 per hour per session. The price is independent of the number of instances, the number of GPUs per session, and the size of the dataset being visualized.
If you are interested in running IndeX on AWS, subscribe to the IndeX AMI in AWS Marketplace.
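A quick estimate of the software charge under this pricing model can be sketched as below. We assume each user's sessions are billed independently at the quoted hourly rate and that partial hours are prorated, which the post does not specify; the charges for the underlying EC2 instances and S3 storage are separate and not included.

```python
def index_software_cost(users: int, hours_per_user: float,
                        rate_per_hour: float = 5.0) -> float:
    """Estimate the IndeX software charge in US dollars.

    Uses the $5 per user-hour rate quoted in this post. Instance count,
    GPU count, and dataset size do not enter the formula.
    """
    if users < 0 or hours_per_user < 0:
        raise ValueError("users and hours_per_user must be non-negative")
    return users * hours_per_user * rate_per_hour


if __name__ == "__main__":
    # Three researchers, eight session hours each.
    print(index_software_cost(users=3, hours_per_user=8))
```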
NVIDIA – AWS Partner Spotlight
NVIDIA is an AWS Advanced Technology Partner that has collaborated with AWS for over 10 years to deliver powerful, cost-effective, and flexible GPU-based solutions for customers.