AWS HPC Blog
Accelerating molecule discovery with computational chemistry and Promethium on AWS
This post was contributed by Perminder Singh, Quantum Solution Architect, ISV, and Satish Gandhi, Partner Development Manager, ISV, at AWS; and Christoph Siegert, SVP Product at QC Ware.
Computational chemistry for small molecule discovery remains one of the major opportunities across many high-impact industries, such as pharmaceuticals and chemicals, as highlighted by Nature, BCG, and McKinsey. Many of the world's largest problems are, at their core, chemistry problems.
For example, creating new catalysts is a high-impact chemistry problem that stands to benefit from accelerated molecular simulations. New catalysts could lower the natural gas requirements of ammonia production, which accounts for about 3% of global gas consumption. Another example is shortening drug development timelines, which can currently take 5 or more years. BCG estimates that molecular simulations could create $60-130 billion in value across pharma and chemicals alone.
However, computational chemistry researchers typically have to trade speed against accuracy. QC Ware saw an opportunity to deliver both. They developed Promethium, a new computational chemistry solution that runs on Amazon Elastic Compute Cloud (Amazon EC2) GPU instances and performs up to 100 times faster than traditional high-accuracy solutions. In this post, we'll dive into Promethium and explain how it achieves this.
Computational chemistry capabilities
Promethium is a density functional theory (DFT) computational chemistry solution. DFT is a highly accurate quantum chemistry method that doesn't require training data, which makes it immediately applicable across industries:
- Pharma: small molecule drugs, covalent inhibitors, protein-ligand binding, and PROTACs
- Agriculture: pesticides, fertilizers, ammonia
- Energy: oil & gas catalysts, reforming, battery materials
- Electronics: organic light emitting diodes (OLEDs), organic electronics
- Chemicals: polymers, polyurethane, catalysts, and beyond
- Automotive / Aerospace: catalysts, new materials, polymers
- Environmental: water treatment, air purification
However, DFT's high accuracy comes at a price: it requires substantial computational power, which limits the throughput, size, and complexity of the molecules that can be simulated. Promethium speeds up processing by combining GPUs, specialized GPU software, and the scalability of the AWS Cloud, so computational chemists get highly accurate results quickly.
Promethium delivers significant speedups over current best-in-class DFT software. The following chart shows how many single-point energy calculations can be run on Amazon EC2 instances with various DFT solutions.
This chart shows the number of molecules (Λ-FL172 organoruthenium complex) that can be simulated with DFT in 24 hours per compute node, across various commercial and open-source solutions.
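The chart's metric is a throughput figure, so it can be related back to per-calculation wall time. The sketch below assumes each node runs a fixed number of independent calculations back to back and counts only completed jobs; the timings you pass in are placeholders, not measured Promethium values.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60


def molecules_per_day(seconds_per_calc: float, concurrent_calcs_per_node: int = 1) -> int:
    """Completed single-point calculations per node in 24 hours (whole jobs only).

    Assumes every concurrent slot on the node runs identical jobs back to back.
    """
    if seconds_per_calc <= 0:
        raise ValueError("seconds_per_calc must be positive")
    per_slot = math.floor(SECONDS_PER_DAY / seconds_per_calc)
    return per_slot * concurrent_calcs_per_node


def speedup(baseline_per_day: int, new_per_day: int) -> float:
    """Throughput ratio between two solutions on the same node type."""
    return new_per_day / baseline_per_day
```

For instance, a hypothetical 10-minute calculation running in 8 parallel slots yields `molecules_per_day(600, 8)`, or 1,152 molecules per node per day.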
The computational requirement also limits the size and complexity of the molecules that can be simulated. Faster solutions make it practical to treat a larger region of a protein-ligand interaction as the active area. For example, a calculation on a 2,056-atom protein at the ωB97/def2-SVP level of theory runs in about 14 hours on a single GPU of an Amazon EC2 P4d instance.
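Much of why molecule size matters is that DFT cost grows steeply with the number of basis functions. The sketch below counts spherical def2-SVP functions for an elemental composition and applies an idealized power law. The exponent of 3 is an assumption for illustration only: formal scaling depends on the functional and on integral screening, and optimized GPU codes often do better, so the output is not a prediction of Promethium run times.

```python
# Spherical (5d) def2-SVP contracted basis functions per element:
# H [2s1p] = 2 + 3 = 5; C, N, O [3s2p1d] = 3 + 6 + 5 = 14; S [4s3p1d] = 4 + 9 + 5 = 18.
DEF2_SVP_FUNCTIONS = {"H": 5, "C": 14, "N": 14, "O": 14, "S": 18}


def count_basis_functions(composition: dict[str, int]) -> int:
    """Total def2-SVP basis functions for an elemental composition, e.g. {"O": 1, "H": 2}."""
    return sum(DEF2_SVP_FUNCTIONS[element] * count for element, count in composition.items())


def relative_cost(n_small: int, n_large: int, exponent: float = 3.0) -> float:
    """Idealized cost ratio, ASSUMING cost grows as (basis functions) ** exponent."""
    return (n_large / n_small) ** exponent


water = count_basis_functions({"O": 1, "H": 2})
water_dimer = count_basis_functions({"O": 2, "H": 4})
print(water, water_dimer, relative_cost(water, water_dimer))  # 24 48 8.0
```

Doubling the system size doubles the basis, but under the assumed cubic scaling the cost grows eightfold, which is why throughput and maximum molecule size are tightly linked.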
Scalable and secure architecture
Promethium is a software as a service (SaaS) solution built on AWS that uses Amazon EC2 GPU instances to provide customers with scalable, on-demand computational resources. It uses a microservices-based architecture deployed on Kubernetes. The solution responds dynamically to customer demand and automatically scales Amazon EC2 P3 and P4 instances with Karpenter, an open-source compute resource manager for Kubernetes.
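Karpenter watches for pods that cannot be scheduled and launches instances sized to fit them. The toy model below illustrates that decision for GPU pods using first-fit-decreasing bin packing. It is a simplification for illustration, not Karpenter's actual algorithm, which considers many more constraints; only the GPU counts per instance size are real EC2 figures.

```python
# GPUs per instance for a few EC2 P3/P4 sizes.
GPUS_PER_INSTANCE = {
    "p3.2xlarge": 1,
    "p3.8xlarge": 4,
    "p3.16xlarge": 8,
    "p4d.24xlarge": 8,
}


def nodes_needed(pending_gpu_requests: list[int], instance_type: str) -> int:
    """Nodes required to place pending pods, using first-fit-decreasing bin packing."""
    capacity = GPUS_PER_INSTANCE[instance_type]
    free: list[int] = []  # remaining GPUs on each node opened so far
    for request in sorted(pending_gpu_requests, reverse=True):
        if request > capacity:
            raise ValueError(f"pod requests {request} GPUs; {instance_type} has {capacity}")
        for i, remaining in enumerate(free):
            if remaining >= request:
                free[i] -= request  # place the pod on an existing node
                break
        else:
            free.append(capacity - request)  # no room anywhere: launch a new node
    return len(free)
```

For example, five pending pods asking for 4, 4, 2, 2, and 1 GPUs fit on two `p4d.24xlarge` nodes.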
Molecule intellectual property is the most sensitive and important asset for QC Ware’s customers. Therefore, Promethium’s architecture follows AWS best practices for cloud security. For example, the solution includes least-privilege role-based access control, encryption at rest and in transit, multi-factor authentication (MFA), and single sign-on. The following diagram outlines Promethium’s architecture.
User Interface
Promethium can be accessed through graphical and command-line interfaces. For ease of use, the graphical user interface (GUI) includes drag and drop, molecule visualizations, pre-set defaults, and automated error checking. Because Promethium is a SaaS solution hosted on AWS, customers don't need to manage the underlying AWS infrastructure, the dependency stack, or software updates, and can focus on their chemistry work.
Users can customize all common settings in the GUI and use a JSON customization form for advanced settings.
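One way to picture GUI defaults plus a JSON form for advanced settings is a recursive merge, where the user's JSON overrides only the keys it mentions. In this minimal sketch every key and value (`method`, `basis`, `scf`, and so on) is hypothetical; the post does not document Promethium's settings schema.

```python
import json

# Hypothetical defaults standing in for the GUI's pre-set values.
GUI_DEFAULTS = {
    "method": "wb97x-d3",
    "basis": "def2-svp",
    "scf": {"max_iterations": 100, "convergence": 1e-6},
}


def apply_advanced_settings(defaults: dict, overrides_json: str) -> dict:
    """Recursively merge user-supplied JSON overrides onto the defaults."""
    overrides = json.loads(overrides_json)

    def merge(base: dict, extra: dict) -> dict:
        result = dict(base)  # copy so the defaults are never mutated
        for key, value in extra.items():
            if isinstance(value, dict) and isinstance(result.get(key), dict):
                result[key] = merge(result[key], value)  # descend into nested sections
            else:
                result[key] = value
        return result

    return merge(defaults, overrides)


settings = apply_advanced_settings(GUI_DEFAULTS, '{"scf": {"max_iterations": 300}}')
```

Here only `max_iterations` changes; the convergence threshold and every other default carry through untouched.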
A REST API is also available for chaining workflows together and accessing advanced capabilities. Users can access the API through a Python SDK, and the API can be integrated with other workflow tools and software products, such as electronic lab notebooks (ELNs) or data and workflow platforms like Schrödinger LiveDesign.
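The post doesn't show Promethium's SDK calls, so the sketch below sticks to the Python standard library to illustrate the chaining idea in general: take the lowest-energy conformer from one job's result and submit it as the input of a follow-up job. The endpoint URL, workflow names, and payload shape are all hypothetical, and the request is built but never sent.

```python
import json
import urllib.request

API_BASE = "https://api.example.com/v1"  # hypothetical endpoint, not Promethium's URL


def build_job_request(workflow: str, payload: dict, token: str) -> urllib.request.Request:
    """Build (but do not send) a POST request that submits one workflow job."""
    body = json.dumps({"workflow": workflow, "input": payload}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}/jobs",
        data=body,
        headers={"Content-Type": "application/json", "Authorization": f"Bearer {token}"},
        method="POST",
    )


def chain_to_optimization(search_result: dict, token: str) -> urllib.request.Request:
    """Feed the lowest-energy conformer from a conformer search into a follow-up job."""
    best = min(search_result["conformers"], key=lambda c: c["energy"])
    return build_job_request("geometry_optimization", {"xyz": best["xyz"]}, token)
```

Sending the request would be a matter of passing it to `urllib.request.urlopen`; keeping construction separate makes the chaining logic easy to test offline.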
Example use case: Conformer Search
Promethium currently features nine different chemistry simulations focused on molecular properties, nonbonded interactions, and chemical reactions. Molecular property simulations include conformer searches (featured here), geometry optimizations, single-point energies, and torsion scans. Nonbonded interaction simulations include SAPT (symmetry-adapted perturbation theory) and F-SAPT (functional-group SAPT), and reaction simulations include reaction path optimizations, transition states, and reactant-product transition states. Additional calculations, such as vibrational frequencies and thermodynamics, are available for all workflows. The list continues to grow as customers request new capabilities, such as spectroscopy and ligand ranking.
One common problem chemists want to solve when designing new molecules is understanding the different 3D shapes a molecule can take and how often each shape appears. Chemists can use Promethium's Conformer Search capability to identify a molecule's stable shapes (those with the lowest relative energies) and their prevalence (a Boltzmann probability distribution).
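The Boltzmann distribution assigns each conformer i a population proportional to exp(-ΔE_i / RT), where ΔE_i is its energy relative to the most stable conformer. A minimal sketch, assuming energies in kcal/mol and 298.15 K as a default temperature; the post doesn't say whether Promethium weights by electronic energies or free energies.

```python
import math

GAS_CONSTANT_KCAL = 0.0019872043  # R in kcal/(mol*K)


def boltzmann_populations(energies_kcal: list[float], temperature_k: float = 298.15) -> list[float]:
    """Fractional population of each conformer from its energy in kcal/mol."""
    kt = GAS_CONSTANT_KCAL * temperature_k
    e_min = min(energies_kcal)  # shift so the most stable conformer has relative energy 0
    weights = [math.exp(-(e - e_min) / kt) for e in energies_kcal]
    total = sum(weights)
    return [w / total for w in weights]
```

Subtracting the minimum energy before exponentiating doesn't change the normalized result, but it keeps the exponentials well inside floating-point range when absolute energies are large.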
Given an input molecule, Promethium goes through several stages of calculation and filtering, each more precise than the last, until it arrives at a final distribution of conformers with DFT-level accuracy. Promethium completes this process in three primary steps:
- Launching Amazon EC2 instances for parallel processing
- Identifying and filtering potential conformers
- Analyzing conformers and visualizing the results
First, Promethium launches Amazon EC2 instances based on the number of molecules submitted. Customers can submit as many molecules as they want at once, and simultaneous submissions are parallelized across available EC2 instances. Within each molecule's conformer search, Promethium parallelizes the DFT steps across multiple GPUs to keep wall-clock time low. A queuing system manages submitted jobs and Amazon EC2 capacity.
The diagram represents the following workflow:
- Customers submit multiple molecules for conformer searches
- Promethium scales Amazon EC2 instances to parallelize submitted jobs
- Promethium scales additional EC2 instances within each single job
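The two levels of parallelism in this workflow can be sketched with Python's `concurrent.futures`: an outer pool runs submitted molecules side by side (and queues any beyond its limit), while an inner pool spreads each molecule's DFT steps across workers. Threads stand in for EC2 instances and GPUs here, and `dft_energy` is a hypothetical toy function, not a real DFT call.

```python
from concurrent.futures import ThreadPoolExecutor


def dft_energy(conformer: str) -> float:
    """Stand-in for one DFT calculation on one GPU (hypothetical toy energy)."""
    return -float(len(conformer))


def run_molecule_job(conformers: list[str], gpus_per_job: int) -> float:
    """Inner level: spread one molecule's DFT steps across its GPUs."""
    with ThreadPoolExecutor(max_workers=gpus_per_job) as gpus:
        energies = list(gpus.map(dft_energy, conformers))
    return min(energies)  # lowest energy found for this molecule


def run_submissions(jobs: dict[str, list[str]], max_parallel_jobs: int,
                    gpus_per_job: int) -> dict[str, float]:
    """Outer level: run up to max_parallel_jobs molecules at once; extras wait in the queue."""
    with ThreadPoolExecutor(max_workers=max_parallel_jobs) as instances:
        futures = {name: instances.submit(run_molecule_job, confs, gpus_per_job)
                   for name, confs in jobs.items()}
        return {name: future.result() for name, future in futures.items()}


results = run_submissions({"mol_a": ["CC", "CCC"], "mol_b": ["C"]}, max_parallel_jobs=2, gpus_per_job=4)
```

`ThreadPoolExecutor` keeps work it cannot start yet in an internal queue, which loosely mirrors how a job queue holds submissions until capacity frees up.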
Second, Promethium identifies and filters potential conformers over four stages. It first generates a large set of candidate conformers and then filters them with lower-cost methods: a force field (FF), ANI (ANAKIN-ME) neural network potentials, or GFN (Geometry, Frequency, Noncovalent interactions) methods. This removes the least stable conformers and leaves a smaller set to evaluate with DFT. Promethium evaluates the remaining conformers with two high-accuracy DFT filters to find the conformers with the lowest relative energies, which are the most stable.
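The staged filtering described above works like a funnel: score every survivor with the current method, discard anything more than an energy window above the current minimum, and pass the rest to the next, more expensive method. In this minimal sketch the energy functions and windows are hypothetical placeholders; the post does not state Promethium's cutoffs or whether it filters by energy window or by count.

```python
from typing import Callable, Sequence

# (stage name, energy function returning kcal/mol, energy window in kcal/mol)
Stage = tuple[str, Callable[[str], float], float]


def funnel(candidates: Sequence[str], stages: Sequence[Stage]) -> list[tuple[str, float]]:
    """Run candidates through increasingly expensive stages, keeping those within each window."""
    survivors = list(candidates)
    ranked: list[tuple[str, float]] = []
    for _name, energy_fn, window in stages:
        energies = {c: energy_fn(c) for c in survivors}  # the costly step at DFT stages
        e_min = min(energies.values())
        ranked = sorted(((c, e - e_min) for c, e in energies.items()), key=lambda item: item[1])
        ranked = [(c, rel) for c, rel in ranked if rel <= window]  # drop high-energy conformers
        survivors = [c for c, _ in ranked]
    return ranked  # (conformer, relative energy) from the final stage, most stable first
```

The design trade-off is that cheap early stages make the expensive DFT stages affordable, but a conformer discarded early never gets a second look, so early windows are usually kept generous.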
Third, Promethium analyzes the results. The solution visualizes the three-dimensional structures, the stage progression, and the probability distribution and relative energy of each conformer.
The graphic represents the relative energy of possible conformers as they are filtered by a force field (1st row), an ANI filter (2nd row), and two separate DFT filters (3rd and 4th rows).
The following graphic shows the Boltzmann weights and probability distribution for the conformers that passed the final filter stage.
Pricing
Promethium uses a consumption-based pricing model: customers pay only for the GPU-hours they actually use, accrued by the second. There are no upfront costs, annual costs, or per-user or per-seat costs. The total cost of ownership is often lower than that of legacy open-source DFT software.
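Per-second accrual means a job's bill follows directly from the GPU-seconds it consumed. The post doesn't list prices, so the rate below is a hypothetical input; `Decimal` is used so currency rounding is exact rather than subject to binary floating-point error.

```python
from decimal import Decimal, ROUND_HALF_UP


def job_cost(gpu_seconds: list[int], rate_per_gpu_hour: Decimal) -> Decimal:
    """Cost of a job from per-GPU usage in seconds, at a (hypothetical) hourly GPU rate."""
    total_seconds = sum(gpu_seconds)
    cost = Decimal(total_seconds) / Decimal(3600) * rate_per_gpu_hour
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # round to cents
```

With an assumed rate of 2.00 per GPU-hour, one GPU running for an hour plus another for 30 minutes gives `job_cost([3600, 1800], Decimal("2.00"))`, which is 3.00.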
Conclusion
In this post, we started with the opportunities and challenges of computational chemistry and its role in advancing pharmaceuticals and chemicals. We then outlined Promethium, a SaaS solution running on Amazon EC2 GPU instances that takes a step toward resolving the speed-accuracy tradeoff, a longstanding obstacle in computational chemistry.
We used a conformer search example to demonstrate how Promethium tackles the challenges of larger, more complex molecules with GPU-optimized algorithms. Promethium's scalable architecture and solution approach make it a practical tool for molecule discovery. To learn more, visit Promethium's homepage or find additional details on Promethium in AWS Marketplace.
The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.
References
[1]: https://www.nature.com/articles/d41573-022-00025-1
[2]: https://www.bcg.com/publications/2022/ai-in-drug-discovery-impact
[4]: https://youtu.be/RfvL_423a-I?t=6553 (Werner Vogels at AWS re:Invent 2022)
[5]: https://www.bcg.com/publications/2023/enterprise-grade-quantum-computing-almost-ready