AWS HPC Blog
Massively-scaling quantum chemistry to support a circular economy
As part of Amazon’s Global Impact Computing team’s initiative on Digital Technologies for a Circular Economy, Accenture, Amazon, Good Chemistry, and Intel have joined forces to massively scale up humanity’s capability for simulating chemical reactions with computers.
PFAS are a large group of human-made chemicals that accumulate in the ecosystem and cause serious health issues in humans and animals. Remediation of PFAS pollution is a huge global challenge and part of the agenda for a “Circular Economy”, whose goal is to design out waste and pollution.
To achieve this goal, we massively increased the scaling capability of Good Chemistry’s QEMIST Cloud to run a chemistry simulation on more than a million CPU cores, calculating the energies for bond-breaking in several PFAS. This is an important step towards discovering new pathways for PFAS destruction and towards reducing plastic waste in the environment. The results represent the most accurate treatment of this process for these molecules (or any molecule of comparable size), and include the largest PFAS ever simulated with near-exact accuracy.
The methods developed in this project provide highly accurate solutions to the electronic Schrödinger equation, which is the most fundamental challenge in chemistry simulations. Hence, our novel solution has far-reaching potential benefits in various fields, from drug discovery and food innovation, to new battery material discovery and carbon capture.
In this post, we will demonstrate a new approach for running high-end scientific calculations using a software as a service (SaaS) platform, on-demand pay-as-you-go compute resources, and a container-based cloud HPC infrastructure.
The Critical Problem of PFAS Pollution
Per- and polyfluoroalkyl substances (PFAS) are a large class of human-made compounds designed to resist heat, chemical degradation, and mechanical wear, and to repel water and oil. They are widely used in waterproof clothing, food packaging, firefighting foams, paints, adhesives, lubricants, and many other applications (see Figure 1).
PFAS do not generally biodegrade in the environment, which is why they are known as “the forever chemicals.” PFAS exposure is associated with a host of detrimental health issues in humans, such as cancer and reproductive problems2 (see Figure 2). Recent studies have identified 2,858 pollution sites in the US alone, and estimate that as many as 200 million Americans may have toxic PFAS in their drinking water3,4. The US Environmental Protection Agency (EPA) recently announced that there is no safe level of exposure for two widely used PFAS5.
This means that finding affordable, scalable, and zero-waste remediation for PFAS pollution is an urgent global challenge6. For example, current techniques to separate PFAS from drinking water create concentrated PFAS waste. The scale of this open challenge is evidenced by the $1 billion in funding the EPA announced in 2022 for research into PFAS7.
The Challenge of Computational Chemistry
Computational chemistry is the practice of running chemical simulations on computers instead of doing wet-lab experiments. It can significantly speed up the R&D process of finding optimal pathways and catalysts for effective and scalable PFAS remediation.
A major pathway for PFAS destruction is defluorination, i.e., the dissociation of carbon-fluorine (C-F) bonds8, which are among the strongest known bonds (see Figure 3). Finding more efficient ways to break the carbon-fluorine bond is a key step towards PFAS destruction.
Our goal is to calculate the energies needed for breaking the C-F bonds in common PFAS molecules. Leveraging Accenture’s experience in simulating PFAS chemistry, we selected three PFAS molecules to study: Trifluoroacetic acid (TFA), Perfluorobutanoic acid (PFBA), and Perfluorooctanoic acid (PFOA), which are among the most widespread in the environment9. These three molecules represent three different levels of computational complexity and scale-up requirements.
Computational chemistry involves the quantum-mechanical problem of solving the electronic Schrödinger equation, also known as the electronic structure problem. An accurate solution to this problem can be obtained using numerically exact methods such as full configuration interaction (FCI). However, the computational cost of FCI grows combinatorially with system size, so such calculations are only possible for tiny molecules of a few atoms, using small basis sets, even on the largest supercomputers available.
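To make this scaling wall concrete, here is a minimal sketch using the open-source PySCF package (an illustration only; PySCF is not the engine used in this project) that runs FCI on hydrogen fluoride in a minimal basis and counts the Slater determinants involved. Adding even a few more atoms or a larger basis set makes that count, and hence the cost, explode combinatorially.

```python
# Minimal FCI sketch with the open-source PySCF package (illustrative only,
# not the solver used in this project).
from math import comb
from pyscf import gto, scf, fci

# Hydrogen fluoride in a minimal STO-3G basis: 10 electrons in 6 orbitals.
mol = gto.M(atom="H 0 0 0; F 0 0 0.92", basis="sto-3g")
mf = scf.RHF(mol).run()               # mean-field (Hartree-Fock) reference
e_fci, _ = fci.FCI(mf).kernel()       # exact diagonalization in the full space
print(f"FCI total energy: {e_fci:.6f} hartree")

# The FCI space size is (orbitals choose alpha electrons) x (orbitals choose beta).
n_orb, (n_alpha, n_beta) = mol.nao, mol.nelec
print(f"Determinants: {comb(n_orb, n_alpha) * comb(n_orb, n_beta)}")
# For a PFAS like PFOA in a realistic basis set, this count becomes
# astronomically large, which is why exact FCI is out of reach for such molecules.
```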
Therefore, many approximate approaches have been developed to reduce the complexity, such as the “gold standard” coupled cluster method CCSD(T). Unfortunately, these are unable to accurately predict the properties of many complex systems, especially for bond-breaking processes10.
In this project, we combined a novel algorithm with a massive cloud HPC infrastructure to tackle this simulation problem with near-exact accuracy. Solving the electronic Schrödinger equation with high accuracy is the most fundamental computational challenge in simulating chemistry and physics at the quantum level. Therefore, the solution we developed in this project has potential benefits for a wide range of fields, such as discovering new drugs, food and agriculture innovation, discovering new battery materials and chemistries, finding carbon capture solutions, and reducing chemical pollution.
A Novel Solution with Massive Parallelization
The incremental full configuration interaction (iFCI) algorithm is a novel, highly scalable approximation of the FCI method, which can obtain results to within chemical accuracy11,12. The algorithm decomposes a molecule into many sub-problems (see Figure 4) that can be solved independently of each other, making it ideally suited to massive parallelization.
As the order n of the iFCI expansion increases, the contribution of the higher-order terms becomes smaller, allowing the algorithm to converge at a relatively low n. iFCI enables accurate solutions for large molecules, such as PFAS, in strongly correlated configurations for which standard quantum chemistry methods like CCSD(T) fail.
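To convey the structure of the method, the snippet below is a toy sketch of an n-body incremental expansion, not Good Chemistry's production code. The hypothetical `solve_subproblem` callable stands in for solving the FCI problem restricted to one subset of orbitals; because every call is independent, the subproblems map naturally onto the massively parallel infrastructure described next.

```python
from itertools import combinations

def incremental_expansion(orbitals, solve_subproblem, n_max=3):
    """Toy sketch of an n-body incremental expansion (not production iFCI).

    solve_subproblem(subset) is a hypothetical callable that returns the
    correlation energy of the problem restricted to that orbital subset.
    Each call is independent, so the subproblems can run in parallel.
    """
    increments = {}   # increment for each orbital subset
    total = 0.0
    for n in range(1, n_max + 1):
        for subset in combinations(orbitals, n):
            e_sub = solve_subproblem(subset)
            # Subtract every lower-order increment contained in this subset,
            # so each increment captures only genuinely new n-body correlation.
            lower = sum(increments[s]
                        for k in range(1, n)
                        for s in combinations(subset, k))
            increments[subset] = e_sub - lower
            total += increments[subset]
    return total
```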
QEMIST Cloud13 is a computational chemistry platform developed by Good Chemistry. It is a high-throughput, cloud-native software as a service (SaaS) solution built on AWS infrastructure.
An accurate and scalable implementation of the iFCI method was recently added to QEMIST Cloud and was used to perform the largest near-exact calculation to date on polyacene molecules, using 8,000 vCPUs14.
In the current project, four parties joined forces to massively scale up the iFCI implementation and the QEMIST Cloud infrastructure to run on more than one million vCPUs.
To scale up iFCI execution, we built a modern high-performance computing (HPC) cluster on Amazon Web Services (AWS) infrastructure. Figure 5 illustrates the high-level architecture of the iFCI application on a massively scalable HPC architecture (incorporated into QEMIST Cloud).
This infrastructure automatically (and rapidly) scales up to use more than a million CPU cores on non-reserved resources (low-cost Spot Instances). This contrasts with the mainstream practice in the HPC industry of using dedicated, high-powered compute nodes. We created an HPC cluster with compute power on par with the largest supercomputers in the world15 at a small fraction of the cost and time to the end user.
Our HPC architecture is based on services like Amazon Elastic Kubernetes Service (Amazon EKS), Karpenter, and Amazon Aurora databases with Amazon RDS Proxy. Karpenter is an open-source node provisioning tool that delivers faster and more flexible autoscaling for our Kubernetes clusters than the standard Kubernetes approach (see Figure 6).
The iFCI implementation in QEMIST Cloud requires minimal data transfer between compute nodes, which enables parallelization on ordinary networking infrastructure. Every job in the cluster is monitored and automatically rescheduled in the event of failure, which lets us run these massive experiments without the need for dedicated infrastructure.
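As a simplified illustration of that fault-tolerance model (a sketch, not the actual QEMIST Cloud scheduler), a work-queue pattern that re-queues subproblems whose workers fail, for instance after a Spot interruption, looks roughly like this:

```python
import queue
import random

def run_with_rescheduling(subproblems, solve, max_attempts=5):
    """Toy work-queue sketch: retry each independent subproblem if its
    worker fails (for example, after a Spot interruption)."""
    work = queue.Queue()
    for sp in subproblems:
        work.put((sp, 0))                      # (subproblem, attempts so far)
    results = {}
    while not work.empty():
        sp, attempts = work.get()
        try:
            results[sp] = solve(sp)            # would run on a Spot-backed worker
        except Exception:
            if attempts + 1 < max_attempts:
                work.put((sp, attempts + 1))   # reschedule the failed job
            else:
                raise
    return results

# Stand-in solver that fails ~30% of the time to mimic interruptions.
def flaky_solve(sp):
    if random.random() < 0.3:
        raise RuntimeError("worker interrupted")
    return sum(sp)

print(run_with_rescheduling([(1, 2), (3, 4), (5, 6)], flaky_solve))
```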
Making the Million Core Run
Scaling the iFCI implementation in QEMIST Cloud from 8,000 vCPUs to one million required a significant engineering effort. The increase in the number of worker nodes required us to scale and optimize all of the data pipelines. Here’s how we did that:
- The first challenge was to increase the number of worker nodes that were solving iFCI subproblems. To secure enough (spare) Amazon Elastic Compute Cloud (Amazon EC2) instances, we needed to switch from Kubernetes’s built-in scheduler to Karpenter, which allowed us to scale on many different instance types.
- Deploying the entire multi-cluster in one virtual private cloud was impossible within the available IPv4 address space, so we had to deploy Amazon EKS on IPv6 (a recent feature)16.
- To take advantage of Amazon EC2 Spot availability in all AWS Availability Zones, and given the limits of Kubernetes, we deployed 13 Amazon EKS clusters in all Availability Zones in the region.
- The database connections supported by Amazon Aurora in Amazon RDS were not sufficient at our scale. Amazon RDS Proxy, which is simple to integrate, allowed us to maintain the large number of database connections needed for scaling (see the connection sketch after this list). In parallel, the iFCI implementation in QEMIST Cloud was re-engineered to reduce the database workload.
- Logging at this scale was another big challenge. We addressed it by integrating Amazon CloudWatch Container Insights, which allowed us to monitor application bottlenecks and errors at scale.
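For the database item above, a minimal connection sketch is shown below. It assumes a PostgreSQL-compatible Aurora database and uses hypothetical environment variable names; the key point is that RDS Proxy exposes an ordinary database endpoint, so worker code only needs to point at the proxy while the proxy pools and multiplexes the connections.

```python
# Minimal sketch (hypothetical endpoint and variable names): workers connect
# to Amazon Aurora through Amazon RDS Proxy exactly as they would to the
# database itself; only the host changes to the proxy endpoint.
import os
import psycopg2

conn = psycopg2.connect(
    host=os.environ["DB_PROXY_ENDPOINT"],   # the RDS Proxy endpoint
    port=5432,
    dbname=os.environ.get("DB_NAME", "qemist"),
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    sslmode="require",                      # TLS between worker and proxy
    connect_timeout=10,
)
with conn, conn.cursor() as cur:
    cur.execute("SELECT 1")                 # placeholder; real workers read/write subproblem state
    print(cur.fetchone())
conn.close()
```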
We achieved these improvements in three phases, increasing the size of the target PFAS molecule from TFA to PFBA and then PFOA.
In phase 1, we calculated the bond dissociation curve of TFA on a single cluster, using 8 instance types and up to 16,000 vCPUs. In phase 2, we deployed three clusters for a total of 80,000 vCPUs to calculate dissociation energies for PFBA. Finally, in phase 3, we increased the number of clusters to thirteen, resolved many bottlenecks, and then calculated the dissociation energy of PFOA on more than a million vCPUs.
During this run, the cluster auto-scaled to around 500k vCPUs and then stalled. We investigated and found a problem in the network configuration. We scaled down to resolve the issue and then successfully scaled back up to 1.1M vCPUs. The history of this run is plotted in Figure 7, which also shows the distribution of Kubernetes pods for each cluster.
Record-Breaking Results
For each of the three molecules (TFA, PFBA, and PFOA), we selected one of the equatorial C-F bonds, a good candidate for the defluorination process17, for stretching.
We first conducted geometry optimizations to find equilibrium geometries at the DF-MP2 level18. Next, we generated multiple geometries of each molecule by changing the C-F bond distance. Finally, we applied the iFCI algorithm to find the ground state energy for each geometry. The resulting bond dissociation curves for the three molecules are illustrated in Figure 8.
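The rigid scan used to generate those geometries (see the Table 1 caption) is conceptually simple: every atom stays fixed except the leaving fluorine, which is displaced along the C-F bond axis. Here is a generic sketch with hypothetical coordinates, not our actual PFAS geometries:

```python
import numpy as np

def rigid_cf_scan(coords, c_idx, f_idx, distances):
    """Generate rigid-scan geometries: move one fluorine along its C-F bond
    axis to each target distance while keeping every other atom fixed.
    coords is an (N, 3) array of Cartesian coordinates in angstrom."""
    coords = np.asarray(coords, dtype=float)
    axis = coords[f_idx] - coords[c_idx]
    axis /= np.linalg.norm(axis)                 # unit vector along the C-F bond
    geometries = []
    for d in distances:
        g = coords.copy()
        g[f_idx] = coords[c_idx] + d * axis      # place the fluorine at distance d
        geometries.append(g)
    return geometries

# Hypothetical two-atom fragment: carbon at the origin, fluorine 1.35 A away.
frag = [[0.0, 0.0, 0.0], [1.35, 0.0, 0.0]]
scan = rigid_cf_scan(frag, c_idx=0, f_idx=1, distances=np.linspace(1.2, 3.0, 10))
print(len(scan), scan[-1][1])                    # 10 geometries; stretched F position
```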
Table 1 summarizes the results and performance metrics for the three simulated PFAS molecules: TFA, PFBA, and PFOA. To the best of our knowledge, PFOA is the largest molecule ever simulated for bond dissociation with near-FCI accuracy. This is a record-breaking exercise for both computational chemistry and cloud-based HPC technology. The results demonstrate the near-exact accuracy – and scalability – of the iFCI algorithm in QEMIST Cloud. The convergence and accuracy of the algorithm are validated by the non-parallelity error (NPE), which measures how much the potential energy curves obtained at different expansion orders n deviate from being parallel.
The NPE between n=3 and n=4 is an order of magnitude smaller than typical errors in other methods19, indicating that these results are very close to the exact solution, with an estimated error of less than a few percent of the dissociation energy.
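For reference, the NPE between two curves sampled at the same bond distances is just the spread (maximum minus minimum) of their pointwise energy difference; two perfectly parallel curves give an NPE of zero. A small sketch with purely illustrative numbers:

```python
import numpy as np

HARTREE_TO_KCAL = 627.509   # unit conversion

def non_parallelity_error(curve_a, curve_b):
    """NPE between two potential energy curves evaluated at the same bond
    distances: the spread of their pointwise difference (zero if the
    curves are perfectly parallel)."""
    diff = np.asarray(curve_a) - np.asarray(curve_b)
    return diff.max() - diff.min()

# Purely illustrative energies (hartree) along a hypothetical C-F scan.
e_n3 = np.array([-526.110, -525.985, -525.932, -525.921])
e_n4 = np.array([-526.112, -525.986, -525.932, -525.921])
print(f"NPE = {non_parallelity_error(e_n3, e_n4) * HARTREE_TO_KCAL:.2f} kcal/mol")
```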
|  | TFA | PFBA | PFOA |
| --- | --- | --- | --- |
| Formula | CF3CO2H | C4HF7O2 | C8HF15O2 |
| Pods | 1,000 | 5,000 | 62,500 |
| Peak vCPUs | 16,000 | 80,000 | 1,100,000 |
| Runtime | 1.4 hours | 1.0 hours | 37.5 (4.6) hours |
| Bond dissociation energy | 114.8 kcal/mol | 108.7 kcal/mol | 109.6 kcal/mol |
Table 1: Simulated molecules with details of resources and performance. The runtime for PFOA covers the simultaneous computation of two points on the curve and can be further reduced by improvements already in progress; the number in parentheses is the runtime while QEMIST Cloud was running at its maximum availability of 1.1 million vCPU cores. Bond dissociation energies are obtained from iFCI n=4 calculations in the gas phase, using a rigid-body scan (stretching one C-F bond distance while keeping the rest of the geometry fixed).
Conclusion
The search for an affordable, scalable, and zero-waste solution for PFAS pollution is an ongoing effort across the globe. Quantum chemistry opens the door for studying this and many other challenging problems. However, the computational complexity of highly accurate quantum chemistry has always been a prohibitive factor for its application to real-world problems.
The introduction of new algorithms, such as iFCI, and new cloud HPC technology enables affordable and accurate calculations for many computational chemistry problems.
Good Chemistry, AWS, Accenture, and Intel joined forces to massively scale up QEMIST Cloud to run on Spot compute instances with more than a million CPU cores. This approach successfully calculated the C–F bond-breaking energy in three PFAS molecules: TFA, PFBA, and PFOA.
To the best of our knowledge, PFOA is the largest PFAS to be simulated with near-exact accuracy to date.
We have demonstrated that an on-demand cloud HPC infrastructure, built as a modern, highly distributed, container-based solution, is a great foundation for flexible and sustainable scientific computing.
Calculating solutions to the electronic Schrödinger equation is the most fundamental problem in chemistry simulations, so we think our approach offers extensive benefits for applications ranging from pharmaceuticals and the food industry to materials science, green energy, and carbon capture.
If you want to discuss how QEMIST Cloud can help you study PFAS molecules or any other chemical problem at the highest level of accuracy, reach out to us at ask-hpc@amazon.com. We want to help move the needle towards a cleaner, healthier, more sustainable future for humanity.
References
2 Grandjean, Philippe et al. “Estimated exposures to perfluorinated compounds in infancy predict attenuated vaccine antibody concentrations at age 5-years”. In: Journal of Immunotoxicology 14.1 (2017), pp. 188–195.
3 Environmental Working Group. Mapping the PFAS contamination crisis: New data show 2,858 sites in 50 states and two territories. http://www.ewg.org/interactive-maps/pfas_contamination/. Accessed: 2022-08-18. 2022
4 Andrews, David Q and Olga V Naidenko. “Population-wide exposure to per-and polyfluoroalkyl substances from drinking water in the United States”. In: Environmental Science & Technology Letters 7.12 (2020), pp. 931–936.
5 US Environmental Protection Agency. EPA Announces New Drinking Water Health Advisories for PFAS Chemicals, $1 Billion in Bipartisan Infrastructure Law Funding to Strengthen Health Protections. http://www.epa.gov/newsreleases/epa-announces-new-drinking-water-health-advisories-pfas-chemicals-1-billion-bipartisan. Accessed: 2022-08-18. 2022.
6 PFAS Strategic Roadmap: EPA’s Commitments to Action 2021-2024. http://www.epa.gov/pfas/pfas-strategic-roadmap-epas-commitments-action-2021-2024. Accessed: 2022-08-18. 2021.
7 EPA Announces New Drinking Water Health Advisories for PFAS, 2022.
8 Dombrowski, Paul M. et al. “Technology review and evaluation of different chemical oxidation conditions on treatability of PFAS”. In: Remediation Journal 28.2 (2018), pp. 135–150. doi: https://doi.org/10.1002/rem.21555. eprint: https://onlinelibrary.wiley.com/doi/pdf/10.1002/rem.21555.
9 Houde, Magali et al. “Biological monitoring of polyfluoroalkyl substances: a review”. In: Environmental science & technology 40.11 (2006), pp. 3463–3473.
10 Ghose, Keya B, Piotr Piecuch, and Ludwik Adamowicz. “Improved computational strategy for the state-selective coupled-cluster theory with semi-internal triexcited clusters: Potential energy surface of the HF molecule”. In: The Journal of chemical physics 103.21 (1995), pp. 9331–9346.
11 Zimmerman, Paul M. “Incremental full configuration interaction”. In: The Journal of Chemical Physics 146.10 (2017), p. 104102.
12 Rask, Alan E and Paul M Zimmerman. “Toward full configuration interaction for transition-metal complexes”. In: The Journal of Physical Chemistry A 125.7 (2021), pp. 1598–1609.
13 https://goodchemistry.com/qemist-cloud/
14 T. Yamazaki “Good Chemistry with QEMIST Cloud”, in: CSTCC 2022, June 2022, Kelowna, Canada and WATOC 2020, July 2022, Vancouver, Canada.
15 https://www.top500.org/
16 Sheetal Joshi, Apurup Chevuru and Mike Stefaniak. Amazon EKS launches IPv6 support. https://aws.amazon.com/blogs/containers/amazon-eks-launches-ipv6-support/. Accessed: 2022-08-18. 2022.
17 Bentel, Michael J et al. “Defluorination of per-and polyfluoroalkyl substances (PFASs) with hydrated electrons: structural dependence and implications to PFAS remediation and management”. In: Environmental science & technology 53.7 (2019), pp. 3718–3728.
18 “Psi4 1.4: Open-Source Software for High-Throughput Quantum Chemistry”, D. G. A. Smith, et al, J. Chem. Phys. 152, 184108 (2020)
19 For example, for TFA using iFCI (n = 4) as a reference, the NPEs in the B3LYP-D3(BJ), ωB97XD-D3(BJ), PBE0-D3(BJ), MN15-D3(BJ), CCSD(T), and iFCI (n = 3) results are 52, 81, 62, 76, 95, and 3.6 kcal/mol, respectively. The large value for CCSD(T) is due to the failure of this method in the dissociation region. Note that for TFA, the NPE was computed using the entire potential energy curve. For PFBA and PFOA, the NPE was computed using two points, one at the equilibrium geometry and the other near dissociation. For PFBA, the NPEs in the same set of methods are 61, 91, 72, 86, 71, and 3.4 kcal/mol, respectively. For PFOA, they are 60, 90, 71, 85, 64, and 2.8 kcal/mol, respectively.