AWS for Industries
Building a generative AI reservoir simulation assistant with Stone Ridge Technology
In the field of reservoir simulation, accurate modeling is paramount for understanding and predicting the behavior of subsurface flow through geological formations. However, the complexities involved in creating, implementing, and optimizing these models often pose significant challenges, even for experienced professionals. Fortunately, the integration of artificial intelligence (AI) and large language models (LLMs) offers a transformative solution to streamline and enhance the reservoir simulation workflow. This post describes our efforts in developing an intelligent simulation assistant powered by Amazon Bedrock, Anthropic’s Claude, and Amazon Titan LLMs, aiming to revolutionize the way reservoir engineers approach simulation tasks.
Background on reservoir simulation models
Reservoir simulation models fall under the category of Earth Simulation Models, encompassing various domains such as groundwater flow, geothermal energy, hydrocarbon extraction, and CO2 sequestration. For hydrocarbon extraction, an example simulation of the Norne [1] reservoir model is shown in the following figure, where understanding the evolution of pressures and saturations of oil (heavy hydrocarbons), gas (lighter hydrocarbons), and ground water is key to identifying well locations and extraction strategies. From a modeling perspective, three distinct workflows can be identified.
Getting the model “right”: This workflow involves making sure that the model accurately mimics the physical reality within acceptable tolerances. In reservoir simulation, this is achieved through a process called “History Matching,” where the model parameters are fitted to observed data while minimizing error.
Implementing the model accurately and efficiently: Once the model has been validated, the focus shifts to making sure of its correct and efficient implementation on a digital computer.
Optimizing model parameters: With a properly implemented model, engineers can explore different scenarios to optimize hydrocarbon recovery or CO2 storage by adjusting boundary conditions, well patterns, and injection/production schedules.
Although the first and third workflows often involve advanced techniques such as ensemble simulations, Kalman filters, inverse modeling, and reinforcement learning, this post primarily concentrates on the second workflow: making sure of the correct and efficient implementation of the reservoir simulation model.
Figure 1 – Reservoir simulation of the Norne [1] field showing evolution of pressure over time
Integration of AI and domain knowledge
The simulation assistant is a powerful tool designed to address the complexities of preparing simulation input data, vetting existing simulation models, and selecting optimal simulation options based on observed simulation logs. By using the capabilities of Amazon Bedrock, Amazon Titan embeddings, and Anthropic’s LLMs, the assistant provides a seamless integration of AI and domain knowledge, enabling efficient and accurate reservoir simulation workflows. The intelligent simulation assistant draws upon a diverse range of data sources to deliver precise and relevant information to users. These sources include:
1. Simulation Model: Physical inputs, parameters, and solution techniques
2. Product Knowledge: Documents, reports, technical manuals, webpages, blog entries, and other relevant resources
3. Processed Product Knowledge: Vector stores, SQLite databases, YAML files, and other structured data formats
4. Simulation Log and Results Files: Output data generated during simulation runs
5. Processed Simulation Runtime/Execution Knowledge: Filtered and structured information derived from simulation logs and results
By seamlessly integrating these data sources, the assistant facilitates effective human-machine collaboration, enabling users to use the vast knowledge base while receiving tailored and contextual responses to their queries. The following figure showcases an agentic framework, using several different models to perform callbacks, data analysis, processing, and queries at various segments of the reservoir simulation assistant workflow.
Figure 2 – Overview of various domain knowledge sources used together with a foundation model and Retrieval-Augmented-Generation (RAG) to perform various stages of generative AI-based data analysis and provide natural language query-based insights
Key capabilities
The intelligent simulation assistant offers a comprehensive range of capabilities to streamline and enhance the reservoir simulation workflow.
General Inquiries and Keyword Explanations: The assistant can accurately answer general questions about the simulation software and its capabilities, provide detailed information about specific keywords and their usage, and offer examples for keyword implementation.
Model Analysis and Issue Identification: By analyzing the simulation model input data, the assistant can identify potential issues or inconsistencies, alerting users to potential problems before executing the simulation.
Simulation Run Analysis and Optimization: During and after simulation runs, the assistant can analyze the log and results files, identifying potential issues or performance bottlenecks. Then, it can provide recommendations for resolving these issues or optimizing the simulation process.
Interactive Model Manipulation and Re-running: Users can interact with the assistant to modify simulation models directly, addressing identified issues or exploring alternative scenarios. Then, the assistant can initiate and monitor new simulation runs based on the updated models.
The intelligent simulation assistant follows a rigorous workflow to deliver these capabilities effectively. It uses a combination of precomputed knowledge bases, model-specific data stores, and callback agents to provide contextual information throughout the interaction. This approach makes sure that users receive accurate and relevant responses tailored to their specific queries and simulation scenarios.
One important consideration in building generative AI tools is data security. By converting the simulation data and results analysis into tokens and storing them in a secure database within a private cloud environment, and not exposing this to the underlying LLM model directly through training or fine-tuning, data security and confidentiality is preserved.
Using Amazon Bedrock and Anthropic’s LLM models
The power of the intelligent simulation assistant lies in the seamless integration of Amazon Bedrock and Anthropic’s LLMs. Amazon Bedrock, a fully managed service for building and deploying machine learning (ML) models, provides a robust environment for hosting and serving the assistant’s AI components. By using Amazon Bedrock, the assistant can harness the capabilities of Anthropic’s LLMs, enabling advanced natural language processing and generation.
Anthropic’s LLMs, such as Claude, are at the forefront of AI technology, offering unparalleled language understanding and generation capabilities. These models are trained on vast amounts of data, allowing them to grasp complex concepts and generate human-like responses. By integrating these LLMs into the assistant, users can engage in natural conversations, making sure of an intuitive and efficient interaction experience.
The following image shows an architecture on AWS using Claude models on Amazon Bedrock, which ingest information from the simulation inputs and outputs, together with a reservoir simulation knowledge base. Note that this architectural model simplifies the agent workflow to include a single analysis and domain-expert agent, and it does not include the complexity of simulation execution using an LLM agent, which is still an active area of research in terms of guardrails and implementation.
Figure 3 – AWS architecture and implementation diagram for the agents for knowledge base and simulation input/output analysis
The steps outlined in this architecture are as follows:
1. The simulation engineer would begin by accessing a reservoir simulation knowledge base tailored with domain-specific expert knowledge on geophysical properties, reservoir engineering, and the use of Stone Ridge Technology’s reservoir simulation tool Echelon.
2. This knowledge can be tokenized into a vector database and is ingested into OpenSearch. In order to perform this, the knowledge base is collated into a set of documents, and loaded using LangChain community tools. Amazon Titan G1 embeddings are used together with OpenSearch through LangChain for vectorization (see this guidance example on GitHub). Using Python APIs, Claude on Bedrock can be used to expose this as a chatbot application through a Streamlit interface, which is running on an Amazon Elastic Compute Cloud (EC2) instance or an Amazon Elastic Kubernetes Service (EKS) node.
3. The simulation engineer builds a model with the generative AI tool as a guidance, with a physically consistent simulation input file, primarily consisting of:
a) Mesh/grid files specifying the number of active cells
b) Permeability and porosity maps, which are static properties of the reservoir
c) Simulation parameters, for example timestep length and tolerances associated with preconditioners, as well as linear and non-linear solvers
d) Reporting times and well controls for the reservoir
4. Then, this is stored on the data storage medium, to be passed to the execution environment where the simulation is run: AWS ParallelCluster. At the same time, this is also vectorized using Amazon Titan G1 embeddings, similar to the knowledge base explained earlier, and appended to the OpenSearch vector database for RAG with the initial domain-expert LLM agent. Therefore, it is exposed to the Streamlit application at the user end.
5. In order to run the reservoir simulations, the API server, web server, and data management EKS nodes are launched within a private subnet. The AWS ParallelCluster environment is setup with Nvidia A100/H100 GPUs on P4/P5 instances for compute, as the reservoir simulation is GPU-accelerated, and an Amazon FSx for Lustre file system for high-performance storage. The simulations are launched through the CLI, using the input model created in Step 3. Once the simulations are completed, results are processed though AWS Lambda, and this data is migrated into the data management Amazon Simple Storage Service (S3) bucket manually, where a downstream analysis can be performed. The human-readable outputs can be vectorized by converting into PDF formatted text, using OpenSearch with Amazon Titan embeddings once again, and passed back to the LLM chatbot interface for natural language queries.
Although not covered in this architecture, two key elements enhance this workflow significantly and are the topic of future exploration: 1) simulation execution using natural language by orchestration through a generative AI agent, and 2) multimodal generative AI (vision and text) analysis and interpretation of reservoir simulation results such as well production logs and 3D depth slices for pressure and saturation evolution. As future work, automating aspects of our current architecture is being explored using an agentic workflow framework as described in this AWS HPC post.
Conclusion
The integration of AI and domain knowledge is an iterative process, and the intelligent simulation assistant should be designed to adapt and evolve as new insights and data become available through incrementally evolving domain knowledge. By continuously expanding its knowledge base and refining its prompts and interactions, the assistant remains at the forefront of reservoir simulation technology, providing users with a powerful and ever-improving tool.
The intelligent simulation assistant represents a significant milestone in the integration of AI and reservoir simulation workflows. By harnessing the power of Amazon Bedrock and Anthropic’s LLMs, this assistant offers a seamless blend of cutting-edge AI technology and domain-specific knowledge. In turn, this empowers reservoir engineers to tackle complex simulation challenges with unprecedented efficiency and accuracy.
Through its comprehensive capabilities, ranging from model analysis and issue identification to interactive model manipulation and simulation optimization, the intelligent simulation assistant streamlines the entire reservoir simulation process. By using advanced prompt engineering and structured knowledge representation techniques, the assistant makes sure of highly accurate and contextualized responses, enabling users to make informed decisions and optimize resource recovery.
As AI technology continues to evolve, and the integration of domain knowledge and machine intelligence deepens, the intelligent simulation assistant paves the way for a new era of reservoir simulation, where human expertise and AI-driven insights converge to unlock unprecedented insights and drive innovation in the energy industry.
Users who are interested in discussing the business case or technical product details can reach out to the AWS team or Stone Ridge Technology and refer to this accompanying Stone Ridge Technology post.
References:
[1] Rwechungura, R., Suwartadi, E., Dadashpour, M., Kleppe, J., and B. Foss. “The Norne Field Case—A Unique Comparative Case Study.” Paper presented at the SPE Intelligent Energy Conference and Exhibition, Utrecht, The Netherlands, March 2010. doi: https://doi.org/10.2118/127538-MS.