AWS Public Sector Blog
Extracting insights from PubMed articles using Amazon Q Business
The scientific landscape is rapidly evolving, and the sheer volume of research output can be overwhelming. For professionals across industries—from academia to healthcare to pharmaceuticals—staying abreast of the latest developments is crucial. The National Center for Biotechnology Information’s (NCBI) PubMed Central (PMC), a leading resource for biomedical literature, offers a vast repository of full-text biomedical and life sciences journal articles. This invaluable resource, which is also available through the Registry of Open Data on AWS (RODA), enables researchers, healthcare and life sciences professionals, and academic institutions (hereafter collectively referred to as “researchers”) to access peer-reviewed literature, stay up to date with the latest discoveries, facilitate collaboration, and accelerate the translation of research findings into practical applications.
Researchers rely heavily on PMC because the availability of full-text articles enables large-scale data mining and text analysis, potentially unveiling new insights and patterns within the scientific literature. However, sifting through this wealth of information to extract actionable insights can be daunting. While PMC offers immense potential to accelerate scientific progress, inform evidence-based practice, and drive innovation in the biomedical and life sciences fields, the sheer volume of data presents a significant challenge.
This is where Amazon Q Business comes in. It’s a powerful service designed to streamline the process of analyzing vast amounts of scientific literature and provide valuable insights efficiently. By using Amazon Q Business, researchers can more effectively harness the wealth of information available in PMC. This tool has the potential to significantly accelerate scientific progress, enhance evidence-based practice, and drive innovation in the biomedical and life sciences fields by making large-scale data analysis more accessible and manageable.
In addition to services like Amazon Q Business, researchers can also benefit from resources like the RODA, which includes the PMC dataset. RODA is a valuable initiative that provides researchers, developers, and data enthusiasts with access to a vast collection of open and publicly available datasets. This program allows users to seamlessly discover, access, and analyze a wide range of data from various domains, including healthcare, Earth observation, genomics, climate, and more. By using AWS Cloud infrastructure, users can leverage scalable and cost-effective storage and computing resources to process and analyze these datasets. RODA not only fosters collaboration and knowledge sharing but also accelerates scientific discoveries and drives innovation across different industries. With its user-friendly interface and integration with various AWS services, the program simplifies the process of working with large and complex datasets, empowering users to extract valuable insights and achieve data-driven decision-making.
Why Amazon Q Business?
One of the key benefits of using Amazon Q Business is the ability to unlock valuable insights from diverse data types and sources. Amazon Q Business provides a unified platform to ingest, process, and analyze this data, leveraging generative artificial intelligence (AI) capabilities. This enables customers to gain a comprehensive understanding of their data, ultimately driving better decision-making. By utilizing Amazon Q Business and its generative AI features, researchers can automate the extraction of relevant information, summarize findings, and identify key trends and patterns within datasets. This makes it an ideal solution for analyzing PubMed research articles, as the generative AI can assist in synthesizing complex medical information and generating human-like summaries.
Benefits of using Amazon Q Business for PubMed analysis
- Automated data processing
- One of the most significant advantages of Amazon Q Business is its ability to automate the data processing pipeline. Instead of manually reading through hundreds of articles, researchers can have Amazon Q Business scan, interpret, and categorize large amounts of textual data, saving valuable time and effort.
- Advanced natural language processing (NLP) capabilities
- Amazon Q Business utilizes state-of-the-art NLP algorithms to understand and interpret complex scientific texts. This enables it to accurately extract key information from PubMed articles, such as study objectives, methodologies, results, and conclusions.
- Summarization and reporting
- Amazon Q Business can generate concise summaries of research articles, highlighting the most critical points. This feature is particularly useful for creating reports or presentations, enabling stakeholders to quickly grasp essential information without having to dive deep into the full text.
- Scalability
- Amazon Q Business is designed to handle large datasets, making it scalable to accommodate growing volumes of research publications. As more articles are published on PubMed, Amazon Q Business can continue to provide timely and relevant insights.
- User interface (UI)
- With Amazon Q Business, researchers can engage in a conversation about the data. Amazon Q Business provides a hosted chatbot user interface. Users can ask questions, summarize key insights, and leverage the chatbot to generate content.
- Security and governance
- Amazon Q Business is built to be secure and private, and it can understand and respect your existing identities, roles, and permissions. If a user does not have permission to access certain data without Amazon Q Business, then they still will not have access to that data using Amazon Q Business.
- Amazon Q Business provides administrative controls, such as the ability to block entire topics and filter both questions and finalized answers using administrator-specified keywords to keep responses aligned with an organization’s policies.
Solution overview
The following diagram illustrates a high-level architecture showing how to retrieve a copy of the data, store it in the cloud, index the data, and create a secure chatbot to interact with the data.
Prerequisites
To complete this walkthrough on your own, you need (1) an AWS account with the ability to create and modify an Amazon Simple Storage Service (Amazon S3) bucket and (2) an Amazon Q Business subscription.
How to use Amazon Q for PubMed research
- Step 1: Retrieve Data
- Log into your AWS account
- Open AWS CloudShell to access a browser-based command line to run commands within your account
- Enter the following AWS Command Line Interface (AWS CLI) command to create an Amazon S3 bucket in your account to store the PMC data.
- The S3 bucket name must be globally unique across all customers and must adhere to the S3 bucket naming rules. You must also specify the AWS Region you would like to work in.
aws s3 mb s3://{{my-unique-bucket-name}} --region us-east-1
- Use the AWS CLI “s3 sync” command to synchronize the data from the Registry of Open Data on AWS into the new S3 bucket.
- Here we are copying files from the “pmc-oa-opendata” S3 bucket, specifically objects with the path “/oa_comm”, into the newly created S3 bucket in your account.
aws s3 sync s3://pmc-oa-opendata/oa_comm s3://{{my-unique-bucket-name}}
The above command copies the data from the PMC S3 bucket to the S3 bucket in your account.
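Because bucket creation fails when the name breaks the S3 naming rules, it can help to check a candidate name before running the command. The following is a minimal sketch, not part of the AWS CLI: `is_valid_bucket_name` is a hypothetical helper that covers only a subset of the rules (length, allowed characters, first and last character, adjacent periods, and IP-address-like names). It does not check global uniqueness or reserved prefixes and suffixes.

```shell
# Hypothetical helper: partial check of a bucket name against a subset of
# the S3 general purpose bucket naming rules. The body runs in a subshell
# so the locale setting and variables do not leak into the caller.
is_valid_bucket_name() (
  LC_ALL=C
  name=$1
  len=${#name}

  # Length must be between 3 and 63 characters.
  [ "$len" -ge 3 ] && [ "$len" -le 63 ] || return 1

  case $name in
    *[!a-z0-9.-]*) return 1 ;;            # only lowercase letters, digits, dots, hyphens
    [!a-z0-9]*|*[!a-z0-9]) return 1 ;;    # must begin and end with a letter or digit
    *..*) return 1 ;;                     # no two adjacent periods
  esac

  # Reject names formatted like an IPv4 address (four dot-separated digit groups).
  case $name in
    *[!0-9.]*) ;;                         # contains a letter or hyphen, so not IP-like
    *.*.*.*.*) ;;                         # more than three dots
    *.*.*.*) return 1 ;;
  esac

  return 0
)
```

For example, `is_valid_bucket_name "my-unique-bucket-name" && echo "name looks valid"` prints the message only when the name passes these checks. Separately, adding the `--dryrun` option to the `aws s3 sync` command lists the copy operations without performing them, which is one way to preview the transfer before committing to it.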
- Step 2: Create Amazon Q Business application
- In the console search bar, search for and open Amazon Q Business
- Select “Get Started” and “Create Application”
- Enter Application name “PMC-Chatbot” and select “Create”
- Configure the following settings
- Retriever: select “Use Native Retriever”
- Index Provisioning: select “Enterprise”
- Number of units: 30
- Configure a data source
You can incorporate one or many data sources for the Amazon Q Business native retriever to index. There are many pre-built connector options for both cloud-based and on-premises data sources. Select the “+” next to the “Amazon S3” connector.
- Name the data source: “PMCdata”
- In the “IAM role” section, select “Create a new service role (Recommended)”
- In “Sync scope”: select the Amazon S3 bucket created earlier
- For the Sync Mode choose “New, modified, or deleted content sync” and a “Monthly” sync run schedule
- Select Add data source
- Configure user access
- Select the “Add groups and users” drop-down, assign users to the application, and select an Amazon Q Business subscription
- Each new user will receive an email invitation to the Amazon Q chatbot and will be prompted to set up a password
- Step 3: Index the data
At this point you have copied the data into your AWS account, created an Amazon Q Business application, created the data source, and assigned a user with access to the chatbot. You now need to run the data source sync so that the chatbot can incorporate the PMC data into its responses.
In the Amazon Q Business console, select your application and then the data source you created.
- Select “Sync Now” to begin the indexing process.
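The same sync can, in principle, be started from CloudShell instead of the console. The sketch below assumes the AWS CLI `qbusiness start-data-source-sync-job` command and its `--application-id`, `--index-id`, and `--data-source-id` options. The function name `start_pmc_sync`, the `DRY_RUN` switch, and the example IDs are hypothetical, and the real IDs have to be looked up for your own application.

```shell
# Hypothetical wrapper around the AWS CLI sync command.
# Arguments: application ID, index ID, data source ID (all placeholders here).
# Set DRY_RUN=1 to print the command instead of running it.
start_pmc_sync() {
  runner=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    runner="echo"   # prefix with echo so nothing is executed
  fi
  $runner aws qbusiness start-data-source-sync-job \
    --application-id "$1" \
    --index-id "$2" \
    --data-source-id "$3"
}
```

Running `DRY_RUN=1 start_pmc_sync app-123 idx-456 ds-789` prints the full command for review. Dropping `DRY_RUN` runs it against your account. The `aws qbusiness list-data-source-sync-jobs` command, called with the same three IDs, is one way to check on progress afterwards.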
- Step 4: Interact with the chatbot
- Once indexing has completed, you can explore the chatbot by navigating to the “Web Experience Settings” and selecting the “Deployed URL”
- Optionally, you can customize your chatbot with the “Customize web experience” option
- The new chatbot is now able to respond based on the indexed PMC data. Additionally, the chatbot can provide reference details on the source documents used to generate each response.
Real-world applications
Amazon Q Business offers significant advantages to a wide range of professionals in the healthcare and life sciences sectors. Academics, researchers, healthcare providers, and pharmaceutical companies can all benefit from its capabilities in various, often overlapping ways. For instance, the service enables users to perform comprehensive literature reviews, identify research gaps, and stay informed about the latest discoveries in their respective fields. This is particularly valuable for academic researchers, but also for healthcare professionals and pharmaceutical companies seeking to stay at the forefront of medical advancements.
Healthcare providers can leverage Amazon Q Business to keep abreast of the latest medical research, enabling them to provide evidence-based care and improve patient outcomes. Similarly, pharmaceutical companies can use the service to stay updated on recent developments in drug research, clinical trials, and medical treatments. This shared knowledge base allows for better-informed decision-making across the board, from patient care to drug development strategies.
The service’s ability to quickly identify promising studies and synthesize large volumes of information is beneficial for all users. It can accelerate R&D efforts in pharmaceutical companies, inform treatment decisions for healthcare providers, and guide research directions for academics. By providing efficient access to a wealth of scientific literature, Amazon Q Business fosters a more connected and informed approach to healthcare and life sciences, ultimately contributing to advancements in patient care, drug development, and medical research.
Conclusion
In this blog post, we demonstrated how to use Amazon Q Business to quickly create a generative AI application using PubMed data in the Registry of Open Data on AWS. By harnessing the capabilities of Amazon Q Business, we indexed the PubMed data and used an Amazon Q Business chatbot to interactively extract insights from the literature.
Amazon Q Business is revolutionizing the way organizations extract insights from data sources such as PubMed research articles. By automating data processing and leveraging advanced NLP capabilities, Amazon Q enables users to efficiently analyze vast amounts of scientific literature and uncover valuable insights. Whether you are in academia, healthcare, or the pharmaceutical industry, Amazon Q Business can help you stay ahead in the continuously evolving world of biomedical research. Embrace the power of Amazon Q Business and transform your approach to research analysis today.