AWS Public Sector Blog

Extracting insights from PubMed articles using Amazon Q Business

The scientific landscape is rapidly evolving, and the sheer volume of research output can be overwhelming. For professionals across industries—from academia to healthcare to pharmaceuticals—staying abreast of the latest developments is crucial. The National Center for Biotechnology Information’s (NCBI) PubMed Central (PMC), a leading resource for biomedical literature, offers a vast repository of full-text biomedical and life sciences journal articles. This invaluable resource, which is also available through the Registry of Open Data on AWS (RODA), enables researchers, healthcare and life sciences professionals, and academic institutions (hereafter collectively referred to as “researchers”) to access peer-reviewed literature, stay up to date with the latest discoveries, facilitate collaboration, and accelerate the translation of research findings into practical applications.

Researchers rely heavily on PMC because its full-text articles enable large-scale data mining and text analysis, which can reveal new insights and patterns within the scientific literature. However, sifting through this wealth of information to extract actionable insights is difficult. PMC has immense potential to accelerate scientific progress, inform evidence-based practice, and drive innovation in the biomedical and life sciences, but the sheer volume of data presents a significant challenge.

This is where Amazon Q Business comes in. Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, and generate content based on the data in your own systems. When it is connected to PMC content, it can streamline the analysis of large volumes of scientific literature. This helps researchers use the information in PMC more effectively and makes large-scale literature analysis more accessible and manageable.

In addition to services like Amazon Q Business, researchers can benefit from resources like RODA, which hosts the PMC dataset. RODA gives researchers, developers, and data enthusiasts access to a large collection of open, publicly available datasets from domains including healthcare, Earth observation, genomics, and climate. Because these datasets are hosted on AWS, users can discover, access, and analyze them with scalable, cost-effective AWS storage and compute resources, and combine them with other AWS services. This simplifies working with large and complex datasets and helps users move from raw data to data-driven decisions, supporting collaboration and scientific discovery across industries.

Why Amazon Q Business?

A key benefit of Amazon Q Business is its ability to unlock insights from diverse data types and sources. Amazon Q Business provides a unified platform to ingest, process, and analyze that data using generative artificial intelligence (AI), helping users understand their data and make better-informed decisions. With Amazon Q Business, researchers can automate the extraction of relevant information, summarize findings, and identify key trends and patterns across documents. This makes it well suited to analyzing PubMed research articles, where generative AI can help synthesize complex medical information into readable, human-like summaries.

Benefits of using Amazon Q Business for PubMed analysis

  • Automated data processing
    • One of the most significant advantages of Amazon Q Business is its ability to automate the data processing pipeline. Instead of manually reading through hundreds of articles, Amazon Q can quickly scan, interpret, and categorize vast amounts of textual data, saving valuable time and effort.
  • Advanced natural language processing (NLP) capabilities
    • Amazon Q Business applies state-of-the-art natural language processing, powered by large language models, to understand and interpret complex scientific text. This enables it to extract key information from PubMed articles, such as study objectives, methodologies, results, and conclusions.
  • Summarization and reporting
    • Amazon Q Business can generate concise summaries of research articles, highlighting the most critical points. This feature is particularly useful for creating reports or presentations, enabling stakeholders to quickly grasp essential information without having to dive deep into the full text.
  • Scalability
    • Amazon Q Business is designed to handle large datasets. You can scale an index by adjusting its number of units, and scheduled data source syncs pick up new and modified documents. As a result, insights stay timely as more articles are published on PubMed.
  • User interface (UI)
    • Amazon Q Business provides a hosted chatbot user interface where researchers can hold a conversation about the data. They can ask questions, summarize key insights, and generate content.
  • Security and governance
    • Amazon Q Business is built to be secure and private, and it respects your existing identities, roles, and permissions. Users who cannot access certain data outside Amazon Q Business cannot access that data through Amazon Q Business either.
    • Amazon Q Business provides administrative controls, such as the ability to block entire topics and filter both questions and finalized answers using administrator-specified keywords to keep responses aligned with an organization’s policies.

Solution overview

The following diagram illustrates a high-level architecture showing how to retrieve a copy of the data, store it in the cloud, index the data, and create a secure chatbot to interact with the data.

Figure 1. High-level architecture of Amazon Q chatbot. The solution will retrieve PMC data from the Registry of Open Data on AWS and use Amazon Q Business to index the data within your AWS account.

Prerequisites

To complete this walkthrough, you need (1) an AWS account with permissions to create and modify an Amazon Simple Storage Service (Amazon S3) bucket and (2) an Amazon Q Business subscription.
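Before you start, you can confirm two things from AWS CloudShell, or from any shell where the AWS CLI is configured. First, check that your credentials resolve to the intended account. Second, check that the public PMC bucket is reachable. The following is a minimal sketch, assuming AWS CLI v2 and a bash-compatible shell. The function name check_prerequisites is hypothetical and is not part of any AWS tooling.

```shell
# Hypothetical helper (not AWS tooling): confirms that the AWS CLI has
# working credentials and that the public PMC bucket can be listed.
check_prerequisites() {
  # Declare separately so the command substitution's exit status is kept.
  local account_id
  account_id="$(aws sts get-caller-identity --query Account --output text)" || {
    echo "AWS credentials are not configured for the AWS CLI" >&2
    return 1
  }
  echo "Using AWS account: ${account_id}"

  # The bucket is public, so the listing request is sent unsigned.
  aws s3 ls s3://pmc-oa-opendata/oa_comm/ --no-sign-request
}
```

After you paste the function into your shell, run check_prerequisites. A non-zero exit status means the CLI has no usable credentials or the listing failed.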

How to use Amazon Q Business for PubMed research

  • Step 1: Retrieve data
    • Sign in to your AWS account
    • Open AWS CloudShell, a browser-based command line, to run commands within your account

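Figure 2. The AWS CloudShell icon is located immediately to the right of the search bar in the AWS console.

    • Create an S3 bucket to hold your copy of the data, replacing {{my-unique-bucket-name}} with a globally unique bucket name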

aws s3 mb s3://{{my-unique-bucket-name}} --region us-east-1

Figure 3. AWS CLI command to create an Amazon S3 bucket.

    • Use the AWS CLI “s3 sync” command to synchronize the data from the Registry of Open Data on AWS into the new S3 bucket.
      • This copies the objects under the “oa_comm” prefix (the commercial-use subset of the PMC Open Access collection) of the “pmc-oa-opendata” S3 bucket into the bucket you just created in your account.

aws s3 sync s3://pmc-oa-opendata/oa_comm s3://{{my-unique-bucket-name}}

Figure 4. AWS CLI command to execute in AWS CloudShell that synchronizes the data from the source to your Amazon S3 bucket.

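This command copies the data from the PMC S3 bucket into the S3 bucket in your own account. Because the prefix contains a large number of articles, the sync can take some time to complete.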

Figure 5. In-progress ‘sync’ of files from source to destination.
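Because the oa_comm prefix is large, you may want to preview or narrow the copy before you run the full sync. The sketch below assumes that the articles are available as plain-text objects whose keys end in .txt, and that you only need those. The function name sync_pmc_text is hypothetical. The --exclude and --include filters and the --dryrun flag are standard aws s3 sync options.

```shell
# Hypothetical helper: copies only the .txt objects under the oa_comm prefix
# into your bucket. Extra arguments (for example --dryrun) are passed
# through to "aws s3 sync".
sync_pmc_text() {
  if [ "$#" -lt 1 ]; then
    echo "usage: sync_pmc_text BUCKET [aws s3 sync options...]" >&2
    return 1
  fi
  local bucket="$1"
  shift
  # Filters are applied in order, and later filters take precedence:
  # exclude everything, then re-include keys ending in .txt.
  aws s3 sync s3://pmc-oa-opendata/oa_comm "s3://${bucket}" \
    --exclude "*" --include "*.txt" "$@"
}
```

For example, sync_pmc_text my-unique-bucket-name --dryrun lists the objects that would be copied without copying them. To perform the copy, rerun the command without --dryrun. A sync transfers only objects that are missing from the destination or have changed, so you can resume an interrupted copy by rerunning it.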

  • Step 2: Create an Amazon Q Business application
    • In the console search bar, search for and open Amazon Q Business
    • Select “Get Started” and then “Create Application”

Figure 6. Amazon Q Business Create application page.

    • Enter the application name “PMC-Chatbot” and select “Create”
    • Configure the following settings:
      • Retriever: select “Use Native Retriever”
      • Index Provisioning: select “Enterprise”
      • Number of units: 30
    • Configure a data source

You can add one or more data sources for the Amazon Q Business native retriever to index. Amazon Q Business offers many prebuilt connectors for both cloud and on-premises data sources. Select the “+” next to the “Amazon S3” connector.

Figure 7. Amazon Q Business data source connector options.

    • Name the data source “PMCdata”
    • In the “IAM role” section, select “Create a new service role (Recommended)”
    • In “Sync scope”, select the S3 bucket you created earlier
    • For “Sync mode”, choose “New, modified, or deleted content sync” and set the sync run schedule to “Monthly”
    • Select “Add data source”
    • Configure user access
      • From the “Add groups and users” drop-down, assign users to the application and select an Amazon Q Business subscription
      • Each new user receives an email invitation to the Amazon Q Business chatbot and sets up a password

Figure 8. Amazon Q Business user creation screen: create the user and assign a subscription type for access to the chatbot.
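You can also script the console steps for the application, index, and retriever with the qbusiness commands in the AWS CLI. This is a hedged sketch rather than a definitive implementation. It assumes an existing IAM Identity Center instance, whose ARN is passed as the only argument. The function names and the display names PMC-Index and PMC-Retriever are hypothetical. The S3 data source, its service role, and user access are still configured in the console as described above. Verify the parameter names against the current AWS CLI reference for qbusiness before you rely on this script.

```shell
# Hypothetical helpers (a sketch, not an official script) that mirror the
# Step 2 console settings. Assumes an existing IAM Identity Center instance.

# Polls a command that prints a status string until it reports ACTIVE.
wait_until_active() {
  local status
  while :; do
    status="$("$@")" || return 1
    case "$status" in
      ACTIVE) return 0 ;;
      FAILED) echo "resource entered FAILED state" >&2; return 1 ;;
    esac
    sleep 20
  done
}

create_pmc_app() {
  if [ -z "${1:-}" ]; then
    echo "usage: create_pmc_app IDENTITY_CENTER_INSTANCE_ARN" >&2
    return 1
  fi
  local idc_arn="$1" app_id index_id

  # Application named "PMC-Chatbot", as in the console walkthrough.
  app_id="$(aws qbusiness create-application \
    --display-name PMC-Chatbot \
    --identity-center-instance-arn "$idc_arn" \
    --query applicationId --output text)" || return 1
  wait_until_active aws qbusiness get-application \
    --application-id "$app_id" --query status --output text || return 1

  # Enterprise index with 30 capacity units.
  index_id="$(aws qbusiness create-index \
    --application-id "$app_id" \
    --display-name PMC-Index \
    --type ENTERPRISE \
    --capacity-configuration units=30 \
    --query indexId --output text)" || return 1
  wait_until_active aws qbusiness get-index \
    --application-id "$app_id" --index-id "$index_id" \
    --query status --output text || return 1

  # Native retriever backed by the new index.
  aws qbusiness create-retriever \
    --application-id "$app_id" \
    --type NATIVE_INDEX \
    --display-name PMC-Retriever \
    --configuration "{\"nativeIndexConfiguration\":{\"indexId\":\"${index_id}\"}}" \
    --output text >/dev/null || return 1

  echo "APPLICATION_ID=${app_id}"
  echo "INDEX_ID=${index_id}"
}
```

For example, create_pmc_app with your Identity Center instance ARN prints the application and index IDs, which later CLI calls need. The script waits for each resource to become ACTIVE before the next call, so it never creates an index or retriever against a resource that is still being provisioned.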

  • Step 3: Index the data

At this point, you have copied the data to your AWS account, created an Amazon Q Business application, created the data source, and created and assigned a user with access to the chatbot. Next, run the data source sync so that the chatbot can incorporate the PMC data into its responses.

In the Amazon Q Business console, select your application, and then select the data source you created.

    • Select “Sync Now” to begin the indexing process.

Figure 9. Amazon Q Business data source screen. An application can have multiple data sources with different sync schedules. Administrators can view the sync history and the details of successes and failures for each data source sync.
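You can also trigger “Sync Now” from the AWS CLI. This is convenient for scripting, or for checking progress without keeping the console open. The following is a minimal sketch, assuming you have the application, index, and data source IDs from the Amazon Q Business console. The function name sync_pmc_data_source is hypothetical, and treating INCOMPLETE as a stop condition is a choice made only for this example.

```shell
# Hypothetical helper: starts a sync for one data source, then polls the
# job until it reaches a terminal status.
sync_pmc_data_source() {
  if [ "$#" -ne 3 ]; then
    echo "usage: sync_pmc_data_source APP_ID INDEX_ID DATA_SOURCE_ID" >&2
    return 1
  fi
  local app_id="$1" index_id="$2" ds_id="$3" exec_id status

  exec_id="$(aws qbusiness start-data-source-sync-job \
    --application-id "$app_id" --index-id "$index_id" \
    --data-source-id "$ds_id" \
    --query executionId --output text)" || return 1
  echo "Started sync job ${exec_id}"

  while :; do
    # Look up the status of this specific execution in the sync history.
    status="$(aws qbusiness list-data-source-sync-jobs \
      --application-id "$app_id" --index-id "$index_id" \
      --data-source-id "$ds_id" \
      --query "history[?executionId=='${exec_id}'].status | [0]" \
      --output text)" || return 1
    echo "Status: ${status}"
    # SUCCEEDED ends the loop successfully; FAILED, ABORTED, and INCOMPLETE
    # are treated as failures (review the sync history in the console).
    case "$status" in
      SUCCEEDED) return 0 ;;
      FAILED|ABORTED|INCOMPLETE) return 1 ;;
    esac
    sleep 60
  done
}
```

The function prints the job status about once a minute and returns zero only when the sync succeeds. Indexing a dataset of this size can take a long time, so the loop deliberately has no overall timeout.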

  • Step 4: Interact with the chatbot
    • Once indexing has completed, explore the chatbot by opening “Web Experience Settings” and selecting the “Deployed URL”
      • Optionally, customize the chatbot with “Customize web experience”

Figure 10. Amazon Q chatbot UI screen with customized user interface.

    • The chatbot can now respond based on the indexed PMC data, and it can provide references to the source documents used to generate each response.

Figure 11. Amazon Q chatbot UI screen with response generated from PMC data. Users will be able to find the data source references within the response.
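You can also retrieve the “Deployed URL” shown under “Web Experience Settings” from the command line. The following is a minimal sketch. It assumes you have the application ID from the console, and that the web experience summaries returned by list-web-experiences include a defaultEndpoint field. The function name print_web_experience_urls is hypothetical.

```shell
# Hypothetical helper: prints the deployed URL of each web experience that
# belongs to the given Amazon Q Business application.
print_web_experience_urls() {
  if [ -z "${1:-}" ]; then
    echo "usage: print_web_experience_urls APP_ID" >&2
    return 1
  fi
  aws qbusiness list-web-experiences --application-id "$1" \
    --query "webExperiences[].defaultEndpoint" --output text
}
```

If the application has more than one web experience, the text output prints the URLs on one line, separated by tabs.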

Real-world applications

Amazon Q Business offers significant advantages to a wide range of professionals in the healthcare and life sciences sectors. Academics, researchers, healthcare providers, and pharmaceutical companies can all benefit from its capabilities in various, often overlapping ways. For instance, the service enables users to perform comprehensive literature reviews, identify research gaps, and stay informed about the latest discoveries in their respective fields. This is particularly valuable for academic researchers, but also for healthcare professionals and pharmaceutical companies seeking to stay at the forefront of medical advancements.

Healthcare providers can leverage Amazon Q Business to keep abreast of the latest medical research, enabling them to provide evidence-based care and improve patient outcomes. Similarly, pharmaceutical companies can use the service to stay updated on recent developments in drug research, clinical trials, and medical treatments. This shared knowledge base allows for better-informed decision-making across the board, from patient care to drug development strategies.

The service’s ability to quickly identify promising studies and synthesize large volumes of information is beneficial for all users. It can accelerate R&D efforts in pharmaceutical companies, inform treatment decisions for healthcare providers, and guide research directions for academics. By providing efficient access to a wealth of scientific literature, Amazon Q Business fosters a more connected and informed approach to healthcare and life sciences, ultimately contributing to advancements in patient care, drug development, and medical research. 

Conclusion

In this blog post, we demonstrated how to use Amazon Q Business to quickly create a generative AI application using the PMC data in the Registry of Open Data on AWS. With Amazon Q Business, we indexed the PMC data and used an Amazon Q Business chatbot to interactively extract insights from the literature.

Amazon Q Business changes the way organizations extract insights from data sources such as PubMed research articles. By automating data processing and applying advanced NLP capabilities, it enables users to efficiently analyze large volumes of scientific literature and uncover valuable insights. Whether you work in academia, healthcare, or the pharmaceutical industry, Amazon Q Business can help you stay ahead in the continuously evolving world of biomedical research. Try Amazon Q Business and transform your approach to research analysis today.

Learn more

Read more about Amazon Q