Use Amazon Bedrock tooling with Amazon SageMaker JumpStart models

Today, we’re excited to announce a new capability that lets you deploy over 100 open-weight and proprietary models from Amazon SageMaker JumpStart and register them with Amazon Bedrock, so you can access them through the Amazon Bedrock APIs. You can now use Amazon Bedrock features such as Amazon Bedrock Knowledge Bases and Amazon Bedrock Guardrails with models deployed through SageMaker JumpStart.

SageMaker JumpStart helps you get started with machine learning (ML) by providing fully customizable solutions and one-click deployment and fine-tuning of more than 400 popular open-weight and proprietary generative AI models. Amazon Bedrock is a fully managed service that provides a single API to access and use various high-performing foundation models (FMs). It also offers a broad set of capabilities to build generative AI applications. The Amazon Bedrock Converse API is a runtime API that provides a consistent interface that works with different models. It allows you to use advanced features in Amazon Bedrock such as the playground, guardrails, and tool use (function calling).

SageMaker JumpStart has long been the go-to service for developers and data scientists seeking to deploy state-of-the-art generative AI models. Through this integration, you can now combine the flexibility of hosting models on SageMaker JumpStart with the fully managed experience of Amazon Bedrock, including advanced security controls, scalable infrastructure, and comprehensive monitoring capabilities.

In this post, we show you how to deploy FMs through SageMaker JumpStart, register them with Amazon Bedrock, and invoke them using Amazon Bedrock APIs.

Solution overview

The Converse API standardizes interaction with Amazon Bedrock FMs, enabling developers to write code one time and use it across various models without needing to adjust for model-specific differences. It supports multi-turn conversations by accepting the conversation history as part of the API request, and it lets developers perform tasks that require access to external APIs by using tools (function calling). Additionally, the Converse API allows you to block inappropriate inputs or generated content by including a guardrail in your API calls. To review the complete list of supported models and model features, refer to Supported models and model features.
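As a rough illustration of how conversation history travels with each request, the following sketch builds the messages list for a multi-turn exchange. The helper names (add_user_turn, add_assistant_turn) and the stand-in response are hypothetical; the actual API call is shown only as a comment because it needs AWS credentials and a model ID or endpoint ARN.

```python
def add_user_turn(messages, text):
    """Return a new history with a user turn appended."""
    return messages + [{"role": "user", "content": [{"text": text}]}]


def add_assistant_turn(messages, response):
    """Return a new history with the assistant message from a Converse response appended."""
    return messages + [response["output"]["message"]]


history = add_user_turn([], "What is Amazon SageMaker JumpStart?")

# With credentials configured, the history would be sent like this:
#   import boto3
#   client = boto3.client("bedrock-runtime")
#   response = client.converse(modelId=model_id, messages=history)
# A stand-in response with the same shape keeps this sketch runnable offline.
response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "It is a hub of pretrained models."}],
        }
    }
}

history = add_assistant_turn(history, response)
history = add_user_turn(history, "How do I deploy one of its models?")

print([turn["role"] for turn in history])
```

Because the full history is resent on every call, the model sees earlier turns without the service having to store any session state.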

This new feature extends the capabilities of the Converse API into a single interface that developers can use to call FMs deployed in SageMaker JumpStart. This allows developers to use the same API to invoke models from Amazon Bedrock and SageMaker JumpStart, streamlining the process to integrate models into their generative AI applications. Now you can build on top of an even larger library of world-class open source and proprietary models through a single API. To view the full list of Bedrock Ready models available from SageMaker JumpStart, refer to the Bedrock Marketplace documentation. You can also use Amazon Bedrock Marketplace to discover and deploy these models to SageMaker endpoints.

In this post, we walk through the following steps:

Deploy the Gemma 2 9B Instruct model using SageMaker JumpStart.
Register the model with Amazon Bedrock.
Test the model with sample prompts using the Amazon Bedrock playground.
Use the Amazon Bedrock RetrieveAndGenerate API to query the Amazon Bedrock knowledge base.
Set up Amazon Bedrock Guardrails to help block harmful content and personally identifiable information (PII) data.
Invoke models with Converse APIs to show an end-to-end Retrieval Augmented Generation (RAG) pipeline.

Prerequisites

You can access and deploy pretrained models from SageMaker JumpStart in the Amazon SageMaker Studio UI. To access SageMaker Studio on the AWS Management Console, you need to set up an Amazon SageMaker domain. SageMaker uses domains to organize user profiles, applications, and their associated resources. To create a domain and set up a user profile, refer to Guide to getting set up with Amazon SageMaker.

You also need an AWS Identity and Access Management (IAM) role with appropriate permissions. To get started with this example, you can use the AmazonSageMakerFullAccess, AmazonBedrockFullAccess, and AmazonOpenSearchAccess managed policies to provide the required permissions to SageMaker JumpStart and Amazon Bedrock. For a more scoped-down set of permissions, refer to the following:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "BedrockEndpointManagementMutatingOperations",
      "Action": [
        "sagemaker:AddTags",
        "sagemaker:CreateEndpoint",
        "sagemaker:CreateEndpointConfig",
        "sagemaker:CreateModel",
        "sagemaker:DeleteEndpoint",
        "sagemaker:UpdateEndpoint",
        "sagemaker:DeleteTags"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*",
      "Condition": {
        "StringEquals": {
            "aws:ViaAWSService": "bedrock.amazonaws.com"
        }
       }
    },
    {
      "Sid": "BedrockEndpointManagementNonMutatingOperations",
      "Action": [
        "sagemaker:DescribeEndpoint",
        "sagemaker:DescribeEndpointConfig",
        "sagemaker:DescribeModel",
        "sagemaker:ListEndpoints",
        "sagemaker:ListTags"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*",
      "Condition": {
        "StringEquals": {
            "aws:ViaAWSService": "bedrock.amazonaws.com"
        }
       }
    },
    {
      "Sid": "BedrockEndpointInvokingOperations",
      "Action": [
        "sagemaker:InvokeEndpoint",
        "sagemaker:InvokeEndpointWithResponseStream"      
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*",
      "Condition": {
        "StringEquals": {
            "aws:ViaAWSService": "bedrock.amazonaws.com"
         }
       }
    },
    {
      "Sid": "AllowDiscoveringPublicModelDetails",
      "Action": [
        "sagemaker:DescribeHubContent"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*:aws:hub-content/SageMakerPublicHub/Model/*"
    },
    {
      "Sid": "AllowListingPublicModels",
      "Action": [
        "sagemaker:ListHubContents"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*:aws:hub/SageMakerPublicHub"
    },
    {
      "Sid": "RetrieveSubscribedMarketplaceLicenses",
      "Action": [
        "license-manager:ListReceivedLicenses"
      ],
      "Effect": "Allow",
      "Resource": "arn:aws:license-manager:*"
    },
    {
      "Sid" : "PassRoleToSagemaker",
      "Effect" : "Allow",
      "Action" : [
        "iam:PassRole"
      ],
      "Resource" : "arn:aws:iam::*:role/*AmazonSageMaker*",
      "Condition" : {
        "StringEquals" : {
        "iam:PassedToService" : [
            "sagemaker.amazonaws.com"
          ]
        }
      }
    },
    {
      "Sid" : "BedrockAll",
      "Effect" : "Allow",
      "Action" : [ "bedrock:*" ],
      "Resource" : "*" 
    },
    {
      "Sid" : "AmazonOpenSearchAccess",
      "Effect" : "Allow",
      "Action" : [ "aoss:*" ],
      "Resource" : "*",
      "Condition": {
                "StringEquals": {
                    "aws:ResourceAccount": "${aws:PrincipalAccount}"
                }
      }
    }
  ]
}
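Policy documents must be valid JSON, and a stray trailing comma is enough to make one unparseable. Before attaching a policy, it can help to check it locally. The following is a minimal sketch, assuming the policy uses a list of statements as in the preceding example; check_policy is a hypothetical helper, not part of any AWS SDK, and it does not replace IAM's own validation.

```python
import json

REQUIRED_KEYS = {"Effect", "Action", "Resource"}


def check_policy(policy_text):
    """Parse an IAM policy document and return a list of problems found."""
    try:
        policy = json.loads(policy_text)
    except json.JSONDecodeError as err:
        return [f"invalid JSON: {err.msg} at line {err.lineno}"]

    problems = []
    seen_sids = set()
    for index, statement in enumerate(policy.get("Statement", [])):
        missing = REQUIRED_KEYS - set(statement)
        if missing:
            problems.append(f"statement {index} is missing {sorted(missing)}")
        sid = statement.get("Sid")
        if sid is not None and sid in seen_sids:
            problems.append(f"duplicate Sid: {sid}")
        if sid is not None:
            seen_sids.add(sid)
    return problems


sample = """
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowListingPublicModels",
      "Action": ["sagemaker:ListHubContents"],
      "Effect": "Allow",
      "Resource": "arn:aws:sagemaker:*:aws:hub/SageMakerPublicHub"
    }
  ]
}
"""
print(check_policy(sample))
```

An empty list means the document parsed and every statement has the keys this sketch looks for.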

After applying the relevant permissions, setting up a SageMaker domain, and creating user profiles, you are ready to deploy your SageMaker JumpStart model and register it with Amazon Bedrock.

Deploy a model with SageMaker JumpStart and register it with Amazon Bedrock

This section provides a walkthrough of deploying a model using SageMaker JumpStart and registering it with Amazon Bedrock. In this walkthrough, you will deploy and register the Gemma 2 9B Instruct model offered through Hugging Face in SageMaker JumpStart. Complete the following steps:

On the SageMaker console, choose Studio in the navigation pane.
Choose the relevant user profile on the dropdown menu and choose Open Studio.

In SageMaker Studio, choose JumpStart in the navigation pane.

Here, you will see a list of the available SageMaker JumpStart models. Models that can be registered to Amazon Bedrock after they’ve been deployed through SageMaker JumpStart have a Bedrock ready tag.

The Gemma 2 9B Instruct model for this example is provided by Hugging Face, so choose the Hugging Face model card.

To filter the list of models to view which models are supported by Amazon Bedrock, select Bedrock Ready under Action.
Search for Gemma 2 9B Instruct and choose the model card for Gemma 2 9B Instruct.

You can review the model card for Gemma 2 9B Instruct to learn more about the model.

To deploy the model, choose Deploy.
Review the End User License Agreement for Gemma 2 9B Instruct and select I accept the End User License Agreement (EULA) and read the terms and conditions.
Leave the endpoint settings with their default values and choose Deploy.

The endpoint deployment process will take a few minutes.

Under Deployments in the navigation pane, choose Endpoints to view your available endpoints.

After a few minutes, the model will be deployed to the endpoint and its status will change to In service, indicating that the endpoint is ready to serve traffic. You can use the Refresh icon at the bottom of the endpoint screen to get the latest information.
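If you prefer to check the endpoint status from code rather than refreshing the console, a polling loop along these lines is one option. This is a sketch only: wait_until_in_service is a hypothetical helper, the endpoint name in the usage comment is a placeholder, and the client is passed in so the function can be exercised without AWS credentials.

```python
import time


def wait_until_in_service(sagemaker_client, endpoint_name, poll_seconds=30, max_polls=40):
    """Poll DescribeEndpoint until the endpoint is InService, fails, or the poll budget runs out."""
    for _ in range(max_polls):
        description = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return status
        if status == "Failed":
            raise RuntimeError(f"Endpoint {endpoint_name} failed to deploy")
        time.sleep(poll_seconds)
    raise TimeoutError(f"Endpoint {endpoint_name} was not ready after {max_polls} polls")


# Usage (requires AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   wait_until_in_service(boto3.client("sagemaker"), "my-gemma-endpoint")
```

Boto3 also ships a built-in waiter for this, available through get_waiter("endpoint_in_service") on the SageMaker client, which may be simpler if you don't need custom handling.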

When your endpoint is in service, choose it to go to the endpoint details page.

Choose Use with Bedrock to start the registration process.

You will be redirected to the Amazon Bedrock console.

On the Register endpoint page, the SageMaker endpoint Amazon Resource Name (ARN) and model ARN have already been prepopulated. Review these values and choose Register.

Your SageMaker endpoint will be registered with Amazon Bedrock in a few minutes.

After your SageMaker endpoint is registered with Amazon Bedrock, you can invoke it using the Converse API. You can also test it in the Amazon Bedrock playground.

In the navigation pane on the Amazon Bedrock console, choose Marketplace deployments under Foundation models.
From the list of managed deployments, select your registered model, then choose Open in playground.

You will now be in the Amazon Bedrock playground for Chat/text. The Chat/text playground lets you test your model with a single prompt or hold a multi-turn chat for conversational use cases. Because this example is an interactive chat session, leave the Mode as the default, Chat. The playground should already have your Gemma 2 9B Instruct model selected.

Now you can test your SageMaker endpoint through Amazon Bedrock! Use the following prompt to test summarizing a meeting transcript, and review the results:

Meeting transcript:
Miguel: Hi Brant, I want to discuss the workstream for our new product launch
Brant: Sure Miguel, is there anything in particular you want to discuss?
Miguel: Yes, I want to talk about how users enter into the product.
Brant: Ok, in that case let me add in Namita.
Namita: Hey everyone
Brant: Hi Namita, Miguel wants to discuss how users enter into the product.
Miguel: its too complicated and we should remove friction.  for example, why do I need to fill out additional forms?  I also find it difficult to find where to access the product when I first land on the landing page.
Brant: I would also add that I think there are too many steps.
Namita: Ok, I can work on the landing page to make the product more discoverable but brant can you work on the additonal forms?
Brant: Yes but I would need to work with James from another team as he needs to unblock the sign up workflow.  Miguel can you document any other concerns so that I can discuss with James only once?
Miguel: Sure.

From the meeting transcript above, Create a list of action items for each person.

Enter the prompt into the playground, then choose Run.

You can view the response in the chat generated by your deployed SageMaker JumpStart model through Amazon Bedrock:

Here's a breakdown of action items from the meeting transcript:

**Miguel:**

* **Document:** List out any additional concerns regarding user entry into the product. Share these with Brant for his discussion with James.

**Brant:**

* **Collaborate with James:**  Work with James from another team to simplify the additional forms involved in the user sign-up workflow.
* **Review Documentation:** Review Miguel's documented concerns about user entry to prepare for the discussion with James.

**Namita:**

* **Landing Page Redesign:**  Improve the landing page to make the product more discoverable for new users.

Let me know if you'd like me to elaborate on any of these action items!

You can also test the model with your own prompts and use cases.

Use Amazon Bedrock APIs with the deployed model

This section demonstrates using the AWS SDK for Python (Boto3) and Converse APIs to invoke the Gemma 2 9B Instruct model you deployed earlier through SageMaker and registered with Amazon Bedrock. The full source code associated with this post is available in the accompanying GitHub repo. For additional Converse API examples, refer to Converse API examples.

In this code sample, we also implement a RAG architecture in conjunction with the deployed model. RAG is the process of optimizing the output of a large language model (LLM) so it references an authoritative knowledge base outside of its training data sources before generating a response.
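To make the idea concrete, the augmentation step of RAG can be sketched in a few lines: retrieved passages are placed in the prompt ahead of the question. The function name, the prompt wording, and the sample passages below are illustrative assumptions; the input mirrors the retrievalResults shape returned by the Amazon Bedrock Retrieve API.

```python
def build_augmented_prompt(question, retrieval_results, max_passages=3):
    """Combine retrieved passages and the user question into a single prompt."""
    passages = [result["content"]["text"] for result in retrieval_results[:max_passages]]
    context = "\n\n".join(
        f"[{number}] {passage}" for number, passage in enumerate(passages, start=1)
    )
    return (
        "Answer the question using only the numbered passages below.\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )


# Stand-in retrieval results; real ones come from a knowledge base query.
sample_results = [
    {"content": {"text": "Amazon Bedrock provides a single API for foundation models."}},
    {"content": {"text": "SageMaker JumpStart offers one-click model deployment."}},
]

prompt = build_augmented_prompt("How can I access foundation models?", sample_results)
print(prompt)
```

The RetrieveAndGenerate API performs retrieval, augmentation, and generation in one call, so a helper like this is only needed when you call Retrieve and Converse separately.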

Use the deployed SageMaker model with the RetrieveAndGenerate API offered by Amazon Bedrock to query a knowledge base and generate responses based on the retrieved results. The response only cites sources that are relevant to the query. For information on creating a Knowledge Base, refer to Creating a Knowledge Base. For additional code samples, refer to RetrieveAndGenerate.

The following diagram illustrates the RAG workflow.

Complete the following steps:

To invoke the deployed model, you need to pass the endpoint ARN of the deployed model in the modelId parameter of the Converse API.

To obtain the ARN of the deployed model, navigate to the Amazon Bedrock console. In the navigation pane, choose Marketplace deployments under Foundation models. From the list of managed deployments, choose your registered model to view more details.

You will be directed to the model summary on the Model catalog page under Foundation models. Here, you will find the details associated with your deployed model. Copy the model ARN to use in the following code sample.

import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Add the ARN of the endpoint you registered with Amazon Bedrock here.
endpoint_arn = "arn:aws:sagemaker:<AWS::REGION>:<AWS::AccountId>:endpoint/<Endpoint_Name>"

# Base inference parameters to use.
inference_config = {
        "maxTokens": 256,
        "temperature": 0.1,
        "topP": 0.999,
}

# Additional inference parameters to use.
additional_model_fields = {"parameters": {"repetition_penalty": 0.9, "top_k": 250, "do_sample": True}}


response = bedrock_runtime.converse(
    modelId=endpoint_arn,
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "text": "What is Amazon doing in the field of generative AI?",
                },
            ]
        },
    ],
    inferenceConfig=inference_config,
    additionalModelRequestFields=additional_model_fields,
)
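The Converse response is a nested dictionary, so a small helper can make it easier to read. The following sketch pulls out the generated text, the stop reason, and the token usage; summarize_converse_response is a hypothetical name, and the sample response is a stand-in with the same shape as a real one.

```python
def summarize_converse_response(response):
    """Extract the generated text, stop reason, and token usage from a Converse response."""
    blocks = response["output"]["message"]["content"]
    # A message can hold several content blocks; keep only the text ones.
    text = "".join(block["text"] for block in blocks if "text" in block)
    usage = response.get("usage", {})
    return {
        "text": text,
        "stop_reason": response.get("stopReason"),
        "input_tokens": usage.get("inputTokens"),
        "output_tokens": usage.get("outputTokens"),
    }


# Stand-in response; a real one is returned by bedrock_runtime.converse(...).
sample_response = {
    "output": {
        "message": {
            "role": "assistant",
            "content": [{"text": "Amazon offers several generative AI services."}],
        }
    },
    "stopReason": "end_turn",
    "usage": {"inputTokens": 18, "outputTokens": 9, "totalTokens": 27},
}

summary = summarize_converse_response(sample_response)
print(summary["text"])
```

Checking the stop reason is useful in practice: a value of max_tokens, for example, indicates the answer was cut off by the maxTokens setting.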

Invoke the SageMaker JumpStart model with the RetrieveAndGenerate API. The generation_template and orchestration_template parameters in the retrieve_and_generate API are model specific. These templates define the prompts and instructions for the language model, guiding the generation process and the integration with the knowledge retrieval component.

import boto3

bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime")

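# Provide your Knowledge Base Id 
kb_id = "" 

# endpoint_arn is the registered endpoint ARN defined in the previous code sample.
# generation_template and orchestration_template are model-specific prompt strings
# that you must define before making this call.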

response = bedrock_agent_runtime_client.retrieve_and_generate(
    input={
        "text": "What is Amazon doing in the field of generative AI?"
    },
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "generationConfiguration": {
                "inferenceConfig": {
                    "textInferenceConfig": {
                        "maxTokens": 512,
                        "temperature": 0.1,
                        "topP": 0.9
                    }
                },
                "promptTemplate": {
                    "textPromptTemplate": generation_template
                }
            },
            "knowledgeBaseId": kb_id,
            "orchestrationConfiguration": {
                "inferenceConfig": {
                    "textInferenceConfig": {
                        "maxTokens": 512,
                        "temperature": 0.1,
                        "topP": 0.9
                    }
                },
                "promptTemplate": {
                    "textPromptTemplate": orchestration_template
                },
            },
            "modelArn": endpoint_arn,
            "retrievalConfiguration": {
                "vectorSearchConfiguration": {
                    "numberOfResults":5
                } 
            }
        }
    }
)
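The RetrieveAndGenerate response contains both the generated answer and the citations that support it. The following sketch collects the answer text and the source locations it cites. It assumes the knowledge base is backed by an Amazon S3 data source, so that each reference carries an s3Location; cited_sources is a hypothetical helper and the sample response is a stand-in.

```python
def cited_sources(response):
    """Return the answer text and the de-duplicated S3 URIs it cites."""
    uris = []
    for citation in response.get("citations", []):
        for reference in citation.get("retrievedReferences", []):
            uri = reference.get("location", {}).get("s3Location", {}).get("uri")
            if uri and uri not in uris:
                uris.append(uri)
    return response["output"]["text"], uris


# Stand-in response; a real one is returned by retrieve_and_generate(...).
sample_rag_response = {
    "output": {"text": "Amazon is investing in generative AI services."},
    "citations": [
        {
            "retrievedReferences": [
                {
                    "content": {"text": "Excerpt from a shareholder letter."},
                    "location": {
                        "type": "S3",
                        "s3Location": {"uri": "s3://example-bucket/letter.pdf"},
                    },
                }
            ]
        }
    ],
}

answer, sources = cited_sources(sample_rag_response)
print(answer)
print(sources)
```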

Now you can implement guardrails with the Converse API for your SageMaker JumpStart model. Amazon Bedrock Guardrails enables you to implement safeguards for your generative AI applications based on your use cases and responsible AI policies. For information on creating guardrails, refer to Create a Guardrail. For additional code samples to implement guardrails, refer to Include a guardrail with Converse API.
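If you would rather create the guardrail from code than from the console, the request might look like the following sketch. It builds the arguments for the Amazon Bedrock create_guardrail operation with a denied topic, PII masking, and contextual grounding checks. The guardrail name, topic definition, PII types, thresholds, and blocked messages are all illustrative assumptions to adapt to your own policies.

```python
def build_guardrail_request(name):
    """Build keyword arguments for the Amazon Bedrock create_guardrail operation."""
    return {
        "name": name,
        "description": "Blocks investment advice and masks common PII.",
        "topicPolicyConfig": {
            "topicsConfig": [
                {
                    "name": "Investment advice",
                    "definition": "Recommendations about buying, selling, or holding financial assets.",
                    "examples": ["Should I buy bitcoin?"],
                    "type": "DENY",
                }
            ]
        },
        "sensitiveInformationPolicyConfig": {
            "piiEntitiesConfig": [
                {"type": "EMAIL", "action": "ANONYMIZE"},
                {"type": "PHONE", "action": "ANONYMIZE"},
            ]
        },
        "contextualGroundingPolicyConfig": {
            "filtersConfig": [
                {"type": "GROUNDING", "threshold": 0.7},
                {"type": "RELEVANCE", "threshold": 0.7},
            ]
        },
        "blockedInputMessaging": "Sorry, I can't help with that request.",
        "blockedOutputsMessaging": "Sorry, I can't provide that response.",
    }


request = build_guardrail_request("jumpstart-demo-guardrail")

# With credentials configured, the guardrail would be created like this:
#   import boto3
#   bedrock = boto3.client("bedrock")
#   created = bedrock.create_guardrail(**request)
#   guardrail_identifier = created["guardrailId"]
#   guardrail_version = created["version"]
print(sorted(request))
```

The contextual grounding filters are what make use of the grounding_source and query qualifiers attached to guarded content in a Converse request.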

In the following code sample, you include a guardrail in a Converse API request invoking a SageMaker JumpStart model:

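import boto3

bedrock_runtime = boto3.client("bedrock-runtime")
bedrock_agent_runtime_client = boto3.client("bedrock-agent-runtime")

# Provide your Knowledge Base Id
kb_id = "" 

# Provide the ID and version of the guardrail you created
guardrail_identifier = ""
guardrail_version = ""

# endpoint_arn is the registered endpoint ARN defined in the first code sample.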

relevant_documents = bedrock_agent_runtime_client.retrieve(
    retrievalQuery= {
        "text": "What is Amazon doing in the field of generative AI?"
    },
    knowledgeBaseId=kb_id,
    retrievalConfiguration= {
        "vectorSearchConfiguration": {
            "numberOfResults": 1
        }
    }
)

def invoke_model(prompt, source, inference_config=None, additional_model_field=None):
    messages = [
        {
            "role": "user",
            "content": [
                {
                    "guardContent": {
                        "text": {
                            "text": source,
                            "qualifiers": ["grounding_source"],
                        }
                    }
                },
                {
                    "guardContent": {
                        "text": {
                            "text": prompt,
                            "qualifiers": ["query"],
                        }
                    }
                },
            ],
        }
    ]
    if not inference_config:
        # Base inference parameters to use.
        inference_config = {
                "maxTokens": 256,
                "temperature": 0.1,
                "topP": 0.999,
        }
    
    if not additional_model_field:
        # Additional inference parameters to use.
        additional_model_field = {"parameters": {"repetition_penalty": 0.9, "top_k": 250, "do_sample": True}}

    response = bedrock_runtime.converse(
        modelId=endpoint_arn,
        messages=messages,
        inferenceConfig=inference_config,
        additionalModelRequestFields=additional_model_field,
        guardrailConfig={
            'guardrailIdentifier': guardrail_identifier,
            'guardrailVersion': guardrail_version
        },
    )
    
    return response["output"]["message"]["content"][0]["text"]

invoke_model(prompt="What is Amazon doing in the field of generative AI?", source=relevant_documents["retrievalResults"][0]["content"]["text"]) 
# Content is Blocked 
invoke_model(prompt="Should I buy bitcoin?", source=relevant_documents["retrievalResults"][0]["content"]["text"])
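When a guardrail blocks a request, the Converse API still returns a normal response; the text is the guardrail's configured message and the stopReason field is set to guardrail_intervened. If your application needs to tell the two cases apart, a check like the following sketch can be applied to the full response dictionary rather than only the text. The helper name and sample responses are hypothetical.

```python
def guardrail_intervened(response):
    """Return True when the Converse response was stopped by a guardrail."""
    return response.get("stopReason") == "guardrail_intervened"


def response_text(response):
    """Return the first text block of a Converse response."""
    return response["output"]["message"]["content"][0]["text"]


# Stand-in responses with the same shape as real Converse responses.
allowed = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Amazon builds generative AI services."}]}},
    "stopReason": "end_turn",
}
blocked = {
    "output": {"message": {"role": "assistant", "content": [{"text": "Sorry, I can't provide that response."}]}},
    "stopReason": "guardrail_intervened",
}

for candidate in (allowed, blocked):
    label = "blocked" if guardrail_intervened(candidate) else "allowed"
    print(label, "-", response_text(candidate))
```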

Clean up

To clean up your resources, use the following code:

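import boto3

# KnowledgeBasesForAmazonBedrock is assumed to be the helper class from the
# knowledge_base module in the accompanying GitHub repo.
from knowledge_base import KnowledgeBasesForAmazonBedrock

bedrock = boto3.client("bedrock")

# knowledge_base_name and guardrail_identifier are the values used when you created these resources.
kb = KnowledgeBasesForAmazonBedrock()
kb.delete_kb(knowledge_base_name, delete_s3_bucket=True, delete_iam_roles_and_policies=True)

bedrock.delete_guardrail(guardrailIdentifier=guardrail_identifier)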

The SageMaker JumpStart model you deployed will incur cost if you leave it running. Delete the endpoint if you want to stop incurring charges. Deleting the endpoint will also de-register the model from Amazon Bedrock. For more details, see Delete Endpoints and Resources.
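Deleting the endpoint can also be scripted. The following sketch looks up the endpoint configuration and the models it references, then deletes all three. It assumes the endpoint hosts its models through production variants that name a model directly; delete_endpoint_resources is a hypothetical helper, and the endpoint name in the usage comment is a placeholder.

```python
def delete_endpoint_resources(sagemaker_client, endpoint_name):
    """Delete an endpoint, its endpoint config, and the models the config references."""
    endpoint = sagemaker_client.describe_endpoint(EndpointName=endpoint_name)
    config_name = endpoint["EndpointConfigName"]
    config = sagemaker_client.describe_endpoint_config(EndpointConfigName=config_name)
    model_names = [
        variant["ModelName"]
        for variant in config["ProductionVariants"]
        if "ModelName" in variant
    ]

    # Delete the endpoint first so it stops incurring charges, then its dependencies.
    sagemaker_client.delete_endpoint(EndpointName=endpoint_name)
    sagemaker_client.delete_endpoint_config(EndpointConfigName=config_name)
    for model_name in model_names:
        sagemaker_client.delete_model(ModelName=model_name)
    return {"endpoint": endpoint_name, "config": config_name, "models": model_names}


# Usage (requires AWS credentials; the endpoint name is a placeholder):
#   import boto3
#   delete_endpoint_resources(boto3.client("sagemaker"), "my-gemma-endpoint")
```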

Conclusion

In this post, you learned how to deploy FMs through SageMaker JumpStart, register them with Amazon Bedrock, and invoke them using Amazon Bedrock APIs. With this new capability, organizations can access leading proprietary and open-weight models using a single API, reducing the complexity of building generative AI applications with a variety of models. This integration between SageMaker JumpStart and Amazon Bedrock is generally available in all AWS Regions where Amazon Bedrock is available. Try this code to use the Converse API, Amazon Bedrock Knowledge Bases, and Amazon Bedrock Guardrails with models deployed through SageMaker JumpStart.

About the Authors

Vivek Gangasani is a Senior GenAI Specialist Solutions Architect at AWS. He helps emerging GenAI companies build innovative solutions using AWS services and accelerated compute. Currently, he is focused on developing strategies for fine-tuning and optimizing the inference performance of Large Language Models. In his free time, Vivek enjoys hiking, watching movies and trying different cuisines.

Abhishek Doppalapudi is a Solutions Architect at Amazon Web Services (AWS), where he assists startups in building and scaling their products using AWS services. Currently, he is focused on helping AWS customers adopt Generative AI solutions. In his free time, Abhishek enjoys playing soccer, watching Premier League matches, and reading.

June Won is a product manager with Amazon SageMaker JumpStart. He focuses on making foundation models easily discoverable and usable to help customers build generative AI applications. His experience at Amazon also includes mobile shopping applications and last mile delivery.

Eashan Kaushik is an Associate Solutions Architect at Amazon Web Services. He is driven by creating cutting-edge generative AI solutions while prioritizing a customer-centric approach to his work. Before this role, he obtained an MS in Computer Science from NYU Tandon School of Engineering. Outside of work, he enjoys sports, lifting, and running marathons.

Giuseppe Zappia is a Principal AI/ML Specialist Solutions Architect at AWS, focused on helping large enterprises design and deploy ML solutions on AWS. He has over 20 years of experience as a full stack software engineer, and has spent the past 5 years at AWS focused on the field of machine learning.

Bhaskar Pratap is a Senior Software Engineer with the Amazon SageMaker team. He is passionate about designing and building elegant systems that bring machine learning to people’s fingertips. Additionally, he has extensive experience with building scalable cloud storage services.

AWS Machine Learning Blog