Amazon Bedrock Model Distillation

Overview

With Amazon Bedrock Model Distillation, you can use smaller, faster, more cost-effective models that deliver use-case-specific accuracy comparable to the most advanced models in Amazon Bedrock. Distilled models in Amazon Bedrock are up to 500% faster and up to 75% less expensive than the original models, with less than 2% accuracy loss for use cases like RAG.

Utilize smaller, more cost-effective models

With Model Distillation, customers can select a ‘teacher’ model whose accuracy they want to achieve for their use case and then select a ‘student’ model that they want to fine-tune. Customers also provide prompts for their use case. Model Distillation automates the process of generating responses from the teacher and using those responses to fine-tune the student model. Student models can then deliver accuracy similar to their teacher models at reduced cost.
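As an illustration of this workflow, the sketch below starts a distillation job with the boto3 Bedrock client. The job name, model identifiers, IAM role, and S3 locations are placeholders, and the exact request fields should be verified against the current Bedrock API reference.

```python
import boto3

# Placeholder identifiers throughout -- substitute your own role, models, and S3 locations.
bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="my-distillation-job",                 # hypothetical job name
    customModelName="my-distilled-model",          # name for the fine-tuned student
    roleArn="arn:aws:iam::123456789012:role/BedrockDistillationRole",
    baseModelIdentifier="amazon.nova-lite-v1:0",   # 'student' model to fine-tune
    customizationType="DISTILLATION",
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                # 'teacher' model whose accuracy you want to approach
                "teacherModelIdentifier": "amazon.nova-pro-v1:0",
                "maxResponseLengthForInference": 1000,
            }
        }
    },
    # Prompts for your use case, in the training-data format described in the Bedrock docs
    trainingDataConfig={"s3Uri": "s3://my-bucket/distillation/prompts.jsonl"},
    outputDataConfig={"s3Uri": "s3://my-bucket/distillation/output/"},
)
print(response["jobArn"])
```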


Maximize distilled model performance with proprietary data synthesis

Fine-tuning a smaller, cost-efficient model to achieve accuracy similar to a larger model for your specific use case is an iterative process. To remove some of the burden of iteration needed to achieve better results, Model Distillation may apply the data synthesis methods that are best suited to your use case. For example, Amazon Bedrock may expand the training dataset by generating similar prompts, or it may generate high-quality synthetic responses using customer-provided prompt-response pairs as golden examples.
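To make the idea of golden examples concrete, the sketch below assembles a few prompt-response pairs into a JSON Lines file before upload to S3. The record shape here is an illustrative assumption, not the documented Bedrock training-data schema; the format requirements in the Bedrock user guide are authoritative.

```python
import json

# Illustrative prompt-response pairs to serve as golden examples.
# The field names below are assumptions for illustration only; consult the
# Bedrock documentation for the exact training-data schema it expects.
golden_examples = [
    {
        "prompt": "Summarize the customer's issue in one sentence.",
        "response": "The customer cannot reset their password from the mobile app.",
    },
    {
        "prompt": "Classify the sentiment of this review: 'Fast shipping, great quality.'",
        "response": "positive",
    },
]

# Write one JSON record per line (JSONL), ready to upload to an S3 bucket.
with open("golden_examples.jsonl", "w") as f:
    for record in golden_examples:
        f.write(json.dumps(record) + "\n")
```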


Reduce cost by easily bringing your production data

With traditional fine-tuning, customers are required to create both prompts and responses. With Model Distillation, customers only need to provide prompts, which Model Distillation uses to generate synthetic responses and fine-tune the student models. Customers can point Model Distillation at their invocation logs and filter the logs by metadata fields. Model Distillation can read both prompts and responses from invocation logs and skip synthetic response generation in the distillation workflow, which reduces cost because responses do not have to be generated from the teacher model again. Get started with code samples.
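The sketch below shows what a distillation job sourced from invocation logs might look like with boto3. The bucket paths, metadata filter, and model identifiers are placeholders, and the invocation-log configuration fields reflect my reading of the Bedrock API; check them against the current API reference before use.

```python
import boto3

bedrock = boto3.client("bedrock", region_name="us-east-1")

response = bedrock.create_model_customization_job(
    jobName="my-distillation-from-logs",           # hypothetical names throughout
    customModelName="my-distilled-model-v2",
    roleArn="arn:aws:iam::123456789012:role/BedrockDistillationRole",
    baseModelIdentifier="amazon.nova-lite-v1:0",
    customizationType="DISTILLATION",
    customizationConfig={
        "distillationConfig": {
            "teacherModelConfig": {
                "teacherModelIdentifier": "amazon.nova-pro-v1:0",
                "maxResponseLengthForInference": 1000,
            }
        }
    },
    trainingDataConfig={
        # Read prompts (and responses) directly from stored invocation logs
        "invocationLogsConfig": {
            "invocationLogSource": {"s3Uri": "s3://my-bucket/invocation-logs/"},
            # Keep only logs tagged with a specific request metadata field
            "requestMetadataFilters": {"equals": {"project": "support-bot"}},
            # Reuse logged responses instead of regenerating them with the teacher
            "usePromptResponse": True,
        }
    },
    outputDataConfig={"s3Uri": "s3://my-bucket/distillation/output/"},
)
print(response["jobArn"])
```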
