AWS Machine Learning Blog

Category: Amazon SageMaker Studio

Align Meta Llama 3 to human preferences with DPO, Amazon SageMaker Studio, and Amazon SageMaker Ground Truth

Align Meta Llama 3 to human preferences with DPO, Amazon SageMaker Studio, and Amazon SageMaker Ground Truth

In this post, we show you how to enhance the performance of Meta Llama 3 8B Instruct by fine-tuning it using direct preference optimization (DPO) on data collected with SageMaker Ground Truth.

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

Use LangChain with PySpark to process documents at massive scale with Amazon SageMaker Studio and Amazon EMR Serverless

In this post, we explore how to build a scalable and efficient Retrieval Augmented Generation (RAG) system using the new EMR Serverless integration, Spark’s distributed processing, and an Amazon OpenSearch Service vector database powered by the LangChain orchestration framework. This solution enables you to process massive volumes of textual data, generate relevant embeddings, and store them in a powerful vector database for seamless retrieval and generation.

LLM evaluation and selection journey

LLM experimentation at scale using Amazon SageMaker Pipelines and MLflow

Large language models (LLMs) have achieved remarkable success in various natural language processing (NLP) tasks, but they may not always generalize well to specific domains or tasks. You may need to customize an LLM to adapt to your unique use case, improving its performance on your specific dataset or task. You can customize the model […]

Geospatial notebook

Create custom images for geospatial analysis with Amazon SageMaker Distribution in Amazon SageMaker Studio

This post shows you how to extend Amazon SageMaker Distribution with additional dependencies to create a custom container image tailored for geospatial analysis. Although the example in this post focuses on geospatial data science, the methodology presented can be applied to any kind of custom image based on SageMaker Distribution.

Indian language RAG with Cohere multilingual embeddings and Anthropic Claude 3 on Amazon Bedrock

Media and entertainment companies serve multilingual audiences with a wide range of content catering to diverse audience segments. These enterprises have access to massive amounts of data collected over their many years of operations. Much of this data is unstructured text and images. Conventional approaches to analyzing unstructured data for generating new content rely on […]

Amazon SageMaker now integrates with Amazon DataZone to streamline machine learning governance

Unlock ML governance with SageMaker-DataZone integration: streamline infrastructure, collaborate, and govern data/ML assets.

Accelerate ML workflows with Amazon SageMaker Studio Local Mode and Docker support

We are excited to announce two new capabilities in Amazon SageMaker Studio that will accelerate iterative development for machine learning (ML) practitioners: Local Mode and Docker support. ML model development often involves slow iteration cycles as developers switch between coding, training, and deployment. Each step requires waiting for remote compute resources to start up, which […]

Explore data with ease: Use SQL and Text-to-SQL in Amazon SageMaker Studio JupyterLab notebooks

Amazon SageMaker Studio provides a fully managed solution for data scientists to interactively build, train, and deploy machine learning (ML) models. In the process of working on their ML tasks, data scientists typically start their workflow by discovering relevant data sources and connecting to them. They then use SQL to explore, analyze, visualize, and integrate […]