AWS Partner Network (APN) Blog
Category: AWS Inferentia
Reducing Inference Times by 87% for Darwinbox’s Talent Search Engine Using AWS Inferentia
Darwinbox wanted to reduce the time its talent search engine takes to match resumes against job descriptions using PyTorch models. AWS Premier Partner Minfy helped the company deploy those models on AWS Inferentia through Amazon SageMaker, achieving 87% faster inference without retraining. The key steps were compiling the models with the Neuron SDK, extending SageMaker containers, using SageMaker Inference Recommender to optimize configurations, and sending requests in mini-batches.
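Of the steps listed above, mini-batching is the easiest to illustrate without any AWS dependencies. The following is a minimal sketch, not the implementation from the post: the names `mini_batches`, `score_resumes`, and `invoke`, and the default batch size of 8, are all assumptions made for illustration. `invoke` stands in for whatever call sends one request to the deployed endpoint and returns one score per resume.

```python
from typing import Callable, Iterator, List, Sequence


def mini_batches(items: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


def score_resumes(
    resumes: Sequence[str],
    job_description: str,
    invoke: Callable[[str, List[str]], List[float]],
    batch_size: int = 8,  # assumed value; the post does not state the batch size used
) -> List[float]:
    """Match every resume against one job description, one request per mini-batch.

    `invoke` is a hypothetical stand-in for the endpoint call. It receives the
    job description and a batch of resumes and returns one score per resume.
    """
    scores: List[float] = []
    for batch in mini_batches(resumes, batch_size):
        batch_scores = invoke(job_description, batch)
        # Guard against a response that does not line up with the request.
        if len(batch_scores) != len(batch):
            raise RuntimeError("received a different number of scores than resumes sent")
        scores.extend(batch_scores)
    return scores
```

In a real deployment, `invoke` would typically wrap a call to the SageMaker runtime, and the batch size would be chosen to suit the compiled model and the instance type. Sending several resumes per request reduces the per-request overhead compared with one request per resume.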