Posted On: Aug 9, 2022
Amazon SageMaker Automatic Model Tuning now supports specifying multiple alternate SageMaker training instance types to make tuning jobs more robust when the preferred instance type is not available due to insufficient capacity.
SageMaker Automatic Model Tuning finds the best version of a model by running many training jobs on your dataset, using the hyperparameter ranges that you specify for your algorithm. It then chooses the hyperparameter values that produce the best-performing model, as measured by a metric that you choose.
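To make the idea of searching hyperparameter ranges against a chosen metric concrete, here is a minimal sketch that uses plain random search. It is an illustration only, not SageMaker's implementation: the function names, the toy objective, and the `learning_rate` range are all hypothetical, and the "training job" is replaced by a simple formula.

```python
import random


def random_search(objective, ranges, num_trials, seed=0):
    """Sample hyperparameters uniformly from each range and keep the best trial.

    `objective` stands in for a training job that returns the metric to
    maximize; `ranges` maps a hyperparameter name to a (low, high) tuple.
    """
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Draw one candidate value per hyperparameter from its range.
        params = {
            name: rng.uniform(low, high) for name, (low, high) in ranges.items()
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Hypothetical metric: peaks (at 0.0) when learning_rate equals 0.1.
    return -((params["learning_rate"] - 0.1) ** 2)


best_params, best_score = random_search(
    toy_objective, {"learning_rate": (0.001, 0.5)}, num_trials=50
)
print(best_params, best_score)
```

In a real tuning job each trial is a full training run, which is why a failure partway through is costly.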
Previously, when creating a SageMaker Automatic Model Tuning job, you could define only one SageMaker training instance type. If capacity for that instance type was low, you faced longer job runtimes and a higher chance of tuning job failures. This was particularly undesirable because hyperparameter tuning involves running multiple, potentially long-running training jobs, which had to be restarted from scratch after such a failure. With this launch, you can specify up to 5 alternate instance types, in order of preference, in addition to your preferred type, so that the hyperparameter tuning job automatically falls back to the next instance type when capacity is insufficient. This makes tuning jobs resilient to insufficient-capacity scenarios and lets you tune your models without the runtime increases or failures caused by low availability of a specific SageMaker training instance type.
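As a rough sketch of what an ordered list of instance types might look like in a request, the snippet below builds a resource configuration as a plain Python dictionary. The field names (`AllocationStrategy`, `InstanceConfigs`, `InstanceType`, `InstanceCount`, `VolumeSizeInGB`) and the `Prioritized` value reflect our understanding of the `HyperParameterTuningResourceConfig` structure and should be confirmed against the API reference. The helper name, the default count and volume size, and the example instance types are assumptions made for illustration.

```python
# One preferred instance type plus up to 5 alternates, per the announcement.
MAX_INSTANCE_CONFIGS = 6


def build_resource_config(instance_types, instance_count=1, volume_size_gb=30):
    """Build a resource config dict from instance types listed in preference order.

    Hypothetical helper: it only assembles and validates a dictionary and
    makes no AWS calls.
    """
    if not instance_types:
        raise ValueError("at least one instance type is required")
    if len(instance_types) > MAX_INSTANCE_CONFIGS:
        raise ValueError(
            f"at most {MAX_INSTANCE_CONFIGS} instance types are allowed, "
            f"got {len(instance_types)}"
        )
    return {
        # Assumed field names; verify against the API reference.
        "AllocationStrategy": "Prioritized",
        "InstanceConfigs": [
            {
                "InstanceType": instance_type,
                "InstanceCount": instance_count,
                "VolumeSizeInGB": volume_size_gb,
            }
            for instance_type in instance_types
        ],
    }


# Example instance types, most preferred first (illustrative choices only).
resource_config = build_resource_config(
    ["ml.p3.2xlarge", "ml.g4dn.2xlarge", "ml.m5.4xlarge"]
)
print(resource_config)
```

A dictionary like this would typically be supplied as the tuning resource configuration inside the training job definition when creating the tuning job. Keeping the list in preference order matters because fallback follows that order.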
The ability to specify multiple alternate instance types in SageMaker Automatic Model Tuning is now available in all commercial AWS Regions. To learn more, please read the API reference guide and the technical documentation.