What is transfer learning?
Transfer learning (TL) is a machine learning (ML) technique in which a model pre-trained on one task is fine-tuned for a new, related task. Training a new ML model from scratch is a time-consuming and resource-intensive process that requires large amounts of data, significant computing power, and several iterations before the model is ready for production. Instead, organizations use TL to retrain existing models on related tasks with new data. For example, if an ML model can identify images of dogs, it can be adapted to identify cats using a smaller image set that highlights the feature differences between dogs and cats.
What are the benefits of transfer learning?
TL offers the following benefits to researchers creating ML applications.
Enhanced efficiency
Training ML models takes time as they build knowledge and identify patterns. It also requires a large dataset and is computationally expensive. In TL, a pre-trained model retains the fundamental knowledge it gained on its source task, including learned features and weights, allowing it to adapt to new tasks faster. You can use a much smaller dataset and fewer resources while still achieving strong results.
Increased accessibility
Building deep learning neural networks from scratch requires large data volumes, significant computing power, and time. TL lowers these barriers, allowing organizations to adopt ML for custom use cases. You can adapt existing models to your requirements at a fraction of the cost. For example, using a pre-trained image recognition model, you can create models for medical imaging analysis, environmental monitoring, or facial recognition with minimal adjustments.
Improved performance
Models developed through TL often demonstrate greater robustness in diverse and challenging environments. They better handle real-world variability and noise, having been exposed to a wide range of scenarios in their initial training. They give better results and adapt to unpredictable conditions more flexibly.
What are the different transfer learning strategies?
The strategy you use to facilitate TL will depend on the domain of the model you are building, the task it needs to complete, and the availability of training data.
Transductive transfer learning
Transductive transfer learning involves transferring knowledge from a specific source domain to a different but related target domain, with the primary focus being on the target domain. It is especially useful when there is little or no labeled data from the target domain.
Transductive transfer learning asks the model to make predictions on target data by using previously gained knowledge. Because the target data is mathematically similar to the source data, the model identifies patterns and performs faster.
For example, consider adapting a sentiment analysis model trained on product reviews to analyze movie reviews. The source domain (product reviews) and the target domain (movie reviews) differ in context and specifics but share similarities in structure and language use. The model quickly learns to apply its understanding of sentiment from the product domain to the movie domain.
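As a rough illustration, the sketch below applies a hypothetical product-review sentiment checkpoint directly to movie-review text using the Hugging Face pipeline API. The model name is a placeholder; with a small labeled movie-review set, the same checkpoint could also be fine-tuned further.

```python
from transformers import pipeline

# A minimal sketch: "your-org/product-review-sentiment" is a hypothetical checkpoint
# fine-tuned on product reviews. Its sentiment knowledge is applied directly to text
# from the related movie-review domain.
sentiment = pipeline("text-classification", model="your-org/product-review-sentiment")
print(sentiment("A slow start, but the final act is absolutely worth the wait."))
```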
Inductive transfer learning
Inductive transfer learning is where the source and target domains are the same, but the tasks the model must complete differ. The pre-trained model is already familiar with the source data, so it can be trained on the new task faster.
An example of inductive transfer learning is in natural language processing (NLP). Models are pre-trained on a large body of text and then fine-tuned for specific tasks such as sentiment analysis using inductive transfer learning. Similarly, computer vision models like VGG are pre-trained on large image datasets and then fine-tuned for object detection.
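As a sketch of the NLP case, the snippet below loads a generic pre-trained BERT checkpoint and attaches a fresh two-class sentiment head. The encoder keeps its pre-trained weights, while the new head is trained on labeled sentiment data.

```python
from transformers import AutoModelForSequenceClassification

# A minimal sketch, assuming the generic pre-trained "bert-base-uncased" checkpoint:
# its language knowledge is reused, and a randomly initialized two-class sentiment
# head is added for the new task in the same text domain.
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)
```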
Unsupervised transfer learning
Unsupervised transfer learning uses a strategy similar to inductive transfer learning to develop new abilities. However, you use this form of transfer learning when you only have unlabeled data in both the source and target domains.
The model learns the common features of unlabeled data to generalize more accurately when asked to perform a target task. This method is helpful if it is challenging or expensive to obtain labeled source data.
For example, consider the task of identifying different types of motorcycles in traffic images. Initially, the model is trained on a large set of unlabeled vehicle images. In this instance, the model independently determines the similarities and distinguishing features among different types of vehicles, like cars, buses, and motorcycles. Next, the model is introduced to a small, specific set of motorcycle images. Because it already understands general vehicle features, its performance on the motorcycle task improves significantly compared to training from scratch.
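One way this could look in code is sketched below: an autoencoder learns general vehicle features from unlabeled images, and its encoder is then reused in a small motorcycle classifier. The image size, architecture, and class names are illustrative assumptions, and the training loops are omitted.

```python
import torch.nn as nn

# A minimal sketch, assuming 64x64 RGB traffic images.
encoder = nn.Sequential(
    nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.Flatten(), nn.Linear(32 * 16 * 16, 128),
)
decoder = nn.Sequential(
    nn.Linear(128, 32 * 16 * 16), nn.ReLU(),
    nn.Unflatten(1, (32, 16, 16)),
    nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
    nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid(),
)
# Train the autoencoder with a reconstruction loss on the unlabeled vehicle images.
autoencoder = nn.Sequential(encoder, decoder)

# Reuse the self-supervised encoder; only the small head is trained on the labeled
# motorcycle images (e.g. sport, cruiser, scooter as hypothetical classes).
classifier = nn.Sequential(encoder, nn.Linear(128, 3))
```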
What are the steps in transfer learning?
There are three main steps when fine-tuning a machine learning model for a new task.
Select a pre-trained model
First, select a pre-trained model with prior knowledge or skills for a related task. A useful starting point is to determine the source task of each candidate model. If you understand the original task the model performed, you can find one that transitions more effectively to the new task.
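For example, assuming an image use case, a quick way to inspect a candidate's source task with torchvision is sketched below; the final layer reveals that the model was trained for 1,000-class ImageNet classification.

```python
import torchvision.models as models

# Load a ResNet-50 pre-trained on ImageNet and inspect its final layer
# to see the source task it was trained for.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
print(model.fc)  # Linear(in_features=2048, out_features=1000): 1,000 ImageNet classes
```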
Configure your pre-trained models
After selecting your source model, configure it to pass its knowledge to a model that completes the related task. There are two main methods of doing this.
Freeze pre-trained layers
Layers are the building blocks of neural networks. Each layer consists of a set of neurons and performs specific transformations on the input data. Weights are the parameters the network uses for decision-making. Initially set to random values, weights are adjusted during the training process as the model learns from the data.
By freezing the weights of the pre-trained layers, you keep them fixed, preserving the knowledge that the deep learning model obtained from the source task.
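Continuing the ResNet-50 example above, freezing might look like the following sketch; only layers added later will receive gradient updates.

```python
import torchvision.models as models

# Freeze every pre-trained weight so the knowledge learned on the source task
# (ImageNet classification) is preserved during fine-tuning.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
for param in model.parameters():
    param.requires_grad = False
```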
Remove the last layer
In some use cases, you can also remove the last layers of the pre-trained model. In most ML architectures, the last layers are task-specific. Removing these final layers helps you reconfigure the model for new task requirements.
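Continuing the same example, one way to strip the task-specific head is to replace it with an identity mapping, as sketched below.

```python
import torch.nn as nn
import torchvision.models as models

# The final fully connected layer maps features to the 1,000 ImageNet classes.
# Replacing it with an identity mapping removes the task-specific head while
# keeping the general-purpose feature extractor.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Identity()
```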
Introduce new layers
Introducing new layers on top of your pre-trained model adapts it to the specialized nature of the new task. The new layers capture the nuances and outputs the new requirement demands.
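A minimal sketch of attaching a new head, again using the ResNet-50 example; the head size and number of target classes are illustrative placeholders.

```python
import torch.nn as nn
import torchvision.models as models

# Attach a small task-specific head on top of the backbone.
model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
num_target_classes = 5  # placeholder for the number of classes in the new task
model.fc = nn.Sequential(
    nn.Linear(model.fc.in_features, 256),
    nn.ReLU(),
    nn.Dropout(0.2),
    nn.Linear(256, num_target_classes),
)
```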
Train the model for the target domain
You then train the model on target-task data so that its outputs align with the new task; the pre-trained model likely produces outputs that differ from those you want. After monitoring and evaluating the model's performance during training, you can adjust the hyperparameters or the baseline neural network architecture to improve output further. Unlike weights, hyperparameters are not learned from the data. They are pre-set and play a crucial role in determining the efficiency and effectiveness of the training process. For example, you could adjust regularization parameters or the model's learning rate to improve its performance on the target task.
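Putting the pieces together, a fine-tuning loop might look like the sketch below. It assumes the `model` prepared in the previous steps and a `train_loader` over labeled target-task data; the learning rate and epoch count are hyperparameters you would tune.

```python
import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
# Only the new head's parameters require gradients, so only they are updated.
optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad),
    lr=1e-3,  # the learning rate is a hyperparameter
)

for epoch in range(5):  # the number of epochs is also a hyperparameter
    for images, labels in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```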
What are transfer learning strategies in generative AI?
Transfer learning strategies are critical for generative AI adoption in various industries. Organizations can customize existing foundation models instead of training new ones with billions of parameters from scratch. The following are some transfer learning strategies used in generative AI.
Domain adversarial training
Domain adversarial training involves training a foundation model to produce data that is indistinguishable from real data in the target domain. This technique typically employs a discriminator network, as seen in generative adversarial networks, that attempts to distinguish between true data and generated data. The generator learns to create increasingly realistic data.
For example, in image generation, a model trained on photographs might be adapted to generate artwork. The discriminator helps ensure the generated artwork is stylistically consistent with the target domain.
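A minimal sketch of the adversarial setup is shown below. It assumes a pre-trained `generator` that outputs 64x64 RGB images and a loader of real target-domain artwork, and it omits the alternating training loop.

```python
import torch
import torch.nn as nn

# Discriminator: scores whether an image looks like real target-domain artwork.
discriminator = nn.Sequential(
    nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.LeakyReLU(0.2),
    nn.Flatten(), nn.Linear(64 * 16 * 16, 1),  # real-vs-generated logit
)
bce = nn.BCEWithLogitsLoss()

def generator_loss(fake_images):
    # The generator is rewarded when the discriminator labels its output as real artwork.
    return bce(discriminator(fake_images), torch.ones(len(fake_images), 1))

def discriminator_loss(real_images, fake_images):
    # The discriminator learns to separate target-domain artwork from generated images.
    return (bce(discriminator(real_images), torch.ones(len(real_images), 1))
            + bce(discriminator(fake_images.detach()), torch.zeros(len(fake_images), 1)))
```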
Teacher-student learning
Teacher-student learning involves a larger and more complex “teacher” model teaching a smaller and simpler “student” model. The student model learns to mimic the teacher model's behavior, effectively transferring knowledge. This is useful for deploying large generative models in resource-constrained environments.
For example, a large language model (LLM) could serve as a teacher to a smaller model, transferring its language generation capabilities. This would allow the smaller model to generate high-quality text with less computational overhead.
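A common way to implement this is knowledge distillation with a softened-logits loss; the sketch below assumes the teacher's and student's raw logits for the same batch of inputs.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    # Softened teacher probabilities carry richer information than hard labels;
    # the student is trained to match them via KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2
```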
Feature disentanglement
Feature disentanglement in generative models involves separating different aspects of data, such as content and style, into distinct representations. This enables the model to manipulate these aspects independently in the transfer learning process.
For example, in a face generation task, a model might learn to disentangle facial features from artistic style. This would allow it to generate portraits in various artistic styles while maintaining the subject's likeness.
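A minimal architectural sketch is shown below, assuming 64x64 RGB face images and hypothetical latent sizes; the losses that actually enforce the content/style split (for example adversarial or contrastive terms) are omitted.

```python
import torch
import torch.nn as nn

def make_encoder(out_dim):
    return nn.Sequential(nn.Flatten(), nn.Linear(64 * 64 * 3, 512), nn.ReLU(),
                         nn.Linear(512, out_dim))

class DisentangledGenerator(nn.Module):
    def __init__(self, content_dim=128, style_dim=32):
        super().__init__()
        self.content_encoder = make_encoder(content_dim)  # subject's facial features
        self.style_encoder = make_encoder(style_dim)      # artistic style
        self.decoder = nn.Sequential(nn.Linear(content_dim + style_dim, 512), nn.ReLU(),
                                     nn.Linear(512, 64 * 64 * 3))

    def forward(self, content_image, style_image):
        # Combine one image's content code with another image's style code,
        # so likeness and artistic style can be controlled independently.
        content = self.content_encoder(content_image)
        style = self.style_encoder(style_image)
        return self.decoder(torch.cat([content, style], dim=-1)).view(-1, 3, 64, 64)
```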
Cross-modal transfer learning
Cross-modal transfer learning involves transferring knowledge between different modalities, like text and images. Generative models can learn representations applicable across these modalities. A model trained on textual descriptions and corresponding images might learn to generate relevant images from new text descriptions, effectively transferring its understanding from text to image.
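One common mechanism for this is a contrastive objective that aligns text and image embeddings in a shared space; the sketch below assumes batched embeddings from a text encoder and an image encoder, where row i of each tensor describes the same text-image pair.

```python
import torch
import torch.nn.functional as F

def contrastive_alignment_loss(text_emb, image_emb, temperature=0.07):
    # Normalize embeddings and compute all pairwise similarities.
    text_emb = F.normalize(text_emb, dim=-1)
    image_emb = F.normalize(image_emb, dim=-1)
    logits = text_emb @ image_emb.t() / temperature
    targets = torch.arange(len(text_emb), device=text_emb.device)
    # Symmetric cross-entropy pulls matching pairs together and pushes others apart.
    return (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets)) / 2
```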
Zero-shot and few-shot learning
In zero-shot and few-shot learning, generative models are trained to perform tasks or generate data for which they have seen few or no examples during training. This is achieved by learning rich representations that generalize well. For example, a generative model might be trained to create images of animals. Using few-shot learning, it could generate images of a rarely seen animal by understanding and combining features from other animals.
How can AWS help with your transfer learning requirements?
Amazon SageMaker JumpStart is an ML hub where you can access pre-trained models, including foundation models, to perform tasks like article summarization and image generation. You can use transfer learning to produce accurate models on your smaller datasets, with lower training costs than those involved in training the original model. For example, with SageMaker JumpStart, you can:
- Fully customize pre-trained models for your use case and with your data for faster deployment into production (see the sketch after this list).
- Access pre-built solutions to solve common use cases.
- Share ML artifacts, including ML models and notebooks, within your organization.
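As a rough sketch using the SageMaker Python SDK, fine-tuning a JumpStart model on your own data might look like the following; the model ID, IAM role, and S3 path are placeholders for your own choices.

```python
from sagemaker.jumpstart.estimator import JumpStartEstimator

# Fine-tune a JumpStart pre-trained model on your own labeled dataset in Amazon S3.
estimator = JumpStartEstimator(
    model_id="<jumpstart-model-id>",       # placeholder for a JumpStart model ID
    role="<your-sagemaker-execution-role>" # placeholder for your IAM role
)
estimator.fit({"training": "s3://your-bucket/your-training-data/"})
```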
When using the cross-modal transfer learning approach, you can also use Amazon SageMaker Debugger to detect serious hidden problems. For example, you can examine model predictions to find mistakes, validate the robustness of your model, and assess how much of that robustness comes from the model's inherited abilities. You can also validate the model's inputs and preprocessing steps against realistic expectations.
Get started with transfer learning on AWS by creating a free account today.