In this tutorial you will learn the essentials of transfer learning. We use Python with TensorFlow and Keras to build the transfer learning models.
Transfer learning in deep learning is the process of first training a base network on a benchmark dataset (such as ImageNet), and then transferring the learned features (the network's weights and structure) to a second network that is trained on a target dataset.
This idea has been shown to significantly improve the generalization capability of deep neural networks in many application areas. As training deep models from scratch takes ever longer, interest in transfer learning has grown significantly.
What is Transfer Learning?
Transfer learning is currently used in almost every deep learning model when the target dataset does not contain enough labeled data. Building deep learning models from scratch and training them on huge datasets is very expensive in both time and resources. Transfer learning is very effective for rapid prototyping, resource efficiency, and high performance. Just as the human brain carries knowledge forward and learns from others, transfer learning mimics this behavior.
Transfer Learning Base Models
To design an efficient neural network model, you need to know the details of the different base models, because it is from the base model that you transfer the knowledge, i.e. the network structure and the weights, to your new model. We broadly classify the base models into two groups. The first group is based on the success of CNN models for computer vision and classification problems. The second group covers sequential transfer learning based on the success of natural language processing models.
Popularly, ImageNet pre-trained standard deep CNN models are used as the baseline for transfer learning. The widely used ImageNet (ILSVRC) benchmark offers a comprehensive database of more than 1.2 million categorized natural images spread over 1,000 classes, with roughly 1,000 training images per class. The CNN models trained on this database serve as the backbone of transfer learning, and impressive results have been obtained on several classification datasets using ImageNet pre-trained CNN features.
Early designs such as LeNet-5, AlexNet, VGGNet, GoogLeNet, and ResNet laid the foundations for transfer learning. For computer vision and classification problems, you can leverage popular base models such as ResNet-50, MobileNet, NASNet-A, VGG-16, VGG-19, Inception V3, EfficientNet-B7, and Xception.
For sequential transfer learning and natural language processing tasks, you can leverage popular base models such as ULMFiT, Word2Vec, GloVe, FastText, Google's BERT, the Transformer, and ELMo.
Transfer Learning in 12 Steps
The twelve key steps for transfer learning are as follows:
- Import the required libraries
- Load an appropriate dataset
- Split the data into three sets: training, validation, and testing
- One-hot encode the labels
- Apply data augmentation
- Create an instance of the base model
- Build the new model by defining and adding layers
- Define the parameters and compile the new model
- Train the new model with the data
- Plot graphs such as training and validation accuracy
- Make predictions
- Plot the confusion matrices
Here, step 7, building the new model, is the most critical one. The details are explained below.
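A minimal Keras sketch of steps 6 through 8 is shown below. The input size and class count are placeholders, and `weights=None` is used here only to avoid a network download; in practice you would pass `weights="imagenet"` to load the pre-trained weights.

```python
from tensorflow import keras

NUM_CLASSES = 10  # placeholder: number of classes in your target dataset

# Step 6: create an instance of the base model without its classifier head.
# Use weights="imagenet" in practice; None is used here only to avoid a download.
base_model = keras.applications.MobileNetV2(
    input_shape=(160, 160, 3), include_top=False, weights=None)
base_model.trainable = False  # freeze the base for feature extraction

# Step 7: build the new model by adding layers on top of the base.
model = keras.Sequential([
    base_model,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dropout(0.2),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

# Step 8: define the parameters and compile the new model.
model.compile(optimizer=keras.optimizers.Adam(1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```

From here, `model.fit(...)` with your augmented training data covers step 9.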
Model Building Strategies in Transfer Learning
1. Feature extraction: Here we freeze the weights of all feature extraction layers and remove the layers closer to the output. In a CNN model the initial layers can be treated as a general-purpose feature extractor. Hence, in feature extraction, the base model's weights are frozen and its output is sent directly to another model.
The features can either be fed to a fully connected model or used as inputs to traditional machine learning algorithms such as random forest (RF), support vector machine (SVM), k-nearest neighbors (kNN), decision tree (DT), or naive Bayes (NB). The benefit is that the task-specific model can be reused for similar data; in addition, if the same data is used repeatedly, extracting the features once can save substantial computing resources.
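A sketch of the feature-extraction route into a classical classifier follows. The random data and `weights=None` are stand-ins for a real dataset and real pre-trained weights:

```python
import numpy as np
from tensorflow import keras
from sklearn.svm import SVC

# Stand-in data: 20 RGB images of size 96x96 with binary labels.
x = np.random.rand(20, 96, 96, 3).astype("float32")
y = np.array([0, 1] * 10)

# Frozen CNN base used purely as a feature extractor
# (pass weights="imagenet" in practice; pooling="avg" yields one vector per image).
base = keras.applications.MobileNetV2(
    input_shape=(96, 96, 3), include_top=False, weights=None, pooling="avg")
base.trainable = False

features = base.predict(x, verbose=0)  # extract the features once...
clf = SVC().fit(features, y)           # ...then train an SVM on them
```

Because the features are computed once and cached, swapping the SVM for kNN or a random forest costs almost nothing.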
2. Fine-tuning: Here we freeze only a few layers and then fine-tune the remaining layers on another dataset. In fine-tuning, as the name implies, the weights are kept trainable and are fine-tuned for the target task. The pre-trained model thus acts as a starting point, leading to faster convergence than random initialization.
In the fine-tuning strategy all unfrozen weights are updated when training on the new task, whereas in the feature extraction strategy only the weights of the newly added last layers change during training. Depending on the problem, you need to strike a balance, and you may end up using both strategies.
Steps for Fine Tuning the New Model
1) Build the new network model on top of an already trained base model.
2) Freeze the base network.
3) Train the part you added.
4) Unfreeze some layers in the base network.
5) Jointly train both these layers and the part you added.
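The five steps above can be sketched in Keras as follows. VGG16 with `weights=None` is a stand-in here; in practice you would load ImageNet weights and call `model.fit` at steps 3 and 5:

```python
from tensorflow import keras

# Step 1: build the new model on top of an already trained base model
# (weights=None avoids a download; use weights="imagenet" in practice).
base = keras.applications.VGG16(
    include_top=False, weights=None, input_shape=(160, 160, 3))

# Step 2: freeze the base network.
base.trainable = False

model = keras.Sequential([
    base,
    keras.layers.GlobalAveragePooling2D(),
    keras.layers.Dense(5, activation="softmax"),  # placeholder: 5 classes
])
model.compile(optimizer="adam", loss="categorical_crossentropy")
# Step 3: model.fit(...) here trains only the newly added head.

# Step 4: unfreeze the last few layers of the base network.
base.trainable = True
for layer in base.layers[:-4]:
    layer.trainable = False

# Step 5: recompile with a low learning rate, then jointly train the
# unfrozen base layers and the new head with another model.fit(...).
model.compile(optimizer=keras.optimizers.Adam(1e-5),
              loss="categorical_crossentropy")
```

The very low learning rate in step 5 matters: large updates would destroy the pre-trained features you are trying to reuse.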
Basic Building Blocks of CNN Based Transfer Learning Models
Figure 3 shows the basic building blocks of transfer learning.
Convolution, ReLU and Pooling Blocks
As shown in the figure above, in this design the CONV blocks combine convolution, ReLU, and pooling operations. Together these perform feature extraction, playing a role equivalent to feature extraction in traditional models. Each CONV layer has a number of convolutional kernels.
The main objective of the pooling layer is to perform feature selection and reduce the data dimensions while preserving the main characteristics of the data. Max pooling, mean pooling, and random pooling are the commonly used methods; they select the largest value, the mean value, or a random value within a local or global region, respectively.
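For instance, max and mean pooling over 2x2 windows of a small hand-made feature map:

```python
import numpy as np
from tensorflow import keras

# A single 4x4 feature map with one channel (batch, height, width, channels).
x = np.array([[1, 3, 2, 1],
              [4, 6, 5, 2],
              [7, 8, 9, 4],
              [3, 2, 1, 0]], dtype="float32").reshape(1, 4, 4, 1)

max_pool = keras.layers.MaxPooling2D(pool_size=2)(x)      # largest value per 2x2 window
avg_pool = keras.layers.AveragePooling2D(pool_size=2)(x)  # mean value per 2x2 window

print(max_pool.numpy().reshape(2, 2))
# [[6. 5.]
#  [8. 9.]]
```

Either way, the 4x4 map shrinks to 2x2 while the dominant activations in each region are retained.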
Fully connected layers
The fully connected (FC) layer is generally the last layer in a CNN. It integrates local information and provides the ability to discriminate between classes. The output layer, also called the softmax layer, maps the output of the fully connected layer into (0, 1) using the softmax function.
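The softmax mapping itself is simple to illustrate; the logits below are a hypothetical output of the last FC layer:

```python
import numpy as np

def softmax(z):
    # Subtract the max for numerical stability; the result sums to 1.
    e = np.exp(z - np.max(z))
    return e / e.sum()

logits = np.array([2.0, 1.0, 0.1])  # hypothetical raw FC-layer outputs
probs = softmax(logits)             # each value lies in (0, 1)
```

The class with the largest logit gets the largest probability, which is why `argmax` over the softmax output gives the predicted class.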
In this guide, we explained the twelve steps of transfer learning, the two groups of base models, and the basic building blocks of transfer learning. The main objective of transfer learning is to build a working model quickly for the problem at hand, instead of creating a comprehensive deep learning network from scratch. The key benefits of transfer learning are rapid prototyping, resource efficiency, data efficiency, and high performance.