Transfer Learning: A Step-by-Step Easy Guide

Transfer Learning Basic Concept and the Building Blocks

As timelines lengthen for deep learning and AI projects aimed at fighting COVID-19, interest in transfer learning has grown significantly. Transfer learning in deep learning is the process of first training a base network on a benchmark dataset (such as ImageNet) and then transferring the learned features (the network’s weights and structure) to a second network, which is trained on a target dataset. This approach has been shown to improve the generalization of deep neural networks significantly in many application areas.

Transfer learning is widely used in deep learning when the target dataset does not contain enough labeled data. Building deep learning models from scratch and training them on huge datasets is very expensive in both time and resources. Transfer learning is effective for rapid prototyping, resource efficiency, and high performance. Just as the human brain carries knowledge forward and learns from others, transfer learning mimics this behavior by reusing what a model has already learned.

Transfer Learning Base Models

To design an efficient neural network model, you need to know the details of the different base models, because the base model is the source of the knowledge you transfer to your new model. Here, knowledge means the network structure and the weights. We broadly classify the base models into two groups. The first group builds on the success of CNN-based models for computer vision and classification problems. The second group is for sequential transfer learning and builds on the success of natural language processing models.

Figure 1: Transfer Learning Base Models

ImageNet pre-trained deep CNN models are the most popular baseline for transfer learning. The ImageNet subset commonly used for pre-training contains more than 1.2 million labeled natural images in 1,000 classes, with roughly 1,000 training images per class. CNN models trained on this database serve as the backbone of transfer learning. Using ImageNet pre-trained CNN features, impressive results have been obtained on several classification datasets.

Early designs such as LeNet-5, AlexNet, VGGNet, GoogLeNet, and ResNet are the foundations of transfer learning. For computer vision and classification problems, you can leverage popular base models such as ResNet-50, MobileNet, NASNet-A, VGG-16, VGG-19, Inception V3, EfficientNet-B7, and Xception.

For sequential transfer learning and natural language processing tasks, you can leverage popular base models such as ULMFiT, Word2Vec, GloVe, FastText, Google’s BERT, Transformer, and ELMo.

Transfer Learning in 12 Steps

The twelve key steps for transfer learning are as follows: 

  1. Import the required libraries
  2. Load the appropriate dataset
  3. Split the data into three sets: training, validation, and testing
  4. One-hot encode the labels
  5. Apply data augmentation
  6. Create an instance of the base model
  7. Build the new model by defining and adding layers
  8. Define the parameters and compile the new model
  9. Train the new model with the data
  10. Plot graphs such as training accuracy and validation accuracy
  11. Make predictions
  12. Plot the confusion matrices
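
Three of the steps above (3, 4, and 12) are framework-independent and can be sketched in plain Python. This is a minimal illustration, not a reference implementation: the function names, the 70/15/15 split ratio, and the fixed seed are assumptions made for the example, and a real project would normally use the utilities of its deep learning or machine learning library instead.

```python
import random


def split_dataset(samples, train_frac=0.7, val_frac=0.15, seed=0):
    """Step 3: shuffle and split samples into training, validation, and test lists.

    The 70/15/15 default is an assumed ratio, not one prescribed by the guide.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_val = round(len(shuffled) * val_frac)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test


def one_hot(label, num_classes):
    """Step 4: turn an integer class label into a one-hot vector."""
    vector = [0] * num_classes
    vector[label] = 1
    return vector


def confusion_matrix(y_true, y_pred, num_classes):
    """Step 12: count predictions; rows are true classes, columns are predicted classes."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true_label, predicted_label in zip(y_true, y_pred):
        matrix[true_label][predicted_label] += 1
    return matrix


train, val, test = split_dataset(range(100))
encoded = one_hot(2, num_classes=4)
matrix = confusion_matrix([0, 1, 1, 2], [0, 1, 2, 2], num_classes=3)
```

The validation set is used while training to monitor the model, and the test set is held back until the final predictions, which is why the split comes before any model is built.
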

Figure 2: Transfer Learning Step by Step Guide

Here, step 7, building the new model, is the most critical one. The two main strategies for it are explained below.

Model Building Strategies in Transfer Learning

1. Feature extraction: Here we freeze the weights of all feature extraction layers and remove the layers closest to the output. In a CNN, the initial layers can be treated as a general-purpose feature extractor. Hence, in feature extraction, the base model’s weights are frozen and its output is sent directly to another model.

The features can either be sent to a fully connected model or used as inputs to traditional machine learning algorithms such as random forest (RF), support vector machine (SVM), k-nearest neighbors (kNN), decision tree (DT), or naive Bayes (NB). A benefit of this approach is that the task-specific model can be reused for similar data. In addition, if the same data is used repeatedly, extracting the features once and caching them saves considerable computing resources.
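
The feature extraction strategy can be illustrated with a small, framework-free sketch. Everything here is an assumption made for illustration: `FROZEN_WEIGHTS` and `extract_features` are a hypothetical stand-in for a frozen pre-trained base model (in practice, the convolutional part of a network such as VGG-16), and the toy dataset is invented. The point is only the workflow: the frozen extractor is never updated, features are computed once and cached, and a traditional classifier (kNN here) is fitted on top.

```python
import math

# Hypothetical stand-in for the frozen weights of a pre-trained base model.
# They are never modified anywhere below, which is what "frozen" means.
FROZEN_WEIGHTS = [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]]


def extract_features(sample):
    """Project a 3-value input onto the frozen weights, then apply ReLU."""
    return [
        max(0.0, sum(w * x for w, x in zip(row, sample)))
        for row in FROZEN_WEIGHTS
    ]


def knn_predict(train_features, train_labels, query, k=3):
    """Classify a feature vector by majority vote among its k nearest neighbors."""
    order = sorted(
        range(len(train_features)),
        key=lambda i: math.dist(train_features[i], query),
    )
    votes = [train_labels[i] for i in order[:k]]
    return max(set(votes), key=votes.count)


# Invented toy target dataset with two classes.
samples = [[1, 0, 0], [0.9, 0.1, 0], [1, 0.1, 0.1],
           [0, 1, 0], [0.1, 0.9, 0], [0, 1, 0.1]]
labels = [0, 0, 0, 1, 1, 1]

# Extract the features once and cache them; every later classifier reuses them.
cached_features = [extract_features(s) for s in samples]

prediction = knn_predict(cached_features, labels, extract_features([0.95, 0.05, 0.0]))
```

Because `cached_features` is computed only once, swapping kNN for another traditional classifier does not require running the base model again.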

2. Fine-tuning: Here we freeze only a few layers and then fine-tune the remaining layers on the target dataset. As the name implies, the unfrozen weights are kept trainable and are adjusted for the target task. The pre-trained model thus acts as a starting point, which leads to faster convergence than random initialization.

In the fine-tuning strategy, the weights of all unfrozen layers change when training on the new task, whereas in the feature extraction strategy only the weights of the newly added last layers change during training. Depending on the problem, you may need to balance the two strategies or combine them.

Steps for Fine-tuning the New Model

1) Build the new network model on top of an already trained base model.
2) Freeze the base network.
3) Train the part you added.
4) Unfreeze some layers in the base network.
5) Jointly train both these layers and the part you added.
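
The five steps above amount to a schedule of which layers are trainable at each stage. The sketch below models only that bookkeeping in plain Python; the `Layer` class, the layer names, and the choice to unfreeze the last two base layers are assumptions for illustration, and no actual training happens. In Keras the same idea is expressed through each layer's `trainable` attribute.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    """Hypothetical minimal layer record: a name and a trainable flag."""
    name: str
    trainable: bool = True


def trainable_names(layers):
    """Names of the layers whose weights an optimizer would update."""
    return [layer.name for layer in layers if layer.trainable]


# 1) Build the new network on top of an already trained base model.
base = [Layer("conv1"), Layer("conv2"), Layer("conv3"), Layer("conv4")]
head = [Layer("dense"), Layer("softmax")]
model = base + head

# 2) Freeze the base network.
for layer in base:
    layer.trainable = False

# 3) Train the part you added: only the head would be updated at this stage.
stage_one = trainable_names(model)

# 4) Unfreeze some layers in the base network (here, the last two; an assumed choice).
for layer in base[-2:]:
    layer.trainable = True

# 5) Jointly train the unfrozen base layers and the part you added.
stage_two = trainable_names(model)
```

Training the head first (step 3) matters because a randomly initialized head produces large gradients that could otherwise disturb the pre-trained weights once the base layers are unfrozen. A lower learning rate is commonly used for the joint stage for the same reason.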

Basic Building Blocks of CNN Based Transfer Learning Models

Figure 3 shows the basic building blocks of transfer learning.

Figure 3: Transfer Learning Basic Concept and the Building Blocks

Convolution, ReLU and Pooling Blocks

As shown in Figure 3, each CONV block in this design combines convolution, ReLU, and pooling operations. These blocks perform feature extraction, which is equivalent to the feature extraction stage in traditional models. Each CONV layer has a number of convolutional kernels.
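
To make the convolution and ReLU operations concrete, here is a minimal sketch for a single kernel on a single-channel image. It assumes stride 1 and no padding, and the image and kernel values are invented for the example. Real CONV layers apply many kernels across many channels and learn the kernel values during training.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding) and sum the products.

    This is the cross-correlation that deep learning libraries call convolution.
    """
    kernel_h, kernel_w = len(kernel), len(kernel[0])
    out_h = len(image) - kernel_h + 1
    out_w = len(image[0]) - kernel_w + 1
    return [
        [
            sum(
                image[i + a][j + b] * kernel[a][b]
                for a in range(kernel_h)
                for b in range(kernel_w)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def relu(feature_map):
    """Replace negative responses with zero, element by element."""
    return [[max(0, value) for value in row] for row in feature_map]


image = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]
# Invented kernel that responds where intensity drops from left to right.
kernel = [
    [1, -1],
    [1, -1],
]

response = conv2d(image, kernel)   # [[0, 2, 0], [0, 0, 0], [0, -2, 0]]
activated = relu(response)         # [[0, 2, 0], [0, 0, 0], [0, 0, 0]]
```

The kernel fires positively on the bright-to-dark edge in the top half and negatively on the dark-to-bright edge in the bottom half; ReLU keeps only the positive response.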

Pooling Layers

The main objective of the pooling layer is to perform feature selection and reduce the data dimensions while preserving the main characteristics of the data. Max pooling, mean pooling, and random (stochastic) pooling are the commonly used methods; they take the largest value, the mean value, or a randomly selected value, respectively, over a local or global region.
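
Max and mean pooling can be sketched in a few lines. The function below assumes non-overlapping square windows (stride equal to the window size) and uses an invented 4x4 feature map; random pooling is left out to keep the example deterministic.

```python
def pool2d(feature_map, size=2, mode="max"):
    """Pool over non-overlapping size x size windows (stride equals size)."""
    if mode == "max":
        reduce_window = max
    elif mode == "mean":
        reduce_window = lambda window: sum(window) / len(window)
    else:
        raise ValueError(f"unsupported pooling mode: {mode}")

    pooled = []
    for i in range(0, len(feature_map) - size + 1, size):
        row = []
        for j in range(0, len(feature_map[0]) - size + 1, size):
            window = [
                feature_map[i + a][j + b]
                for a in range(size)
                for b in range(size)
            ]
            row.append(reduce_window(window))
        pooled.append(row)
    return pooled


feature_map = [
    [1, 3, 2, 0],
    [5, 7, 1, 1],
    [0, 2, 4, 8],
    [2, 0, 6, 2],
]

max_pooled = pool2d(feature_map, mode="max")    # [[7, 2], [2, 8]]
mean_pooled = pool2d(feature_map, mode="mean")  # [[4.0, 1.0], [1.0, 5.0]]
```

Both variants shrink the 4x4 map to 2x2, a fourfold reduction in data, while keeping one summary value per region.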

Fully Connected Layers

The fully connected (FC) layer is generally the last layer in a CNN. It integrates the local information extracted by earlier layers, which makes it possible to discriminate between classes. The output layer is also called the softmax layer; it uses the softmax function to map the outputs of the fully connected layer to values in (0, 1) that sum to 1, so they can be read as class probabilities.
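
The softmax mapping can be written directly from its definition, exp(z_i) divided by the sum of exp(z_j) over all outputs. The sketch below subtracts the largest input before exponentiating, a standard trick that leaves the result unchanged but avoids overflow; the input values are invented for the example.

```python
import math


def softmax(logits):
    """Map raw scores to positive values that sum to 1."""
    shift = max(logits)  # subtracting the max does not change the result
    exps = [math.exp(value - shift) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]


probabilities = softmax([2.0, 1.0, 0.1])
predicted_class = probabilities.index(max(probabilities))
```

The largest raw score always receives the largest probability, so the predicted class is simply the index of the maximum output.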


In this guide, we explained the twelve steps of transfer learning, the two groups of base models, and the basic building blocks of transfer learning. The main objective of transfer learning is to implement a model quickly to solve the current problem, instead of creating a comprehensive deep learning network from scratch. The key benefits of transfer learning are rapid prototyping, resource efficiency, data efficiency, and high performance.