Transfer learning and fine-tuning let you reuse what a model has already learned on one task for a new one. This section looks at how these techniques improve performance on small datasets, cut training time, and make strong pre-trained models accessible without training from scratch.
Transfer learning consists of taking features learned on one problem and leveraging them on a new, similar problem.
It is usually done for tasks where your dataset has too little data to train a full-scale model from scratch.
The most common incarnation of transfer learning in the context of deep learning is the following workflow:
- Take layers from a previously trained model. Freeze them, so as to avoid destroying any of the information they contain during future training rounds.
- Add new, trainable layers on top of the frozen layers.
- Train the new layers on your dataset.
The typical transfer-learning workflow
- Instantiate a base model and load pre-trained weights into it
- Freeze all layers in the base model by setting trainable = False
- Create a new model on top of the output of one (or several) layers
- Train the model on new data
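A minimal sketch of this workflow in Keras, assuming an Xception base pre-trained on ImageNet, 150x150 RGB inputs, and a binary-classification head (the dataset names are placeholders):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Take layers from a previously trained model (without its classifier head).
base_model = keras.applications.Xception(
    weights="imagenet",          # load pre-trained ImageNet weights
    input_shape=(150, 150, 3),
    include_top=False,           # drop the original ImageNet classifier
)

# Freeze them so the information they contain isn't destroyed during training.
base_model.trainable = False

# Add new, trainable layers on top of the frozen layers.
inputs = keras.Input(shape=(150, 150, 3))
x = base_model(inputs, training=False)   # keep the base in inference mode
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(1)(x)             # new binary-classification head
model = keras.Model(inputs, outputs)

# Train only the new layers on your dataset (new_dataset is a placeholder).
model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)
# model.fit(new_dataset, epochs=20, validation_data=validation_dataset)
```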
Transfer learning & fine-tuning with a custom training loop
Create a base model, then freeze it and create a new model on top
- Open a GradientTape, run a forward pass, and compute the loss
- Get gradients of the loss w.r.t. the trainable weights: gradients = tape.gradient(loss_value, model.trainable_weights)
- Update the weights of the model by having the optimizer apply the gradients: optimizer.apply_gradients(zip(gradients, model.trainable_weights)) (a full loop is sketched below)
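A sketch of such a loop, assuming the frozen-base model from the earlier sketch, an Adam optimizer, and a binary cross-entropy loss; new_dataset is a placeholder tf.data dataset yielding (inputs, targets) batches:

```python
import tensorflow as tf
from tensorflow import keras

optimizer = keras.optimizers.Adam()
loss_fn = keras.losses.BinaryCrossentropy(from_logits=True)

for inputs, targets in new_dataset:
    # Open a GradientTape and run a forward pass.
    with tf.GradientTape() as tape:
        predictions = model(inputs)
        loss_value = loss_fn(targets, predictions)
    # Get gradients of the loss w.r.t. the trainable weights only
    # (the frozen base model's weights are excluded automatically).
    gradients = tape.gradient(loss_value, model.trainable_weights)
    # Update the weights of the model.
    optimizer.apply_gradients(zip(gradients, model.trainable_weights))
```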
Using random data augmentation
When you don’t have a large image dataset, it’s a good practice to artificially introduce sample diversity by applying random yet realistic transformations to the training images.
- This helps expose the model to different aspects of the training data while slowing down overfitting.
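For example, random horizontal flips and small rotations can be bundled into a small augmentation block (a sketch; which transformations to use is a modeling choice):

```python
from tensorflow import keras
from tensorflow.keras import layers

data_augmentation = keras.Sequential(
    [
        layers.RandomFlip("horizontal"),  # randomly mirror images left to right
        layers.RandomRotation(0.1),       # randomly rotate by up to ±10% of a full circle
    ]
)
```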
Freezing layers: understanding the trainable attribute
Layers & models have three weight attributes: weights, trainable_weights, and non_trainable_weights
- weights is the list of all weight variables of the layer
- trainable_weights are those that are meant to be updated via gradient descent to minimize the loss during training
- non_trainable_weights are those that aren't meant to be trained; the model updates them itself during forward passes
- In general all weights are trainable; the main exception is the BatchNormalization layer, which has 2 non-trainable weights (its moving mean and variance) and 2 trainable ones
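A quick way to see these attributes is to compare a Dense layer with a BatchNormalization layer (a sketch):

```python
from tensorflow import keras

# A Dense layer: kernel and bias, both trainable.
dense = keras.layers.Dense(3)
dense.build((None, 4))  # create the weights
print(len(dense.weights))                # 2
print(len(dense.trainable_weights))      # 2
print(len(dense.non_trainable_weights))  # 0

# BatchNormalization: gamma and beta are trainable,
# the moving mean and variance are non-trainable.
bn = keras.layers.BatchNormalization()
bn.build((None, 4))
print(len(bn.weights))                # 4
print(len(bn.trainable_weights))      # 2
print(len(bn.non_trainable_weights))  # 2
```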
Build a model
Use a Rescaling layer to scale input values (initially in the [0, 255] range) to the [-1, 1] range
- Add a Dropout layer before the classification layer for regularization
- Because the base model contains batchnorm layers, make sure to pass training=False when calling it, so that it runs in inference mode and its batchnorm statistics aren't updated
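Putting these pieces together, a sketch of the model (it reuses base_model and data_augmentation from the earlier sketches; the 150x150 input size and 0.2 dropout rate are assumptions):

```python
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(150, 150, 3))
x = data_augmentation(inputs)                        # random augmentation
x = layers.Rescaling(scale=1 / 127.5, offset=-1)(x)  # [0, 255] -> [-1, 1]

# Run the frozen base model in inference mode so its batchnorm
# statistics stay fixed, even after it is later unfrozen.
x = base_model(x, training=False)

x = layers.GlobalAveragePooling2D()(x)
x = layers.Dropout(0.2)(x)    # regularize before the classifier
outputs = layers.Dense(1)(x)  # single logit for binary classification
model = keras.Model(inputs, outputs)
```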
Recursive setting of the trainable attribute
If you set trainable = False on a model or on any layer that has sublayers, all children layers become non-trainable as well
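A small illustration of this recursion (the layer sizes are arbitrary):

```python
from tensorflow import keras
from tensorflow.keras import layers

inner_model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        layers.Dense(3, activation="relu"),
        layers.Dense(3, activation="relu"),
    ]
)
model = keras.Sequential(
    [
        keras.Input(shape=(3,)),
        inner_model,
        layers.Dense(3, activation="sigmoid"),
    ]
)

model.trainable = False  # freezing the outer model...
assert not inner_model.trainable            # ...freezes the nested model
assert not inner_model.layers[0].trainable  # ...and all of its sublayers
```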
Standardizing the data
Raw images have a variety of sizes, and each pixel consists of 3 integer values between 0 and 255 (RGB level values).
- To feed a neural network, you need to: standardize to a fixed image size, normalize pixel values to the [-1, 1] range, and batch the data, using caching & prefetching to optimize loading speed
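A sketch of this preprocessing with tf.data, assuming train_ds and validation_ds are (image, label) datasets (loaded in the "Getting the data" section below); the 150x150 target size and batch size of 32 are assumptions, and the [-1, 1] normalization is handled by the Rescaling layer inside the model:

```python
import tensorflow as tf

image_size = (150, 150)
batch_size = 32

# Standardize to a fixed image size.
train_ds = train_ds.map(
    lambda image, label: (tf.image.resize(image, image_size), label)
)
validation_ds = validation_ds.map(
    lambda image, label: (tf.image.resize(image, image_size), label)
)

# Batch the data and use caching & prefetching to optimize loading speed.
train_ds = train_ds.cache().batch(batch_size).prefetch(tf.data.AUTOTUNE)
validation_ds = validation_ds.cache().batch(batch_size).prefetch(tf.data.AUTOTUNE)
```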
Train the top layer
Compile the model, then train the new top layer for 20 epochs: epochs = 20; model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
- Compile with optimizer=keras.optimizers.Adam(), loss=keras.losses.BinaryCrossentropy(from_logits=True), and metrics=[keras.metrics.BinaryAccuracy()] (cleaned-up code below)
- Each epoch reports training loss and binary accuracy along with validation loss and accuracy; epoch 1/20 ran 291 steps in about 133s (451ms/step), reporting a training loss of 0.9716
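In code, the compile-and-fit step looks like this:

```python
from tensorflow import keras

model.compile(
    optimizer=keras.optimizers.Adam(),
    loss=keras.losses.BinaryCrossentropy(from_logits=True),  # model outputs raw logits
    metrics=[keras.metrics.BinaryAccuracy()],
)

epochs = 20
model.fit(train_ds, epochs=epochs, validation_data=validation_ds)
```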
Do a round of fine-tuning of the entire model
Although the base model becomes trainable, it is still running in inference mode, since we passed training=False when we called it while building the model.
- This prevents the batchnorm layers from undoing all the training that has been done so far
- Unfreeze the base_model and train it end-to-end with a low learning rate.
Fine-tuning
Once your model has converged on the new data, you can try to unfreeze all or part of the base model and retrain the whole model end-to-end with a very low learning rate.
- Note: Calling compile() on a model is meant to “freeze” the behavior of that model, so the trainable attribute values at the time the model is compiled should be preserved throughout the lifetime of the model, until compile is called again.
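A sketch of this fine-tuning round; the 1e-5 learning rate and 10 epochs are assumptions standing in for "a very low learning rate" and one round of end-to-end training:

```python
from tensorflow import keras

# Unfreeze the base model. It still runs in inference mode because it was
# called with training=False when the model was built, so its batchnorm
# statistics stay fixed while its weights become trainable.
base_model.trainable = True

# Recompile so the change to `trainable` takes effect, using a very low
# learning rate to avoid destroying the pre-trained features.
model.compile(
    optimizer=keras.optimizers.Adam(1e-5),  # assumed learning rate
    loss=keras.losses.BinaryCrossentropy(from_logits=True),
    metrics=[keras.metrics.BinaryAccuracy()],
)

model.fit(train_ds, epochs=10, validation_data=validation_ds)  # assumed epoch count
```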
An end-to-end example: fine-tuning an image classification model on a cats vs. dogs dataset
Load the Xception model, pre-trained on ImageNet, and use it on the Kaggle “cats vs dogs” classification dataset.
Getting the data
Transfer learning is most useful when working with very small datasets.
- To keep our dataset small, we will use 40% of the original training data (25,000 images) for training, 10% for validation, and 10% for testing
- The following are the first 9 images in the training dataset:
- Number of training samples: 9305, validation samples: 2326, test samples: 2326
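A sketch of loading these splits with TensorFlow Datasets (the tensorflow_datasets package is assumed to be installed; the split percentages follow the text above):

```python
import tensorflow_datasets as tfds

# 40% of the original training split for training, 10% for validation, 10% for testing.
train_ds, validation_ds, test_ds = tfds.load(
    "cats_vs_dogs",
    split=["train[:40%]", "train[40%:50%]", "train[50%:60%]"],
    as_supervised=True,  # yield (image, label) pairs
)

print("Number of training samples:", train_ds.cardinality())
print("Number of validation samples:", validation_ds.cardinality())
print("Number of test samples:", test_ds.cardinality())
```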