Creating AlexNet on Tensorflow from Scratch. Part 5: Transfer Learning

Joey S
5 min read · Feb 9, 2019


The previous parts let us train a model and improve its accuracy by adding dropout and tuning hyperparameters. It’s useful to know how to train models from scratch, but in practice we rarely do. Instead, we take an existing model with pretrained weights and fine-tune it on our dataset. This is called transfer learning.

Getting an Existing Model

You can get an existing AlexNet model from many places. I’m going to use the one from here.

From the website, we need to download all the files and place them in a folder. You can do this manually, use curl/wget, or just run this script, which creates a pretrained/ folder and downloads all the necessary files. The rest of the tutorial assumes you used the script.

To make sure everything is working correctly, you can cd into pretrained/ and run python3 myalexnet_forward_newtf.py.

Output example if everything is done correctly

The existing .py file loads the pretrained weights, but it also runs tests. We don’t care how well the model predicts category probabilities for the sample images, so we’ll copy just the necessary code into a pretrained.py file in our project root folder.

The only modification I’d make is on line 9, which references bvlc_alexnet.npy: it should point into the pretrained/ folder instead of the root directory. I also changed the image size to (224, 224).

Resizing Images

The pretrained AlexNet was trained on ImageNet images of size (224, 224), but CIFAR-10 images are (32, 32). There are two obvious fixes: pad the images or resize them. Since the gap between the two sizes is so large, padding would leave the actual image occupying only a small fraction (about 2%) of the padded frame.

Original CIFAR-10 Image
CIFAR-10 Image padded to correct size
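To see why padding loses so much, here is a quick NumPy sketch (hypothetical, not from the article’s code) comparing both options. Conveniently, 224 is exactly 7 × 32, so a nearest-neighbour upscale is just pixel repetition:

```python
import numpy as np

# A toy 32x32 RGB image standing in for a CIFAR-10 sample.
img = np.random.rand(32, 32, 3)

# Option 1: zero-pad up to 224x224. The real content ends up covering
# only 32*32 / (224*224), roughly 2% of the frame.
pad = (224 - 32) // 2  # 96 pixels on each side
padded = np.pad(img, ((pad, pad), (pad, pad), (0, 0)), mode="constant")

# Option 2: resize up. Since 224 = 7 * 32, nearest-neighbour resizing
# amounts to repeating every pixel 7 times along each spatial axis.
resized = np.repeat(np.repeat(img, 7, axis=0), 7, axis=1)

print(padded.shape, resized.shape)  # (224, 224, 3) (224, 224, 3)
```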

Instead, we’ll resize the image up. To do this, we’ll add methods to the existing cifar.py and helper.py with the help of scikit-image. First, we pip install the package with

pip3 install --user scikit-image

The reshape function is a thin wrapper around skimage.transform.resize, and it is used by transform_to_input_output_and_pad and reshape_batch. The former transforms a whole set into resized inputs and one-hot outputs; the latter does the same for a single batch.
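As a rough, dependency-free sketch of what these helpers do (names and details simplified; the real code uses skimage.transform.resize, while here a nearest-neighbour np.repeat upscale stands in, since 224 = 7 × 32):

```python
import numpy as np

def reshape(images, factor=7):
    """Upscale a batch of (N, 32, 32, 3) images to (N, 224, 224, 3).
    Stand-in for skimage.transform.resize via pixel repetition."""
    return np.repeat(np.repeat(images, factor, axis=1), factor, axis=2)

def one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot rows."""
    out = np.zeros((len(labels), num_classes))
    out[np.arange(len(labels)), labels] = 1.0
    return out

def transform_batch(images, labels):
    """Resized inputs plus one-hot outputs, as the helpers in the text do."""
    return reshape(images), one_hot(labels)

x, y = transform_batch(np.random.rand(8, 32, 32, 3),
                       np.array([3, 1, 4, 1, 5, 9, 2, 6]))
print(x.shape, y.shape)  # (8, 224, 224, 3) (8, 10)
```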

Next, we update cifar.py to utilize these helper functions.

We add create_resized_test_set and create_resized_batches, which create a resized test set and resized batches respectively.

Adding onto Existing Model

The ConvNet portion of AlexNet has been pretrained, so it is already good at feature extraction. However, the fully connected layer is tailored to the ImageNet classes. We will replace the pretrained fully connected layer with our own for the CIFAR-10 dataset.

We’ll create a new file called transfer_training.py, which loads the pretrained model and trains on the CIFAR-10 data. First, we import pretrained.py and take maxpool5, the layer just before the fully connected layers. We set up a single FC layer that outputs 10 classes for CIFAR-10. We lower the learning rate to 0.00001: with the previous, larger rate the updates are too big and the model doesn’t learn. Everything else is set up the same as before.
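To make the shapes concrete, here is a NumPy sketch of the new head (the 6×6×256 maxpool5 shape is standard AlexNet and assumed here, not taken from the article’s code):

```python
import numpy as np

batch = 4
# maxpool5 of AlexNet emits 6x6x256 feature maps: 9216 features per image.
maxpool5 = np.random.rand(batch, 6, 6, 256)

# Flatten, then a single fully connected layer mapping to 10 CIFAR-10 logits.
flat = maxpool5.reshape(batch, -1)        # (4, 9216)
W = np.random.randn(9216, 10) * 0.01      # the only new weights we train
b = np.zeros(10)
logits = flat @ W + b
print(logits.shape)  # (4, 10)
```

In the TF1 graph this roughly corresponds to flattening maxpool5, one matmul plus bias, and an optimizer built with the 0.00001 learning rate.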

Next, we run the newly created cifar.create_resized_test_set. You could also run create_resized_batches, but resizing every training batch up front takes more memory than I have available.

Next, we add the training process from the previous parts. I’ve decided to resize the images batch by batch, since that uses less memory even though it makes training take longer. On lines 38 and 39 below, I get each batch and reshape it by resizing the images to (224, 224). Then I run the optimizer as usual.

After each epoch, we should test our accuracy on the test dataset. The test set was resized ahead of time, so evaluation runs faster. At the end of each epoch, we run the test data through the accuracy op. Due to memory limitations, I’ve split the test dataset into 100 batches, tested each one, and averaged the accuracies.
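The batched evaluation can be sketched as follows (mock predictions, since the real ones come from the network). With equal batch sizes, the mean of per-batch accuracies equals the overall accuracy:

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=10_000)        # CIFAR-10 test labels
preds = labels.copy()
wrong = rng.choice(10_000, size=2_000, replace=False)
preds[wrong] = (preds[wrong] + 1) % 10           # force exactly 20% errors

# Evaluate in 100 batches of 100 and average the per-batch accuracies.
batch_accs = [np.mean(p == l)
              for p, l in zip(np.array_split(preds, 100),
                              np.array_split(labels, 100))]
accuracy = float(np.mean(batch_accs))
print(accuracy)  # 0.8
```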

Running python3 transfer_training.py for ~20 epochs, the test accuracy plateaus at ~80%, significantly better than the ~60% from training from scratch. The original AlexNet training already did most of the work of learning good ConvNet features; we are just adapting the classifier slightly to our dataset.

Adding onto Existing Model (Using VGG19)

In practice, there are other models that give better transfer learning results than AlexNet. VGG19 is very popular, so we are going to use it to classify CIFAR-10 images. The mechanism is the same, so you can substitute other models if you choose.

First, we need to install tensornets, which provides many pretrained models for TensorFlow. To do this, we run pip3 install --user tensornets. Then we create a new file called vgg_transfer_learning.py, where we apply transfer learning to VGG19. Setting up VGG19 is fairly simple, and everything else is the same as before.

For VGG19, the results are better when using tf.losses.softmax_cross_entropy, which is used on line 13. The differences between the various loss functions are explained here. We also use tf.identity, for reasons that can be found here. Once we know how to set it up, the rest is identical.
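For reference, this is the math tf.losses.softmax_cross_entropy implements, sketched in NumPy (a hand-rolled check, not TensorFlow’s actual implementation):

```python
import numpy as np

def softmax_cross_entropy(logits, onehot):
    """Mean cross entropy between softmax(logits) and one-hot targets."""
    # Subtract the row max for numerical stability before exponentiating.
    z = logits - logits.max(axis=1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    return -np.mean(np.sum(onehot * log_probs, axis=1))

logits = np.array([[2.0, 1.0, 0.1]])
onehot = np.array([[1.0, 0.0, 0.0]])
loss = softmax_cross_entropy(logits, onehot)
print(loss)  # -log(softmax probability of the true class)
```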

After running it for a dozen epochs, we find that accuracy on the test dataset actually goes above 0.9.

Conclusion

While creating and training a model yourself is an important exercise, transfer learning is more practical because it gets better results. Pretrained AlexNet reaches an accuracy of ~0.80, while VGG19 reaches ~0.9, both far better than the ~0.6 from training our own model from scratch.
