Assignment #3 - Computer Vision

Deep Learning / Spring 1399, Iran University of Science and Technology


Please pay attention to these notes:

  • Assignment Due: 1399/3/7 23:59:00
  • If you need any additional information, please review the assignment page on the course website.
  • The items you need to answer are highlighted in red and the coding parts you need to implement are denoted by:
    ########################################
    #     Put your implementation here     #
    ########################################
  • We always recommend cooperation and discussion in groups for assignments. However, each student has to finish all the questions by themselves. If our matching system identifies any sort of copying, you'll be responsible for the consequences. So, please mention your teammate's name if you have one.
  • Students who audit this course should submit their assignments like the other students in order to qualify for attending the rest of the sessions.
  • Any sort of copying we find will zero that assignment's grade and will also count as two negative assignments toward your final score.
  • When you are ready to submit, please follow the instructions at the end of this notebook.
  • If you have any questions about this assignment, feel free to drop us a line. You may also post your questions on the course Forum page.
  • You must run this notebook on the Google Colab platform; it depends on the Colab VM for some of its dependencies.
  • Before starting to work on the assignment, please fill in your name in the next section AND remember to RUN the cell.


Course Forum: https://groups.google.com/forum/#!forum/dl982/


Fill your information here & run the cell

In [0]:
#@title Enter your information & "RUN the cell!!"
student_id =  0#@param {type:"integer"}
student_name = "" #@param {type:"string"}
Your_Github_account_Email = "" #@param {type:"string"}

print("your student id:", student_id)
print("your name:", student_name)


from pathlib import Path

ASSIGNMENT_PATH = Path('asg03')
ASSIGNMENT_PATH.mkdir(parents=True, exist_ok=True)

In this assignment we will learn how to perform image classification, first with a classical machine learning algorithm and then with deep computer vision using a special layer called a convolution, and finally we will improve the performance of our model with the power of transfer learning.

The goal of our methods will be to classify images. We will use the image data as our features and the class of each image as our label, or output.

We already know how neural networks work, so we can skip the basics and move right into explaining our different approaches.

The major differences we are about to see in these types of neural networks are the layers that make them up.

The dataset has ten classes of images: [airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck]

So briefly, this assignment consists of six subsections:

  • Reading & Preparing Image Data
  • Feature extraction for classical methods
  • SVM classification
  • Using CNN
  • Advantages of Transfer Learning
  • Visualization

Loading and Preprocessing Images

Loading the data

In [0]:
from imageio import imsave
from IPython.display import clear_output
import seaborn as sns
import os, sys, tarfile, errno
import numpy as np
import matplotlib.pyplot as plt
import urllib.request as urllib 
import scipy.io as sio

Before running the next cell, you should obtain the data by following these instructions.

To add the data to your Drive, follow these steps:

Now you have to mount your Drive in this notebook to access the data. After running the next cell, you can see your Google Drive directory under Files, on the left side of this page.

In [0]:
from google.colab import drive
drive.mount('/content/drive')
In [0]:
# path to the binary train file with image data
DATA_TRAIN_PATH = './drive/My Drive/IMAGES/train_X.bin'

# path to the binary test file with image data
DATA_TEST_PATH = './drive/My Drive/IMAGES/test_X.bin'

# path to the binary train file with labels
LABEL_TRAIN_PATH = './drive/My Drive/IMAGES/train_y.bin'

# path to the binary test file with labels
LABEL_TEST_PATH = './drive/My Drive/IMAGES/test_y.bin'
In [0]:
# image shape 
HEIGHT = 96
WIDTH = 96
DEPTH = 3

# size of a single image in bytes
SIZE = HEIGHT * WIDTH * DEPTH

To check the various functions, you must first read a single image. You can use NumPy's fromfile function, then build the 3x96x96 image matrix.

Some hints (a sketch follows the list):

1- The data is in binary format and should be read in uint8 chunks; do not forget to build the image matrix.

2- Transpose the image into a standard format so it is readable by, for example, matplotlib.pyplot.

3- All images are the same size, so you do not need to worry about varying shapes.
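A minimal sketch of the idea (assuming the file was opened in binary ('rb') mode and the handle is positioned at the start of an image; the helper name is hypothetical):

import numpy as np

def read_single_image_sketch(f):
    raw = np.fromfile(f, dtype=np.uint8, count=SIZE)  # SIZE = 3 * 96 * 96 bytes
    img = raw.reshape(DEPTH, WIDTH, HEIGHT)           # column-major: R plane, then G, then B
    return np.transpose(img, (2, 1, 0))               # -> (HEIGHT, WIDTH, DEPTH) for matplotlib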

In [0]:
def read_single_image(image_file):
    """
    input: the open file containing the images
    return: a single image
    """
    ########################################
    #     Put your implementation here     #
    ########################################
    return image

Now write a function to display an image. This function should take the image to be plotted in a 3-D matrix format. You can use matplotlib.pyplot.

In [0]:
def plot_image(image):
    """
    input: the image to be plotted in a 3-D matrix format
    return: None
    """
    ########################################
    #     Put your implementation here     #
    ########################################

Time to check your functions! If everything is all right, you will see a cute bird when you execute the next cell.

In [0]:
with open(DATA_TRAIN_PATH, 'rb') as f:  # binary data must be opened in 'rb' mode
    image = read_single_image(f)
    plot_image(image)
assert image.shape == (96, 96, 3)
  • You probably noticed that we force the data into 3x96x96 chunks because the images are stored in "column-major order": the first 96x96 values are the red channel, the next 96x96 are green, and the last are blue.
  • Now write a function to read all images instead of a single image like read_single_image. You should use the reshape and transpose methods in this function (a sketch follows).
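A possible sketch under the same assumptions, vectorized over the whole file (the helper name is hypothetical):

import numpy as np

def read_all_images_sketch(path_to_data):
    with open(path_to_data, 'rb') as f:
        raw = np.fromfile(f, dtype=np.uint8)
    images = raw.reshape(-1, DEPTH, WIDTH, HEIGHT)  # one block of three colour planes per image
    return np.transpose(images, (0, 3, 2, 1))       # -> (N, HEIGHT, WIDTH, DEPTH)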
In [0]:
def read_all_images(path_to_data):
    """
    input: the file containing the binary images 
    return: an array containing all the images
    """
    ########################################
    #     Put your implementation here     #
    ########################################

    return images

Test to check if the whole dataset is read correctly:

In [0]:
images = read_all_images(DATA_TRAIN_PATH)
print(images.shape)
assert images.shape == (5000, HEIGHT, WIDTH, DEPTH)
In [0]:
images_test = read_all_images(DATA_TEST_PATH)
print(images_test.shape)
assert images_test.shape == (8000, HEIGHT, WIDTH, DEPTH)
  • It's time to get the identities. Write a function to read all image labels.
In [0]:
def read_labels(path_to_labels):
    """
    input: path to the binary file containing labels 
    return: an array containing the labels
    """
    ########################################
    #     Put your implementation here     #
    ########################################
    
    return labels

read all training labels

In [0]:
#read all train labels 
labels = read_labels(LABEL_TRAIN_PATH)
labels = labels.reshape(5000,1)

read all testing labels

In [0]:
labels_test = read_labels(LABEL_TEST_PATH)
labels_test = labels_test.reshape(8000,1)

Here we want to save the images separately into their class folders. Solving this part gives you insight into preparing and using image data in professional classification tasks, so it might be helpful in the future.

  • Save all images under the 'image' directory, in a subfolder per label name.
  • Images should be in png format.
  • Use the parameters defined above: images and labels.
In [0]:
def save_images(images, labels):
    print("Saving images to disk")
    for i, image in enumerate(images):
        label = labels[i]
        directory = './image/' + str(label) + '/'
        os.makedirs(directory, exist_ok=True)  # exist_ok=True makes an EEXIST check unnecessary
        filename = directory + str(i)
        print(filename)
        imsave("%s.png" % filename, image, format="png")
In [0]:
# save images
save_images(images, labels)
clear_output()

Preprocessing and Feature Extraction

In [0]:
from numpy import array
%matplotlib inline
import string

from PIL import Image
import glob
from pickle import load
from keras.preprocessing import image as IMG

Pre-processing images is one of the most important parts of building programs related to image or optical recognition. When we use machine learning or neural networks to recognize images, each image must be in the same format as the data the network or ML algorithm was trained on. So pre-processing is very important for making the inputs consistent and accurate. Pre-processing steps may include resizing, cropping, changing hue, converting to black and white, etc. In the next cell we want to do some pre-processing on an image.

As you noticed, we imported image from keras.preprocessing as IMG in the previous cell. This module is usable for converting images to NumPy arrays. We also want to use some functions of the Keras pretrained network called Inception-v3 to pre-process the inputs.

So briefly, you should implement the following steps to complete the next cell:

  • Convert image instance to numpy array (check IMG)
  • Check the Inception-V3 input dimensions
  • Use defined pre-process function of Inception-v3

Hint: https://keras.io/applications/
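A minimal sketch of those steps (preprocess_input is the Inception-v3 helper from keras.applications; note that 96x96 is above InceptionV3's 75x75 minimum input size):

from keras.applications.inception_v3 import preprocess_input

def preprocess_image_sketch(img):
    x = IMG.img_to_array(img)  # convert the image instance to a float NumPy array
    x = preprocess_input(x)    # Inception-v3 preprocessing: scales pixels to [-1, 1]
    return x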

In [0]:
def preprocess_image(img):
  
  ########################################
  #     Put your implementation here     #
  ########################################
  return img

Let's check the behavior of our preprocess_image function:

In [0]:
img = preprocess_image(image) # image = a bird
newimg = img.reshape(96,96,3)
plt.imshow(newimg)
plt.show()

Please describe the preprocess function you used in the previous cells.

$\color{red}{\text{Write Your Answer Here}}$

Let's see how to convert RGB image to grayscale with the use of numpy library:

In [0]:
def rgb2gray(rgb):
  """
      input : RGB image
      Returns: grayscale image
  """
  return np.dot(rgb[..., :3], [0.299, 0.587, 0.114])  # ITU-R 601 luma weights
  • Make a list of images where each one is pre-processed, grayscaled, and flattened. Use the images array defined above, obtained from read_all_images. It's essential to use our implemented functions (preprocess_image, rgb2gray); see the sketch after this list.
  • Then convert the list to an array named new_image_list.
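One possible sketch, assuming images, preprocess_image, and rgb2gray from the cells above:

new_image_list = np.array(
    [rgb2gray(preprocess_image(img)).flatten() for img in images]
)  # -> shape (5000, 96 * 96) == (5000, 9216)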
In [0]:
########################################
#     Put your implementation here     #
########################################
new_image_list.shape
assert new_image_list.shape == (5000, 9216)

Do all of the above implementations for the test data:

In [0]:
########################################
#     Put your implementation here     #
########################################
new_image_test_list.shape
assert new_image_test_list.shape == (8000, 9216)

If you completed all the previous cells correctly, you will see the preprocessed images with their class labels below.

In [0]:
plt.rcParams['figure.figsize'] = (7,9)
for i in range(9):
  plt.subplot(3,3,i+1)
  plt.imshow(new_image_list[4*i].reshape(96,96), interpolation='none', cmap = 'gray')
  plt.title("Class {}".format(labels[4*i][-1]))

PCA

Principal components analysis is essentially just a coordinate transformation. The original data are plotted on an X-axis and a Y-axis. For two-dimensional data, PCA seeks to rotate these two axes so that the new axis X’ lies along the direction of maximum variation in the data. PCA requires that the axes be perpendicular, so in two dimensions the choice of X’ determines Y’. You obtain the transformed data by reading the x and y values off this new set of axes, X’ and Y’. For more than two dimensions, the first axis is in the direction of most variation; the second, in the direction of the next-most variation; and so on.
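As a hedged illustration of the scikit-learn API on toy data (not the assignment arrays): passing a float between 0 and 1 as n_components asks PCA to keep just enough components to explain that fraction of the variance.

import numpy as np
from sklearn.decomposition import PCA

X = np.random.rand(100, 50)                  # stand-in feature matrix, illustration only
pca = PCA(n_components=0.95)                 # a float < 1: keep 95% of the variance
X_reduced = pca.fit_transform(X)
print(pca.n_components_)                     # number of components actually retained
print(pca.explained_variance_ratio_.sum())   # at least 0.95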

Now we have an array of pre-processed and grayscaled images and we want to extract features with PCA method.

In [0]:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import sklearn.metrics as skm
from IPython.display import SVG
from sklearn.svm import SVC

Use PCA to extract features for both train and test data.

Check the scikit-learn documentation for performing PCA on your data and investigate: what happens if we use a value lower than 1 as the n_components input of PCA? How do you get the most valuable features with the help of PCA? How do you determine a sufficient number of the most important features? Also explain the StandardScaler functionality!

$\color{red}{\text{Write Your Answer Here}}$

Note: Please be patient, the next part may take a few minutes to execute.

In [0]:
scaler = StandardScaler()
scaler.fit(new_image_list)
features_scaler = scaler.transform(new_image_list)
features_test_scaler = scaler.transform(new_image_test_list)
"""
  input: features_scaler and features_test_scaler
  output: features_pca and features_test_pca
"""
########################################
#     Put your implementation here     #
########################################
print('After using PCA on the dataset, shape: ', features_pca.shape)
print('After using PCA on the test set, shape: ', features_test_pca.shape)

Let's see the effect of PCA on an image:

In [0]:
sample = new_image_list[200]
sample.shape = (96,96)

a = plt.subplot(1,2,1)
a.set_title('Original Image')
plt.imshow(sample, cmap = plt.cm.gray_r)

sample = pca.inverse_transform(features_pca[200])
sample.shape = (96,96)

b = plt.subplot(1,2,2)
b.set_title("Reduced after PCA")
plt.imshow(sample, cmap = plt.cm.gray_r)

Image Classification with SVM

A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane. In other words, given labeled training data (supervised learning), the algorithm outputs an optimal hyperplane which categorizes new examples. In two-dimensional space this hyperplane is a line dividing the plane into two parts, with each class lying on either side.

Suppose you are given a plot of two labeled classes on a graph, as shown in image (A). Can you decide on a separating line for the classes?

(A)

You might have come up with something similar to the following image (image B). It fairly separates the two classes. Any point to the left of the line falls into the black-circle class, and any point to the right falls into the blue-square class. Separating classes is what SVM does: it finds a line/hyperplane (in a multidimensional space) that separates the classes.

(B)

Making it a bit more complex…

So far so good. Now consider: what if we had data as shown in the image below? Clearly, there is no line that can separate the two classes in this x-y plane. So what do we do?

We apply a transformation and add one more dimension, which we call the z-axis. Let's define the value of each point on the z-axis as z = x² + y²; we can interpret it as the squared distance of the point from the origin. Now if we plot the points against the z-axis, a clear separation is visible and a line can be drawn.

Can you draw a separating line in this plane?

(Plot of the z-y plane: a separation can be made here.)

When we transform this line back to the original plane, it maps to a circular boundary, as shown in the following image. These transformations are carried out by kernels.

(Transforming back to the x-y plane, the line becomes a circle.)
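To make the kernel idea concrete, here is a small self-contained sketch (toy data, not the assignment dataset) where an RBF-kernel SVC separates two rings that no straight line can split:

import numpy as np
from sklearn.svm import SVC

# Toy data: two concentric rings, not linearly separable in the x-y plane
rng = np.random.default_rng(0)
theta = rng.uniform(0, 2 * np.pi, 200)
r = np.concatenate([rng.normal(1, 0.1, 100), rng.normal(3, 0.1, 100)])
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.array([0] * 100 + [1] * 100)

clf = SVC(kernel='rbf')   # the RBF kernel plays the role of the z = x^2 + y^2 trick
clf.fit(X, y)
print(clf.score(X, y))    # near 1.0: the kernel separates the rings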

Can you suggest a way to find the best parameters for your classifier model?

$\color{red}{\text{Write Your Answer Here}}$

In [0]:
# Train a SVM on your data
########################################
#     Put your implementation here     #
########################################
In [0]:
algo_score = classifierWithPCA.score(features_test_pca, labels_test)
print(algo_score)

What does classifierWithPCA.score return?

$\color{red}{\text{Write Your Answer Here}}$

Get the accuracy of your model and show the classification report (precision, recall, f1-score, support).
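One possible way to produce the report, assuming classifierWithPCA and the PCA features from the cells above (skm is sklearn.metrics, imported earlier):

preds = classifierWithPCA.predict(features_test_pca)
print('accuracy:', skm.accuracy_score(labels_test, preds))
print(skm.classification_report(labels_test, preds))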

In [0]:
########################################
#     Put your implementation here     #
########################################

As you noticed, the performance of the model is weak. Why is the accuracy so low? Explain it.

$\color{red}{\text{Write Your Answer Here}}$

Convolutional Neural Network

Now it is time to create our first convnet! This part is meant to get you familiar with CNN architectures. After building the model, we will talk about how to improve its performance.

In [0]:
import keras
import random
import time
import pandas as pd
import tensorflow as tf
from keras import backend as K
from keras.utils import to_categorical

from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Layer, Dense, Activation, Dropout, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.preprocessing.image import ImageDataGenerator
from tensorflow.keras.callbacks import TensorBoard

Let's normalize the pixel values to be between 0 and 1. This will help our model converge to a lower loss.

We also shift the label values into the range 0 to 9.

In [0]:
images, images_test = images / 255.0, images_test / 255.0
labels, labels_test = labels-1, labels_test-1

Checking dataset:

In [0]:
class_names = ['airplane', 'bird', 'car', 'cat', 'deer',
               'dog', 'horse', 'monkey', 'ship', 'truck']
plt.rcParams['figure.figsize'] = (7,9)
plt.figure(figsize=(10,10))
for i in range(25):
    plt.subplot(5,5,i+1)
    plt.xticks([])
    plt.yticks([])
    plt.grid(False)
    plt.imshow(images[i], cmap=plt.cm.binary)
    plt.xlabel(class_names[labels[i][0]])
plt.show()

CNN Architecture

A common architecture for a CNN is a stack of Conv2D and MaxPooling2D layers followed by a few densely connected layers. The idea is that the stack of convolutional and max-pooling layers extracts the features from the image. These features are then flattened and fed to densely connected layers that determine the class of the image based on the presence of those features.

We will start by building the convolutional base. Using other layers like Dropout or BatchNormalization to improve performance is optional.

After adding some Conv2D layers, you should notice that the depth of the representation increases while the spatial dimensions shrink drastically. The outputs of the convolutional layers are our extracted features. We then need to take these extracted features and add a way to classify them; this is why we add the Flatten and Dense layers to our model.

Note: Please try to make a simple CNN in this part with low complexity. If the number of trainable parameters is large, you may face a "session crashed" message. A sketch of such a stack follows.
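As a hedged illustration only (the layer counts and widths are placeholders, not the required architecture), the stack described above could look like:

model_sketch = models.Sequential([
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(96, 96, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation='relu'),
    layers.Dense(10),  # 10 classes; raw logits, matching from_logits=True below
])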

In [0]:
def Simple_CNN_Model():
  model = models.Sequential()
  ########################################
  #     Put your implementation here     #
  ########################################
  model.compile(optimizer='adam',
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
              metrics=['accuracy'])
  
  return model

Visualize the training and evaluate the model:

In [0]:
def visualize_loss_and_acc(history):
  history_dict = history.history
  loss_values = history_dict['loss']
  val_loss_values = history_dict['val_loss']
  acc = history_dict['accuracy']

  epochs = range(1, len(acc) + 1)

  f = plt.figure(figsize=(10,3))

  plt.subplot(1,2,1)
  plt.plot(epochs, loss_values, 'bo', label='Training loss')
  plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
  plt.title('Training and validation loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()

  val_acc = history_dict['val_accuracy']

  plt.subplot(1,2,2)
  plt.plot(epochs, acc, 'bo', label='Training acc')
  plt.plot(epochs, val_acc, 'b', label='Validation acc')
  plt.title('Training and validation accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()

  plt.show()

Train the Simple_CNN_Model:

In [0]:
tensorboard = TensorBoard(log_dir=f"logs/{time.time()}", histogram_freq=1)
CnnModel = Simple_CNN_Model()
CnnModel.summary()

CNN_Model_history = CnnModel.fit(
    images, labels,
    batch_size=64,
    epochs=8,
    validation_data=(images_test, labels_test),
    shuffle=True,
    callbacks=[tensorboard])

It's time to call the visualization function and investigate the procedure of training:

In [0]:
visualize_loss_and_acc(CNN_Model_history)

Evaluate the model:

Let's use the trained model to predict labels. We can determine how well the model performed by looking at its predictions on the test data set.

In [0]:
predictions = CnnModel.predict(images_test)

ROWS = 5
COLUMNS = 5
fig, ax = plt.subplots(ROWS,COLUMNS, figsize=(18,18))
for row in range(ROWS):
    for column in range(COLUMNS):
        imgIndex = random.randint(0, len(images_test) - 1)  # randint is inclusive at both ends
        image = images_test[imgIndex]
        image = image.reshape(96,96,3)
        ax[row,column].imshow(image, cmap=plt.cm.binary)
        ax[row, column].set_title(class_names[np.argmax(predictions[imgIndex])], fontsize=10)
In [0]:
algo_score = CnnModel.evaluate(images_test, labels_test)[1]
print(algo_score)

You should be getting an accuracy of about 55%. This isn't bad for a simple model like this, but we'll dive into some better approaches for computer vision below.

In [0]:
# Remember to run this cell each time you update the model;
# this is a deliverable item of your assignment
CnnModel.save(str(ASSIGNMENT_PATH / 'Animals_Image_classification.h5'))

Using Pretrained Models

Pre-trained models are trained on very large-scale image classification problems. Their convolutional layers act as feature extractors and their fully connected layers act as classifiers. Since these models are very large and have seen a huge number of images, they tend to learn very good, discriminative features. We can either use the convolutional layers merely as a feature extractor, or we can tweak the already-trained convolutional layers to suit the problem at hand. The former approach is known as transfer learning and the latter as fine-tuning.

First of all, to prove the power of these pretrained networks, let's check the classification ability of one of them: InceptionV3.

We just pass an image to the network and it shows us the probable classes for it. It's so fun!

In [0]:
from keras.applications.inception_v3 import InceptionV3
from keras.preprocessing import image as IMG

In this part you need to implement a function which takes an image as input and performs some changes on it so it is ready for prediction by InceptionV3. If necessary, it's no problem to import additional modules and use them in your implementation. A sketch follows.
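A possible sketch, assuming img is an RGB array with values in [0, 1] (as produced by the normalization above) and model is the InceptionV3 instance loaded in the next cell; preprocess_input and decode_predictions are the Inception-v3 helpers from keras.applications:

from keras.applications.inception_v3 import preprocess_input, decode_predictions

def pred_sketch(img):
    x = Image.fromarray((img * 255).astype('uint8')).resize((299, 299))  # InceptionV3's native input size
    x = IMG.img_to_array(x)[None, ...]           # add the batch dimension
    x = preprocess_input(x)                      # scale pixel values to [-1, 1]
    probs = model.predict(x)
    print(decode_predictions(probs, top=3)[0])   # top-3 ImageNet guesses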

In [0]:
model = InceptionV3(weights='imagenet')

def pred(img):
    ########################################
    #     Put your implementation here     #
    ########################################

def display_img(img):
    plt.figure(figsize = (3,3))
    plt.imshow(img)
    plt.axis('off')
    plt.show()

display_img(images[10])
pred(images[10])

But we want to use the network as a feature extractor. In this approach we create the base model from a pre-trained convnet. Keras provides a set of pre-trained networks; use this link and take a look at them. The model we are going to use as the convolutional base for our model is VGG19.

We are going to use this model, but only its convolutional base. So, when we load it, we'll specify that we don't want to load the top (classification) layers. We'll tell the model what input shape to expect and to use the pretrained ImageNet weights.

Let's work with the pretrained models and learn how to freeze layers, or leave them trainable so they can be fine-tuned. In the following cell, implement code that keeps the InceptionV3 network up to layer 280 and freezes all the earlier layers. By comparing the summaries of the models you can see the difference in the number of trainable parameters in each model. A sketch follows.
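One possible sketch, assuming base_model as loaded in the next cell (the index 280 mirrors the text; treat the whole snippet as illustrative):

from keras.models import Model

EncoderUpToALayer_sketch = Model(inputs=base_model.input,
                                 outputs=base_model.layers[280].output)
for layer in EncoderUpToALayer_sketch.layers:
    layer.trainable = False   # freeze everything kept in the truncated model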

In [0]:
from keras.models import Model

base_model = InceptionV3(include_top=True, weights='imagenet')
base_model.summary()

# Name of a new model with freezed layers = EncoderUpToALayer
########################################
#     Put your implementation here     #
########################################

EncoderUpToALayer.summary()

Now we are familiar with pretrained networks and how to customize them to our needs.

In the following cells you should use VGG19 as your base model and then add some dense layers to build an improved model in place of the simple CNN you designed.

Again, in this part you may face a "session crashed" message, so solve each phase of this assignment in a separate session to prevent this issue. A sketch of one possible improved model follows.
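An illustrative sketch under the stated assumptions (the head sizes are placeholders, not the required architecture):

from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG19

base = VGG19(include_top=False, weights='imagenet', input_shape=(96, 96, 3))
base.trainable = False                        # freeze the convolutional base

improved_sketch = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dense(10, activation='softmax'),   # matches sparse_categorical_crossentropy
])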

In [0]:
from keras.applications.vgg19 import VGG19

def Improved_Model():

  ########################################
  #     Put your implementation here     #
  ########################################
  model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

  return model

Train the Improved_Model:

In [0]:
tensorboard2 = TensorBoard(log_dir=f"logs2/{time.time()}", histogram_freq=1)
IMPV_Model = Improved_Model()
IMPV_Model.summary()

IMPV_Model_history = IMPV_Model.fit(images,
    labels, batch_size=64, epochs=9, validation_data=(images_test, labels_test), shuffle=True, callbacks=[tensorboard2])

It's time to call the visualization function and investigate the procedure of training:

In [0]:
visualize_loss_and_acc(IMPV_Model_history)

Evaluate the model. After running the next cell, you will see how much better the model classifies images than before.

In [0]:
predictions = IMPV_Model.predict(images_test)
# You may need to re-run some of the previous cells to define these variables
ROWS = 5
COLUMNS = 5
fig, ax = plt.subplots(ROWS,COLUMNS, figsize=(18,18))
for row in range(ROWS):
    for column in range(COLUMNS):
        imgIndex = random.randint(0, len(images_test) - 1)  # randint is inclusive at both ends
        image = images_test[imgIndex]
        image = image.reshape(96,96,3)
        ax[row,column].imshow(image, cmap=plt.cm.binary)
        ax[row, column].set_title(class_names[np.argmax(predictions[imgIndex])], fontsize=10)
      
In [0]:
algo_score = IMPV_Model.evaluate(images_test, labels_test)[1]
print(algo_score)

You should be getting an accuracy of about 70%. You can see the power of transfer learning.

In [0]:
# Remember to run this cell each time you update the model;
# this is a deliverable item of your assignment
IMPV_Model.save(str(ASSIGNMENT_PATH / 'Animals_Image_classification_Transfer.h5'))

Visualizations

Now we want to figure out how our model classifies images, what the layers learn, and which features let them recognize the content of an image.

So we need to examine the convnets, or filters, in more detail and understand which distinctive features they respond to.

In fact, if we want to delve deeper into CNN models, we should visualize the effect of different filters on a single image. Another important point is that these layers make their decisions by looking at particular parts of an image.

Let's break through these challenges.

In the next cell we want to look at the outputs of the CNN model's intermediate layers. By doing so, we are able to learn more about the workings of these layers. In short, looking at the outputs of our model's convolution layers and their corresponding max-pooling layers allows us to understand our trained model and the effect of each layer when it is provided with one image from the test set.

So, to complete this part you need to make a list of IMPV_Model's layers and then construct a new model whose input is the same as the input of IMPV_Model and whose outputs are the different layers, to show us the result of passing a single image through the filters.

These links may help you with the structure:

Finally, we will have a list of different outputs, one for each of the desired filters, and we can plot them. Check your code by predicting on one test image. You will need to adjust the shape of your test image for prediction. A sketch follows.
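One possible sketch, under the (hypothetical) assumption that the VGG19 base is the first layer of IMPV_Model and its sub-layers hold the convolution/pooling outputs:

from tensorflow.keras.models import Model

base = IMPV_Model.layers[0]                          # assumed: the VGG19 sub-model
layer_outputs = [l.output for l in base.layers[1:]]  # skip the InputLayer
ActivationModel = Model(inputs=base.input, outputs=layer_outputs)

test_image = images_test[0].reshape(1, 96, 96, 3)    # add the batch dimension
DiffOutputs = ActivationModel.predict(test_image)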

In [0]:
########################################
#     Put your implementation here     #
########################################

DiffOutputs = ActivationModel.predict(test_image)

Let's check the implementation:

In [0]:
assert DiffOutputs[1].shape == (1, 96, 96, 64)

If we look at the images produced by the different convolution filters, it is pretty clear how different filters in each layer activate on various parts of an image. By visualizing the output of different convolutional layers, the most crucial thing we notice is that layers deeper in the network show more specific features, while earlier layers tend to capture general patterns like edges, texture, background, etc. In fact, when we use transfer learning and train some part of a network pre-trained on a different, big dataset (like ImageNet) on a completely different dataset, we reuse the knowledge of the earlier layers and force the model to update the weights of the final layers.

The general idea is to freeze the weights of the earlier layers, because they learn the general features, and to train only the weights of the deeper layers, because those are the layers that actually recognize the objects in our images.

In the next part you will see the effect of Conv2D and MaxPooling layers at different depths of the model and understand the points mentioned above.

To complete the following cell, you should just write a simple snippet that computes the feature-map size and the number of rows, n_rows. You can derive both from any one of the feature maps in DiffOutputs; a sketch follows.
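A minimal sketch of those two lines (they belong inside the loop below, derived from each layer_activation):

size = layer_activation.shape[1]                          # spatial size of one feature map
n_rows = layer_activation.shape[-1] // NumOfImagesPerRow  # rows needed to show all channels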

In [0]:
LayerNames = ['block1_conv2d_1', 'block1_pool_1', 'block3_conv2d_3', 'block3_pool_1']
ActivList = [DiffOutputs[1], DiffOutputs[3], DiffOutputs[9], DiffOutputs[11]]

NumOfImagesPerRow = 16

for layer_name, layer_activation in zip(LayerNames, ActivList):
  ########################################
  #     Put your implementation here     #
  ########################################
  display_grid = np.zeros((size * n_rows, NumOfImagesPerRow * size)) 
  if layer_name == 'block1_conv2d_1' or layer_name == 'block1_pool_1':
    assert n_rows == 4

  if layer_name == 'block3_conv2d_3' or layer_name == 'block3_pool_1':
    assert n_rows == 16

  for col in range(n_rows):
    for row in range(NumOfImagesPerRow):
      channel_image = layer_activation[0, :, :, col * NumOfImagesPerRow + row]
      channel_image -= channel_image.mean()
      channel_image /= channel_image.std()
      channel_image *= 64
      channel_image += 128
      channel_image = np.clip(channel_image, 0, 255).astype('uint8')
      display_grid[col * size : (col + 1) * size, row * size : (row + 1) * size] = channel_image

  scale = 1 / size
  plt.figure(figsize=(scale * display_grid.shape[1], scale * display_grid.shape[0]))
  plt.title(layer_name)
  plt.grid(False)
  plt.imshow(display_grid, aspect='auto')
  plt.savefig(layer_name+"_grid.jpg", bbox_inches='tight')

Class Activation Mapping

In this section, you'll learn about Class Activation Mapping (CAM) and implement it using keras.

In [0]:
import tensorflow as tf
from tensorflow.keras.layers import Dense, Input
from tensorflow.keras.models import Model
import numpy as np

CAM was introduced in this paper and is a technique for demonstrating where in an image a network pays the most attention when making a prediction!
To implement CAM, you have to use a Global Average Pooling (GAP) layer instead of the fully connected layers conventionally used after the convolution and pooling layers. So in Keras, instead of using a Flatten layer followed by some Dense layers, you should use a GlobalAveragePooling layer.
(CAM architecture diagram)
Let's go over the steps to implement CAM:

  • As you can see in the picture, you should first get the weights w1, w2, ... . These weights are basically the weights connecting the output of the GAP layer to the output neuron of the class whose activation map you want to create!
  • Next, you should use these weights to calculate a weighted average of the output feature maps of the convolution layers. The number of feature maps depends on the number of filters in the last convolution layer of the network. In this part of the assignment you will use VGG16, so we have 512 feature maps at the output of the last convolution layer. The final class activation map is the weighted average of these 512 feature maps, as the equation below shows!
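In equation form, the class activation map for class $c$, with GAP weights $w^c_k$ and last-layer feature maps $f_k$, is:

$$M_c(x, y) = \sum_{k=1}^{512} w^c_k \, f_k(x, y)$$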

Check out the paper for an in-depth explanation of this method.

Now let's load VGG16. For this part, we only want the last convolution layer of VGG16 ('block5_conv3') to be trainable, so you should write code in the cell below to make this layer the only trainable layer of VGG16. A sketch follows.
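A minimal sketch of one way to do this, assuming vgg16 as loaded in the next cell:

for layer in vgg16.layers:
    layer.trainable = (layer.name == 'block5_conv3')  # only the last conv layer stays trainable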

In [0]:
vgg16 = tf.keras.applications.VGG16(
    include_top=False,
    weights="imagenet",
)


########################################
#     Put your implementation here     #
########################################
    

We'll be using the Cats vs Dogs dataset for this part, which is made up of more than 23,000 images of dogs and cats. We use TensorFlow Datasets to load this dataset. There are more datasets on the TensorFlow website; you can check them out if you want to make activation maps for a different dataset!

In [0]:
import tensorflow_datasets as tfds

### load the dataset using tensorflow_datasets
(train_raw, test_raw), info = tfds.load('cats_vs_dogs',split=['train[:80%]', 'train[80%:]'],  with_info=True, as_supervised=True)
In [0]:
import matplotlib.pylab as plt

### show the first three images
for image, label in train_raw.take(3):
  plt.imshow(image)
  plt.show()

### resizing and preprocessing the images
IMAGE_SIZE = 250
def pre_process_image(image, label):
  image = tf.cast(image, tf.float32)
  image = image / 255.0
  image = tf.image.resize(image, (IMAGE_SIZE, IMAGE_SIZE))
  return image, label
In [0]:
### shuffling, setting the batch size and applying the preprocessing and resizing to the images
BATCH_SIZE = 50
train = train_raw.map(pre_process_image).shuffle(1000).repeat().batch(BATCH_SIZE)
test = test_raw.map(pre_process_image).repeat().batch(BATCH_SIZE)

Now use VGG16 and a GAP layer to create an image classifier for the dogs and cats dataset!
Instead of using a Dense layer with 1 unit and a sigmoid activation, use a Dense layer with 2 units and a softmax activation, so each class has its own set of weights and the heatmaps are properly portrayed for each class. A sketch follows.
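An illustrative sketch of that head (everything except vgg16 and IMAGE_SIZE is a hypothetical name):

inp = Input(shape=(IMAGE_SIZE, IMAGE_SIZE, 3))
fmaps = vgg16(inp)                                     # (None, fm, fm, 512) feature maps
gap = tf.keras.layers.GlobalAveragePooling2D()(fmaps)  # one average per feature map
out = Dense(2, activation='softmax')(gap)              # per-class weights, used later for CAM
model = Model(inputs=inp, outputs=out)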

In [0]:
########################################
#     Put your implementation here     #
########################################


model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])
model.summary()
In [0]:
model.fit(train, epochs=4, steps_per_epoch=int(23262 * 0.8) // BATCH_SIZE)
In [0]:
model.evaluate(test, steps=int(23262 * 0.2) // BATCH_SIZE)

Now implement the get_heatmap function in the cell below. This function receives an image as input and outputs a heatmap (activation map) for the class with the highest probability. Note that you'll need the output feature maps of the VGG16 in your model to implement this function. You can make new models or functions above the get_heatmap function if you need to! A sketch follows.
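A possible sketch, reusing the hypothetical inp, fmaps, and out tensors from the GAP sketch above:

cam_model = Model(inputs=inp, outputs=[fmaps, out])    # expose the feature maps as well

def get_heatmap_sketch(x):
    maps, probs = cam_model.predict(x[None, ...])      # add the batch dimension
    class_idx = np.argmax(probs[0])                    # class with the highest probability
    W = model.layers[-1].get_weights()[0]              # Dense kernel, shape (512, 2)
    # weighted average of the 512 feature maps, as in the CAM equation above
    return np.tensordot(maps[0], W[:, class_idx], axes=([2], [0]))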

In [0]:
########################################
#     Put your implementation here     #
########################################


def get_heatmap(x):
  
  """ 
  Args:
    x: A numpy array of shape (IMAGE_SIZE, IMAGE_SIZE, 3) which is an image
    
  Returns:
    A numpy of shape (FM, FM) where FM is the size of the output feature map of the last conv layer of the network

  """

  ########################################
  #     Put your implementation here     #
  ########################################
  return hmap

Now test your implementation using the cell below!

In [0]:
import cv2

test1by1 = test_raw.map(pre_process_image).shuffle(1000).repeat().batch(1)
ncols, nrows = 3, 3
coords = [(i, j) for i in range(ncols) for j in range(nrows)]
fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(nrows*5,ncols*5))
for x, y in test1by1.take(nrows*ncols):
  img_idx = 0
  img = x.numpy()[img_idx]
  
  heatmap = get_heatmap(img)
  heatmap = 1-((heatmap - np.min(heatmap))/(np.max(heatmap)-np.min(heatmap)))
  heatmap = cv2.resize(heatmap, (img.shape[1], img.shape[0]))
 
  heatmap = np.uint8(255*heatmap)
  heatmap = cv2.applyColorMap(heatmap, cv2.COLORMAP_JET)
  
  heat_img = (heatmap/255)*0.6 + img
  coord = coords.pop()
  heat_img = np.clip(heat_img, 0, 1)
  ax[coord[0], coord[1]].imshow(heat_img)
  
plt.show()

Submission

Congratulations! You've finished the assignment and you're ready to submit your work. Please follow these instructions:

  1. Check and review your answers. Make sure all of the cell outputs are what you want.
  2. Select File > Save.
  3. Run the Make Submission cell. It may take several minutes and may ask you for your credentials.
  4. Run the Download Submission cell to obtain your submission as a zip file.
  5. Grab the downloaded file (dl_asg03__xx__xx.zip) and upload it via https://forms.gle/xeBPAep5Sz7WvpN29

Make Submission (Run the cell)

In [0]:
#@title
! pip install -U --quiet PyDrive > /dev/null
! wget -q https://github.com/github/hub/releases/download/v2.10.0/hub-linux-amd64-2.10.0.tgz 
  
import os
import time
import yaml
import json

from google.colab import files
from IPython.display import Javascript
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

asg_name = 'Assignment3'
script_save = '''
require(["base/js/namespace"],function(Jupyter) {
    Jupyter.notebook.save_checkpoint();
});
'''
repo_name = 'iust-deep-learning-assignments'
submission_file_name = 'dl_asg03__%s__%s.zip'%(student_id, student_name.lower().replace(' ',  '_'))
course_url = 'https://iust-deep-learning.github.io/982/'

! tar xf hub-linux-amd64-2.10.0.tgz
! cd hub-linux-amd64-2.10.0/ && chmod a+x install && ./install
! hub config --global hub.protocol https
! hub config --global user.email "$Your_Github_account_Email"
! hub config --global user.name "$student_name"
! hub api -X GET /user
! hub api -X GET /user > user_info.json
! hub api -F affiliation=owner -X GET /user/repos > repos.json

user_info = json.load(open('user_info.json'))
repos = json.load(open('repos.json'))
repo_names = [r['name'] for r in repos]
has_repository = repo_name in repo_names
if not has_repository:
  get_ipython().system_raw('! hub api -X POST -F name=%s /user/repos homepage="%s" > repo_info.json' % (repo_name, course_url))
  repo_info = json.load(open('repo_info.json')) 
  repo_url = repo_info['clone_url']
else:
  username = user_info['login']
  ! hub api -F homepage="$course_url" -X PATCH /repos/$username/$repo_name
  for r in repos:
    if r['name'] == repo_name:
      repo_url = r['clone_url']
  
stream = open("/root/.config/hub", "r")
token = list(yaml.load_all(stream))[0]['github.com'][0]['oauth_token']
repo_url_with_token = 'https://'+token+"@" +repo_url.split('https://')[1]

! git clone "$repo_url_with_token"
! cp -r "$ASSIGNMENT_PATH" "$repo_name"/
! cd "$repo_name" && git add -A
! cd "$repo_name" && git commit -m "Add assignment 3 results"
! cd "$repo_name" && git push -u origin master

sub_info = {
    'student_id': student_id,
    'student_name': student_name, 
    'repo_url': repo_url,
    'asg_dir_contents': os.listdir(str(ASSIGNMENT_PATH)),
    'datetime': str(time.time()),
    'asg_name': asg_name
}
json.dump(sub_info, open('info.json', 'w'))

Javascript(script_save)

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = drive.ListFile({'q':"title='%s.ipynb'"%asg_name}).GetList()[0]['id']
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('%s.ipynb'%asg_name) 

! jupyter nbconvert --to script "$asg_name".ipynb > /dev/null
! jupyter nbconvert --to html "$asg_name".ipynb > /dev/null
! zip "$submission_file_name" "$asg_name".ipynb "$asg_name".html "$asg_name".txt info.json > /dev/null

print("##########################################")
print("Done! Submisson created, Please download using the bellow cell!")

Download Submission (Run the cell)

In [0]:
#@title
files.download(submission_file_name)
In [0]: