Assignment #2 - Getting Started with Keras

Deep Learning / Spring 1399, Iran University of Science and Technology


Please pay attention to these notes:

  • Assignment Due: 1399/01/15 23:59:00
  • If you need any additional information, please review the assignment page on the course website.
  • The items you need to answer are highlighted in red and the coding parts you need to implement are denoted by:
    ########################################
    #     Put your implementation here     #
    ########################################
  • We encourage cooperation and group discussion on assignments. However, each student must complete all of the questions on their own. If our matching system identifies any sort of copying, you'll be responsible for the consequences, so please mention your teammate's name if you worked with one.
  • Students who audit this course should submit their assignments like other students in order to stay qualified for attending the rest of the sessions.
  • Any detected copying will zero that assignment's grade and will also count as two negative assignments toward your final score.
  • When you are ready to submit, please follow the instructions at the end of this notebook.
  • If you have any questions about this assignment, feel free to drop us a line. You may also post your questions on the course Forum page.
  • You must run this notebook on the Google Colab platform; it depends on the Colab VM for some of its dependencies.
  • Before starting to work on the assignment, please fill in your name in the next section and remember to RUN the cell.


Course Forum: https://groups.google.com/forum/#!forum/dl982/


Fill your information here & run the cell

In [0]:
#@title Enter your information & "RUN the cell!!" { run: "auto" }
student_id =   0#@param {type:"integer"}
student_name = "" #@param {type:"string"}

print("your student id:", student_id)
print("your name:", student_name)


from pathlib import Path

ASSIGNMENT_PATH = Path('asg02')
ASSIGNMENT_PATH.mkdir(parents=True, exist_ok=True)

1. Keras Basics

In this part of the assignment, you'll become familiar with the basic principles of coding in Keras! There are three main APIs in Keras that you can use depending on the task at hand: the Sequential API, the Functional API, and Model subclassing. You'll implement simple models using each of these APIs and learn when to use which.

In [0]:
import keras
import numpy as np

1.1 Sequential API

The Sequential API is the simplest API to use, but the downside is that you can only train simple models with it! It can only express models that are basically a plain stack of layers, which most of the time isn't enough.
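
To give a sense of the pattern before you implement anything, here is a minimal, purely illustrative sketch (the name example_mlp, the layer sizes, and the assumed 8 input features are not part of the assignment): a Sequential model is just a stack of layers added one after another.

from keras import Sequential
from keras.layers import Dense

# Minimal Sequential sketch: one hidden layer and a single linear output for regression
example_mlp = Sequential()
example_mlp.add(Dense(32, activation='relu', input_shape=(8,)))  # 8 input features (illustrative)
example_mlp.add(Dense(1))                                        # linear output for a regression target
example_mlp.compile(optimizer='adam', loss='mse')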

In this part, you should implement simple MLPs to train on the California Housing dataset and the digits dataset.

Let's first load the California Housing dataset.

In [0]:
from sklearn.datasets import fetch_california_housing
x_CH, y_CH = fetch_california_housing(return_X_y=True)

The task is to predict house prices from the features provided in the dataset. We've also split the dataset into training, validation, and test sets and done some pre-processing.

In [0]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

def SplitAndScale(x, y):
  x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.33)
  x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)
  sc = StandardScaler()
  x_train = sc.fit_transform(x_train)
  x_val = sc.transform(x_val)
  x_test = sc.transform(x_test)

  return (x_train, x_val, x_test), (y_train, y_val, y_test)
In [0]:
(x_train_CH, x_val_CH, x_test_CH), (y_train_CH, y_val_CH, y_test_CH) = SplitAndScale(x_CH, y_CH)

Note that we don't know at which epoch our model gets the best results on the test set, because the test set is supposed to stay unseen. Instead, we can use a portion of the training set as a validation set and save the model with the best result on that validation set! This can be done using callbacks, which give you control over the model's behavior during training and evaluation. Check out the Keras documentation for more information on callbacks.
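
For example, besides the ModelCheckpoint used throughout this notebook, EarlyStopping is another common callback, which stops training once the monitored metric stops improving. A minimal sketch, assuming we monitor the validation loss:

from keras.callbacks import EarlyStopping

# Stop training if val_loss hasn't improved for 5 consecutive epochs
early_stop = EarlyStopping(monitor='val_loss', patience=5)
# pass it to fit() together with any other callbacks, e.g. callbacks=[checkpoint, early_stop]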

Now implement a simple MLP for predicting housing prices on the California Housing dataset using the Sequential API.

In [0]:
from keras import Sequential
from keras.layers import Dense
from keras.models import load_model
In [0]:
checkpoint_path = 'reg_mlp.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='min', monitor='val_mean_absolute_error', verbose=0, save_best_only=True)

########################################
#     Put your implementation here     #
########################################

reg_mlp.compile(optimizer='adam', loss='mse', metrics=['mae'])
reg_mlp.fit(x_train_CH, y_train_CH, batch_size=128, epochs=20, validation_data=(x_val_CH, y_val_CH), callbacks=[checkpoint])
In [0]:
reg_mlp = load_model('reg_mlp.h5')
results = reg_mlp.evaluate(x_test_CH, y_test_CH)
print('Mean absolute error on test set: ', results[1])

For the next part, implement an MLP to classify the digits in the digits dataset! This dataset consists of 8$\times$8 images of digits, and each image's label is a number from 0 to 9, which makes this a multiclass classification task.

In [0]:
from sklearn.datasets import load_digits
In [0]:
import matplotlib.pyplot as plt
x_dig, y_dig = load_digits(return_X_y=True)
for i in range(6):
  plt.subplot(231+i)
  plt.imshow(x_dig[i].reshape(8,8), cmap='gray')
In [0]:
(x_train_dig, x_val_dig, x_test_dig), (y_train_dig, y_val_dig, y_test_dig) = SplitAndScale(x_dig, y_dig)
In [0]:
checkpoint_path = 'clf_mlp.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True)

########################################
#     Put your implementation here     #
########################################

clf_mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
clf_mlp.fit(x_train_dig, y_train_dig, batch_size=64, epochs=20, validation_data=(x_val_dig, y_val_dig), callbacks=[checkpoint])
In [0]:
clf_mlp = load_model('clf_mlp.h5')
results = clf_mlp.evaluate(x_test_dig, y_test_dig)
print('Accuracy on test set: ', results[1])

1.2 Functional API

The Sequential API is a fast way to implement very simple models, but to implement more complex ones, such as models with multiple inputs and outputs or non-sequential models, we have to use the Functional API.
Imagine a model like this:

[Figure: a "wide and deep" model]

You can't create such a model using the Sequential API! This model is called a "wide and deep" model. The idea is that we want our model to learn simple features as well as deep features. This kind of architecture is used in recommender systems, but let's implement it with the Functional API just to see whether it improves the results on the California Housing dataset!
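
As an illustration of the Functional API pattern (a sketch only; the name example_wd and the layer sizes are placeholders, not the required architecture), a wide and deep model concatenates the raw input with the output of a deep stack before the final layer:

from keras import Model
from keras.layers import Input, Dense, Concatenate

inp = Input(shape=(8,))                    # assuming 8 input features, as in California Housing
deep = Dense(32, activation='relu')(inp)   # "deep" path: stacked hidden layers
deep = Dense(32, activation='relu')(deep)
both = Concatenate()([inp, deep])          # "wide" path: raw features joined with deep features
out = Dense(1)(both)                       # regression output
example_wd = Model(inputs=inp, outputs=out)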

In [0]:
from keras.layers import Input, Concatenate
from keras import Model
In [0]:
checkpoint_path = 'WideAndDeep.h5' 
checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='min', monitor='val_mean_absolute_error', verbose=0, save_best_only=True)

########################################
#     Put your implementation here     #
########################################

WideAndDeep.compile(optimizer='adam', loss='mse', metrics=['mae'])
WideAndDeep.fit(x_train_CH, y_train_CH, batch_size=128, epochs=20, validation_data=(x_val_CH, y_val_CH), callbacks=[checkpoint])
In [0]:
WideAndDeep = load_model('WideAndDeep.h5')
results = WideAndDeep.evaluate(x_test_CH, y_test_CH)
print('Mean absolute error on test set: ', results[1])

1.3 Custom Layers

You can find many of the layers you'll need in keras.layers, but you can always build your own custom layer with Keras. In this part, you should implement a custom layer called DenseStack. This layer accepts a list of layer sizes and a list of activations, and creates a stack of Dense layers with those sizes and activations. For example, DenseStack([128, 64], ['relu', 'relu']) is equivalent to Dense(128, 'relu') followed by Dense(64, 'relu').

Refer to this for information on how to create a custom layer. Also, you are free to use any layer in keras.layers in this part of the assignment (or any other part that involves creating a custom layer or model).
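
As a reminder of the general pattern (an illustrative toy layer, not DenseStack itself), a custom layer creates its weights in build, defines its forward pass in call, and reports its output shape in compute_output_shape:

import keras

class ScaleLayer(keras.layers.Layer):
    """Toy custom layer that multiplies its input by a single trainable scalar."""

    def build(self, input_shape):
        # create the layer's weights here
        self.scale = self.add_weight(name='scale', shape=(1,), initializer='ones', trainable=True)
        super(ScaleLayer, self).build(input_shape)

    def call(self, X):
        # forward pass: scale every element of the input
        return X * self.scale

    def compute_output_shape(self, input_shape):
        # the output has the same shape as the input
        return input_shape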

In [0]:
class DenseStack(keras.layers.Layer):

    def __init__(self, layer_sizes, activations, **kwargs):

      super().__init__(**kwargs)
      self.layer_sizes = layer_sizes
      self.activations = activations


    def build(self, input_shape):

      ########################################
      #     Put your implementation here     #
      ########################################

      super(DenseStack, self).build(input_shape)

    def call(self, X):

      ########################################
      #     Put your implementation here     #
      ########################################

    def compute_output_shape(self, input_shape):

      ########################################
      #     Put your implementation here     #
      ########################################

    def get_config(self):
   
        config = {**super().get_config(),
                  'layer_sizes': self.layer_sizes,
                  'activations': self.activations}
        return config

Now create a new model using the new DenseStack layer you just created.

In [0]:
checkpoint_path = 'mlp_DenseStack.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True)

########################################
#     Put your implementation here     #
########################################

mlp_DenseStack.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
mlp_DenseStack.fit(x_train_dig, y_train_dig, batch_size=64, epochs=20, validation_data=(x_val_dig, y_val_dig), callbacks=[checkpoint])
In [0]:
mlp_DenseStack = load_model('mlp_DenseStack.h5', custom_objects={'DenseStack': DenseStack})
results = mlp_DenseStack.evaluate(x_test_dig, y_test_dig)
print('Accuracy on test set: ', results[1])

1.4 Model Subclassing

You can build almost any model using the Functional API, but there are cases where you'll need even more flexibility. For example, if your model contains loops or conditional branching, you can't use the Functional API, because it only lets you build models that are DAGs (directed acyclic graphs) of layers.

In this section you should create a custom model named FunnelMLP that builds an MLP with funnel-like hidden layers, meaning each hidden layer is half the size of the previous one, for example 512->256->128. Here are the inputs of the model:

  • first_size: Size of the first hidden layer.
  • num_hidden_layers: Number of hidden layers. So if first_size=1024 and num_hidden_layers=4, hidden layers of the model will be like this:
    **1024->512->256->128**
  • hidden_activation: A string denoting the activation function used in the hidden layers. Note that all hidden layers in this model have the same activation function.
  • num_classes: Number of classes in a classification task. This should be set to 0 if we are using this model for a regression task. You should choose the size and activation of the last layer based on this parameter.

Refer to this for information on how to make custom models using the Model Subclassing API.
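
As an illustration of the subclassing pattern (a toy model, not FunnelMLP; the name TinyClassifier and its layer sizes are made up for this sketch), layers are created in __init__ and wired together in call:

import keras
from keras.layers import Dense

class TinyClassifier(keras.Model):
    """Toy subclassed model: one hidden layer followed by a softmax output."""

    def __init__(self, num_classes, **kwargs):
        super(TinyClassifier, self).__init__(**kwargs)
        self.hidden = Dense(32, activation='relu')
        self.out_layer = Dense(num_classes, activation='softmax')

    def call(self, inputs):
        x = self.hidden(inputs)
        return self.out_layer(x)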

In [0]:
class FunnelMLP(keras.Model):

  def __init__(self, first_size, num_hidden_layers, hidden_activation, num_classes, **kwargs):

    ########################################
    #     Put your implementation here     #
    ########################################

    
  def call(self, inputs):
    
    ########################################
    #     Put your implementation here     #
    ########################################

Train your model on the digits dataset.

In [0]:
checkpoint_path = 'FunnelMLP.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True, save_weights_only=True)

########################################
#     Put your implementation here     #
########################################

Funnelmlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
Funnelmlp.fit(x_train_dig, y_train_dig, batch_size=64, epochs=10, validation_data=(x_val_dig, y_val_dig), callbacks=[checkpoint])
In [0]:
Funnelmlp.load_weights('FunnelMLP.h5')
results = Funnelmlp.evaluate(x_test_dig, y_test_dig)
print('Accuracy on test set: ', results[1])

Now train your model on the California Housing dataset.

In [0]:
checkpoint_path = 'regFunnelMLP.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='min', monitor='val_mean_absolute_error', verbose=0, save_best_only=True, save_weights_only=True)

########################################
#     Put your implementation here     #
########################################

regFunnelmlp.compile(optimizer='adam', loss='mse', metrics=['mae'])
regFunnelmlp.fit(x_train_CH, y_train_CH, batch_size=128, epochs=20, validation_data=(x_val_CH, y_val_CH), callbacks=[checkpoint])
In [0]:
regFunnelmlp.load_weights('regFunnelMLP.h5')
results = regFunnelmlp.evaluate(x_test_CH, y_test_CH)
print('Mean absolute error on test set: ', results[1])

2. Regularization in Keras

We want our model to do well not only on the training set but also on unseen data; in other words, we want better generalization, even at the cost of increased error on the training data. Regularization is the process of reducing a model's error on unseen data and thereby reducing the model's variance. There are many regularization methods in deep learning, and in this part of the assignment we'll use three of them in practice.

We'll be using the Fashion-MNIST dataset in this section. It consists of images of 10 categories of clothing, and the number of training and test instances is the same as in regular MNIST!

In [0]:
from keras.datasets import fashion_mnist
In [0]:
(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()
In [0]:
x_train = x_train/255.
x_test = x_test/255.
In [0]:
x_train, x_val, y_train, y_val = train_test_split(x_train, y_train, test_size=0.2)

Let's train a simple MLP model on the Fashion-MNIST dataset.

In [0]:
from keras.layers import  Flatten
In [0]:
checkpoint_path = 'overfitted_mlp.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True)

overfitted_mlp = Sequential()
overfitted_mlp.add(Flatten(input_shape=(28, 28)))
overfitted_mlp.add(Dense(256, activation='relu'))
overfitted_mlp.add(Dense(128, activation='relu'))
overfitted_mlp.add(Dense(10, activation='softmax'))

overfitted_mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
overfit_history = overfitted_mlp.fit(x_train, y_train, batch_size=512, epochs=50, validation_data=(x_val, y_val), callbacks=[checkpoint])
In [0]:
overfitted_mlp = load_model('overfitted_mlp.h5')
results = overfitted_mlp.evaluate(x_test, y_test)
print('Accuracy on test set: ', results[1])

Let's plot training and validation accuracy and loss.

In [0]:
import matplotlib.pyplot as plt

def visualize_loss_and_acc(history):
  history_dict = history.history
  loss_values = history_dict['loss']
  val_loss_values = history_dict['val_loss']
  acc = history_dict['acc']

  epochs = range(1, len(acc) + 1)

  f = plt.figure(figsize=(10,3))

  plt.subplot(1,2,1)
  plt.plot(epochs, loss_values, 'bo', label='Training loss')
  plt.plot(epochs, val_loss_values, 'b', label='Validation loss')
  plt.title('Training and validation loss')
  plt.xlabel('Epochs')
  plt.ylabel('Loss')
  plt.legend()


  acc_values = history_dict['acc']
  val_acc = history_dict['val_acc']

  plt.subplot(1,2,2)
  plt.plot(epochs, acc, 'bo', label='Training acc')
  plt.plot(epochs, val_acc, 'b', label='Validation acc')
  plt.title('Training and validation accuracy')
  plt.xlabel('Epochs')
  plt.ylabel('Accuracy')
  plt.legend()

  plt.show()
In [0]:
visualize_loss_and_acc(overfit_history)

It's clear that this model has overfitted the training set. We'll use three regularization methods in this part to mitigate this problem.

2.1 Dropout

Dropout is a very popular and computationally inexpensive regularization method. It basically creates random masks over layers: it randomly selects some neurons in a layer and turns all their connections off, which is said to help prevent co-adaptation of neurons.
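
In Keras, dropout is just another layer inserted between existing layers. A minimal sketch (the name example_drop and the rate of 0.5 are illustrative; choose your own rate below):

from keras import Sequential
from keras.layers import Dense, Dropout, Flatten

example_drop = Sequential()
example_drop.add(Flatten(input_shape=(28, 28)))
example_drop.add(Dense(256, activation='relu'))
example_drop.add(Dropout(0.5))   # randomly zeroes 50% of the previous layer's outputs at each training step
example_drop.add(Dense(10, activation='softmax'))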

Now use dropout to improve the generalization of the last model.

In [0]:
from keras.layers import Dropout
In [0]:
checkpoint_path = 'drop_mlp.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True)

########################################
#     Put your implementation here     #
########################################

drop_mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
drop_history = drop_mlp.fit(x_train, y_train, batch_size=512, epochs=50, validation_data=(x_val, y_val), callbacks=[checkpoint])
In [0]:
drop_mlp = load_model('drop_mlp.h5')
results = drop_mlp.evaluate(x_test, y_test)
print('Accuracy on test set: ', results[1])
In [0]:
visualize_loss_and_acc(drop_history)

Question: The dropout layer is only active during training and is deactivated during testing. Since some connections are off during training, the activations of layers during training are smaller than the activations at test time. How do we make up for this difference?

Write your answers here

2.2 Monte Carlo Dropout (Bonus)

The idea behind Monte Carlo dropout is to keep dropout active during evaluation as well, but instead of predicting on the test set once, we predict multiple times and average the predicted probabilities to form the final prediction.
To keep dropout active at prediction time, reimplement your model (drop_mlp from the last part) using the Functional API and apply dropout like this:

Dropout(rate)(x, training=True)

Also use the exact same hyper-parameters as the last model you created (drop_mlp) so we can copy the weights of that model into this new model.

In [0]:
inp = Input(shape=(28, 28))
########################################
#     Put your implementation here     #
########################################

MC_mlp = Model(inputs=[inp], outputs=[out])
MC_mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
In [0]:
drop_mlp = load_model('drop_mlp.h5')
MC_mlp.set_weights(drop_mlp.get_weights())

Question: Evaluate the test set multiple times (run the cell below multiple times). Why is the accuracy different each time?

Write your answers here

In [0]:
### run multiple times
results = MC_mlp.evaluate(x_test, y_test)
print('Accuracy on test set: ', results[1])
In [0]:
from sklearn.metrics import accuracy_score
result = drop_mlp.evaluate(x_test, y_test)
print('Accuracy on test set using regular dropout: ', result[1])
y_preds = []
for i in range(1000):
  y_pred = MC_mlp.predict(x_test)
  y_preds.append(y_pred)
y_pred = np.mean(np.array(y_preds), axis=0)
print('Accuracy on test set using monte carlo dropout: ', accuracy_score(y_test, np.argmax(y_pred, axis=1)))

Question: Did Monte Carlo dropout perform better than regular dropout? Explain why.

Write your answers here

2.3 Noise Robustness

Another regularization method is adding noise to the inputs, the hidden layers, the weights, or in some cases even the outputs of a network.

Question: Explain how adding noise to different parts of a neural network can act as a regularizer.

Write your answers here

In this part, we want to add Gaussian noise to the outputs of the hidden layers of the network. You should implement a custom layer named NoisyDense. This layer is basically a regular Keras Dense layer, but it adds Gaussian noise with $\mu=0$ and a specified $\sigma$ to the output of the Dense layer.
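
As a hint on generating such noise inside call (assuming you use the Keras backend, imported as K in the next cell), K.random_normal samples a tensor of Gaussian noise with a given mean and standard deviation. The tensor x and the standard deviation 0.1 below are illustrative only:

import keras.backend as K
import numpy as np

x = K.constant(np.zeros((4, 3)))                                 # stands in for a layer's output tensor
noise = K.random_normal(shape=K.shape(x), mean=0.0, stddev=0.1)  # zero-mean Gaussian noise, sigma = 0.1
noisy_x = x + noise                                              # add the noise to the activations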

In [0]:
import keras.backend as K
In [0]:
class NoisyDense(keras.layers.Layer):

    def __init__(self, layer_size, activation, std, **kwargs):

      super().__init__(**kwargs)
      self.layer_size = layer_size
      self.std = std
      self.activation = activation


    def build(self, input_shape):

      ########################################
      #     Put your implementation here     #
      ########################################


      super(NoisyDense, self).build(input_shape)

    def call(self, X):
      
      ########################################
      #     Put your implementation here     #
      ########################################

    def compute_output_shape(self, input_shape):

      ########################################
      #     Put your implementation here     #
      ########################################

    def get_config(self):
   
        config = {**super().get_config(),
                  'layer_size': self.layer_size,
                  'activation': self.activation,
                  'std': self.std}
        return config

Try to find a good value for $\sigma$ that prevents overfitting.

In [0]:
### you should change this value 
sigma = 0.00001
In [0]:
checkpoint_path = 'noisy_mlp.h5' 

checkpoint = keras.callbacks.ModelCheckpoint(filepath=checkpoint_path, mode='max', monitor='val_acc', verbose=0, save_best_only=True)

noisy_mlp = Sequential()
noisy_mlp.add(Flatten(input_shape=(28, 28)))
noisy_mlp.add(NoisyDense(256, activation='relu', std=sigma))
noisy_mlp.add(NoisyDense(128, activation='relu', std=sigma))
noisy_mlp.add(Dense(10, activation='softmax'))

noisy_mlp.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['acc'])
noisy_history = noisy_mlp.fit(x_train, y_train, batch_size=512, epochs=50, validation_data=(x_val, y_val), callbacks=[checkpoint])
In [0]:
noisy_mlp = load_model('noisy_mlp.h5', custom_objects={'NoisyDense': NoisyDense})
results = noisy_mlp.evaluate(x_test, y_test)
print('Accuracy on test set: ', results[1])
In [0]:
visualize_loss_and_acc(noisy_history)

Submission

Congratulations! You finished the assignment & you're ready to submit your work. Please follow the instructions:

  1. Check and review your answers. Make sure all of the cell outputs are what you want.
  2. Select File > Save.
  3. Run the Make Submission cell. It may take several minutes and may ask you for your credentials.
  4. Run the Download Submission cell to obtain your submission as a zip file.
  5. Grab the downloaded file (dl_asg02__xx__xx.zip) and upload it via https://forms.gle/mTdqSUVx1emSsmo98

Make Submission (Run the cell)

In [0]:
#@title
! pip install -U --quiet PyDrive > /dev/null
# ! wget -q https://github.com/github/hub/releases/download/v2.10.0/hub-linux-amd64-2.10.0.tgz 
  
import os
import time
import yaml
import json

from google.colab import files
from IPython.display import Javascript
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

asg_name = 'assignment_2'
script_save = '''
require(["base/js/namespace"],function(Jupyter) {
    Jupyter.notebook.save_checkpoint();
});
'''
# repo_name = 'iust-deep-learning-assignments'
submission_file_name = 'dl_asg02__%s__%s.zip'%(student_id, student_name.lower().replace(' ',  '_'))

sub_info = {
    'student_id': student_id,
    'student_name': student_name, 
    'dateime': str(time.time()),
    'asg_name': asg_name
}
json.dump(sub_info, open('info.json', 'w'))

Javascript(script_save)

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = drive.ListFile({'q':"title='%s.ipynb'"%asg_name}).GetList()[0]['id']
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('%s.ipynb'%asg_name) 

! jupyter nbconvert --to script "$asg_name".ipynb > /dev/null
! jupyter nbconvert --to html "$asg_name".ipynb > /dev/null
! zip "$submission_file_name" "$asg_name".ipynb "$asg_name".html "$asg_name".txt info.json > /dev/null

print("##########################################")
print("Done! Submisson created, Please download using the bellow cell!")

Download Submission (Run the cell)

In [0]:
files.download(submission_file_name)
In [0]: