Assignment #1 - Multilayer Perceptron

Deep Learning / Spring 1399, Iran University of Science and Technology


Please pay attention to these notes:

  • Assignment Due: 1398/12/19 23:59:00
  • If you need any additional information, please review the assignment page on the course website.
  • The items you need to answer are highlighted in red and the coding parts you need to implement are denoted by:
    ########################################
    #     Put your implementation here     #
    ########################################
  • We always recommend cooperation and discussion in groups for assignments. However, each student has to finish all the questions by themselves. If our matching system identifies any sort of copying, you'll be responsible for the consequences. So, please mention your team-mate's name if you have one.
  • Students who audit this course should submit their assignments like other students to remain eligible to attend the rest of the sessions.
  • Any sort of copying will result in a zero for that assignment and will also count as two negative assignments toward your final score.
  • When you are ready to submit, please follow the instructions at the end of this notebook.
  • If you have any questions about this assignment, feel free to drop us a line. You may also post your questions on the course Forum page.
  • You must run this notebook on the Google Colab platform; it relies on the Google Colab VM for some of its dependencies.
  • Before starting to work on the assignment, please fill in your name in the next section and remember to RUN the cell.


Assignment Page: https://iust-deep-learning.github.io/982/assignments/01_Multilayer_Perceptron

Course Forum: https://groups.google.com/forum/#!forum/dl982/


Fill your information here & run the cell

In [0]:
#@title Enter your information & "RUN the cell!!" { run: "auto" }
student_id =   0#@param {type:"integer"}
student_name = "" #@param {type:"string"}
Your_Github_account_Email = "" #@param {type:"string"}

print("your student id:", student_id)
print("your name:", student_name)


from pathlib import Path

ASSIGNMENT_PATH = Path('asg01')
ASSIGNMENT_PATH.mkdir(parents=True, exist_ok=True)

1. MLP from Scratch

In this assignment, you will explore and implement the properties of a fundamental deep learning model called the multilayer perceptron (MLP). Basically, the goal of an MLP is to learn a non-linear mapping from inputs to outputs. We can write this mapping as $y = f(x; \theta)$, where $x$ is the input and $\theta$ is the vector of all the parameters in the network, which we're trying to learn.

As you can see in the figure, every MLP consists of an input layer, an output layer, and one or more hidden layers in between. Each layer consists of one or more cells called neurons. Every neuron computes the dot product of its inputs with a weight vector (plus a bias term); the result then goes through a non-linear function (an activation function, e.g. $tanh$ or $sigmoid$) and gives us the neuron's output.


Throughout this assignment, inputs will be matrices of shape $b \times M$, where $b$ is the batch size and $M$ is the number of input features.
As for the equations, let's compute the output of the $i$th layer: $$A^i = f(A^{i-1}w^i + b^i)$$

Imagine that the $(i-1)$th and $i$th layers have sizes $n$ and $p$, respectively. The dimensions of the weight and bias will then be as follows:

$$w^{n\times p} , b^{1\times p}$$
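As a quick sanity check of these shapes, here is a minimal, self-contained NumPy sketch of a single layer's computation (the sizes and the $tanh$ activation are arbitrary choices for illustration):

In [0]:
import numpy as np

b, n, p = 4, 3, 2                       # batch size, previous layer size, current layer size (arbitrary)
A_prev = np.random.normal(size=(b, n))  # output of the previous layer, shape (b, n)
w = np.random.normal(size=(n, p))       # weight matrix, shape (n, p)
bias = np.zeros(p)                      # bias vector, shape (p,), broadcast over the batch
A = np.tanh(A_prev @ w + bias)          # A^i = f(A^{i-1} w^i + b^i), shape (b, p)
print(A.shape)                          # (4, 2)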

Numpy is the only package you're allowed to use for implementing your MLP in this assignment, so let's import it in the cell below!

In [0]:
import numpy as np

1.1 Activation Functions

Now let's implement some activation functions! Linear, ReLU, and sigmoid are the ones we'll need in this assignment. Note that you should also implement their derivatives, since you'll need them later for back-propagation.

In [0]:
## We've implemented the Linear activation function for you

def linear(x, deriv=False):

  return x if not deriv else np.ones_like(x)

def relu(x, deriv=False):
  """
  Args:
    x: A numpy array of any shape 
    deriv: True or False. determines if we want the derivative of the function or not.
    
  Returns:
    relu_out: A numpy array of the same shape as x. 
      Basically relu function or its derivative applied to every element of x
               
  """

  ########################################
  #     Put your implementation here     #
  ######################################## 
  
  
  
  return relu_out
  
def sigmoid(x, deriv=False):
  """
  Args:
    x: A numpy array of any shape 
    deriv: True or False. determines if we want the derivative of the function or not.
    
  Returns:
    sig_out: A numpy array of the same shape as x. 
      Basically sigmoid function or its derivative applied to every element of x
               
  """

  ########################################
  #     Put your implementation here     #
  ########################################
  
  
  
  
  
  return sig_out
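For reference, here is how an activation that is not part of this assignment (tanh) could be written against the same interface; this is just an illustrative sketch of how the deriv flag is typically handled:

In [0]:
def tanh(x, deriv=False):
  """tanh activation: returns tanh(x), or its derivative 1 - tanh(x)^2 when deriv=True."""
  t = np.tanh(x)
  return t if not deriv else 1 - t**2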
In [0]:
# Test your implementation
!wget -q https://github.com/iust-deep-learning/982/raw/master/static_files/assignments/asg01_assets/act_test.npy

x_act, relu_out, sig_out = np.load('act_test.npy', allow_pickle=True)
assert np.allclose( relu_out[0], relu(x_act, deriv=True), atol=1e-6, rtol=1e-5) and np.allclose(relu_out[1], relu(x_act, deriv=False), atol=1e-6, rtol=1e-5)
assert np.allclose(sig_out[0], sigmoid(x_act, deriv=True), atol=1e-6, rtol=1e-5) and np.allclose(sig_out[1], sigmoid(x_act, deriv=False), atol=1e-6, rtol=1e-5)

Question: Why do activation functions have to be non-linear? Could any non-linear function be used as an activation function?

Write your answers here

1.2 Forward Propagation

Now let's implement our MLP class. This class handles adding layers and doing the forward propagation. Here are the attributes of this class:
- parameters: A list of dictionaries in the form of {'w': weight, 'b': bias}, where weight and bias are the weight matrix and bias vector of a layer.
- act_funcs: A list of the activation functions used in the corresponding layers.
- activations: A list of matrices, each corresponding to the output of a layer.
- weighted_ins: A list of matrices, each corresponding to the weighted input of a layer. The weighted input, as the name suggests, is the layer's input multiplied by the layer's weights plus the layer's bias; it then goes into the layer's activation function to compute the layer's activations (outputs).
Note that we store weighted inputs and outputs of the layers because we'll need them later for implementing the back-propagation algorithm.

You only need to complete the feed_forward function in the MLP class. This function performs forward propagation on the input.
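If it helps, here is a rough standalone sketch of a forward pass over a list of {'w': ..., 'b': ...} dictionaries (illustrative only; it is not the class method, and it omits the bookkeeping of weighted_ins and activations that the assignment asks for):

In [0]:
def forward_sketch(x, parameters, act_funcs):
  """Illustrative forward pass over a list of {'w': ..., 'b': ...} dicts (not the MLP method)."""
  out = x
  for layer, act in zip(parameters, act_funcs):
    z = out @ layer['w'] + layer['b']  # weighted input of this layer, shape (b, layer_size)
    out = act(z)                       # layer output (activation)
  return out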

In [0]:
class MLP:

  def __init__(self, input_dim):
    """
  Args:
    input_dim: An integer determining the input dimension of the MLP
               
  """

    self.input_dim = input_dim
    self.parameters = []
    self.act_funcs = []
    self.activations = []
    self.weighted_ins = []

  def add_layer(self, layer_size, act_func=linear):
    """
    Add layers to the MLP using this function
  Args:
    layer_size: An integer determining the number of neurons in the layer
    act_func: A function applied to the units in the layer 
    
               
  """
    ### Size of the previous layer of mlp
    prev_size = self.input_dim if not self.parameters else self.parameters[-1]['w'].shape[-1]

    ### Weight scale used in He initialization
    weight_scale = np.sqrt(2/prev_size)
    ### initializing the weights and bias of the layer
    weight = np.random.normal(size=(prev_size, layer_size))*weight_scale
    bias = np.ones(layer_size) *0.1
    ### Add weights and bias of the layer to the parameters of the MLP
    self.parameters.append({'w': weight, 'b': bias})
    ### Add the layer's activation function 
    self.act_funcs.append(act_func)



  def feed_forward(self, X):
    """
    Propagate the inputs forward using this function
  Args:
    X: A numpy array of shape (b, input_dim) where b is the batch size and input_dim is the dimension of the input
    
  Returns:
    mlp_out: A numpy array of shape (b, out_dim) where b is the batch size and out_dim is the dimension of the output

    Hint: Don't forget to store weighted inputs and outputs of each layer in self.weighted_ins and self.activations respectively
               
  """
    self.activations = []
    self.weighted_ins = []
    mlp_out = X
    ########################################
    #     Put your implementation here     #
    ######################################## 
    
    

    return mlp_out
In [0]:
# Test your implementation
import pickle
!wget -q https://github.com/iust-deep-learning/982/raw/master/static_files/assignments/asg01_assets/mlptest.pkl

x = np.random.normal(size=(512, 100))
mlp = MLP(100)
mlp.add_layer(64, relu)
mlp.add_layer(32, relu)
out = mlp.feed_forward(x)
assert len(mlp.parameters) == 2
assert mlp.activations[0].shape == tuple([512, 64]) and mlp.weighted_ins[0].shape == tuple([512, 64])
assert mlp.activations[1].shape == tuple([512, 32]) and mlp.weighted_ins[1].shape == tuple([512, 32])
assert out.shape == tuple([512, 32])
assert np.array_equal(mlp.activations[-1], out)

x, out, parameters = pickle.load(open('mlptest.pkl', 'rb'))
mlp.parameters = parameters
assert np.allclose( out, mlp.feed_forward(x), atol=1e-6, rtol=1e-5)

Question: In the add_layer function of the MLP class, we used a method called He initialization to initialize the weights. Explain how this method can help with the training of an MLP.

Write your answers here

1.3 Loss Function

In the previous sections, we implemented an MLP that accepts an input $x$, propagates it forward, and produces an output $\hat{y}$. The next step in implementing our MLP is to see how good our network's output $\hat{y}$ is compared to the target output $y$. This is where the loss function comes in. This function takes $y$ and $\hat{y}$ as its inputs and returns a scalar as its output. This scalar indicates how good the current parameters of the network are.
The choice of this function depends on the task, e.g. regression or binary classification. Since you'll be doing multiclass classification later in this assignment, let's implement the cross-entropy function. Cross-entropy is the function most commonly used for classification tasks, but to use it in a multiclass setting, the network's outputs must be passed through a softmax activation function and the target outputs must be in one-hot encoded format.


$$Softmax(\hat{y})_i = \frac{e^{\hat{y}_i}}{\sum^{C}_j e^{\hat{y}_j}} $$
$$ Cross Entropy(y, \hat{y}) = -\sum_i^C {y_i \log(Softmax(\hat{y})_i)}$$ where $C$ is the number of classes, $y$ is a single one-hot encoded target label, and $\hat{y}$ is a single network output.
Now let's first implement the softmax activation function! Note that the above formulas are for a single sample; however, you should implement the batch version!
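Because the exponentials can overflow for large inputs, a standard trick (which doesn't change the result, since softmax is invariant to shifting all logits by a constant) is to subtract the per-sample maximum before exponentiating:

$$Softmax(\hat{y})_i = \frac{e^{\hat{y}_i - m}}{\sum^{C}_j e^{\hat{y}_j - m}}, \quad m = \max_k \hat{y}_k$$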

In [0]:
def softmax(y_hat):
  """
    Apply softmax to the inputs
  Args:
    y_hat: A numpy array of shape (b, out_dim) where b is the batch size and out_dim is the output dimension of the network(number of classes) 
    
  Returns:
    soft_out: A numpy array of shape (b, out_dim)
               
  """
  
  ########################################
  #     Put your implementation here     #
  ########################################

  return soft_out
In [0]:
# Test your implementation

y_hat = np.random.normal(size=(100, 5))
y_soft = softmax(y_hat)
assert y_hat.shape == y_soft.shape
assert all([(y - 1.)<1e-5 for y in np.sum(y_soft, axis=1)])
y_hat = np.array([[10,10,10,10], [0,0,0,0]])
assert np.allclose( softmax(y_hat), np.array([[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]), atol=1e-6, rtol=1e-5)

Now implement the categorical cross-entropy function ("categorical" refers to multiclass classification). Note that the inputs are in batches, so the loss of a batch of samples will be the average of losses of samples in the batch.
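For instance, for a single sample with $y = (0, 1, 0)$ and $Softmax(\hat{y}) = (0.2, 0.7, 0.1)$, only the true-class term is non-zero, so the cross-entropy is $-\log(0.7) \approx 0.357$. For a batch, you simply average such per-sample values.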

In [0]:
def categorical_cross_entropy(y, y_soft):
  """
    Compute the categorical cross entropy loss
  Args:
    y: A numpy array of shape (b, out_dim). Target labels of network.
    y_soft: A numpy array of shape (b, out_dim). Output of the softmax activation function
    
  Returns:
    loss: A scalar of type float. Average loss over a batch.

  Hint: Use np.mean to compute average loss of a batch
               
  """

  ########################################
  #     Put your implementation here     #
  ########################################

  return loss
In [0]:
# Test your implementation

y = np.array([[1,0,0], [0,0,1], [1,0,0], [0,1,0]])
y_hat = np.array([[10,1,1], [0,-1,9], [100,-9,9], [0.1,12,10]])
y_soft = softmax(y_hat)
assert round(categorical_cross_entropy(y, y_soft), 3) == 0.032

Great! You have implemented both the softmax and categorical cross-entropy functions. Now, instead of applying a softmax activation to the output layer of the MLP and then using categorical cross-entropy as the loss function, we can merge these two steps into a single softmax categorical cross-entropy loss and use a linear activation in the output layer. The reason is that the gradient of the softmax categorical cross-entropy loss with respect to the MLP's output can be computed efficiently as:

$$ Softmax(\hat{y}) - y$$

for a single sample. Here $\hat{y}$ is the MLP's output and $y$ is the target output (labels).
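To see where this comes from (a quick sketch): writing $s = Softmax(\hat{y})$, the per-sample loss is $-\sum_i y_i \log(s_i)$, and using $\frac{\partial s_i}{\partial \hat{y}_j} = s_i(\delta_{ij} - s_j)$ together with $\sum_i y_i = 1$ gives $\frac{\partial Loss}{\partial \hat{y}_j} = s_j - y_j$, i.e. $Softmax(\hat{y}) - y$. The implementation below additionally divides by the batch size, because the batch loss is an average over samples.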

Now let's implement the softmax categorical cross-entropy function!

In [0]:
def softmax_categorical_cross_entropy(y, y_hat, return_grad=False):
  """
    Compute the softmax categorical cross entropy loss
  Args:
    y: A numpy array of shape (b, out_dim). Target labels of network.
    y_hat: A numpy array of shape (b, out_dim). Output of the output layer of the network
    return_grad: If True return gradient of the loss with respect to y_hat. If False just return the loss
    
  Returns:
    loss: A scalar of type float. Average loss over a batch.
               
  """
  
  y_soft = softmax(y_hat)
  
  if not return_grad:
    loss = categorical_cross_entropy(y, y_soft)
    return loss
  else:
    loss_grad = (y_soft - y)/y.shape[0]
    return loss_grad

1.4 Back-Propagation

After calculating the loss of the MLP, we need to propagate this loss back to the hidden layers in order to compute the gradient of the loss with respect to the weights and biases of the network. The algorithm used to calculate these gradients is called back-propagation, or simply backprop. Backprop uses the chain rule to compute the gradients of the network's parameters. Now let's go over the steps of this algorithm (this is the fully matrix-based version):

  • calculate gradient of the loss with respect to $\hat{y}$
    $g \longleftarrow \nabla_\hat{y} Loss$
  • for each layer $L$ starting from the output layer:
       $g \longleftarrow g \odot f^\prime(weightedInput^{(L)})$   ($weightedInput^{(L)}$ is the weighted input of $L$th layer and $f$ is the activation function)
       $\nabla_{b^{(L)}}Loss \longleftarrow \sum_i^{batch} {g_i}$
       $\nabla_{w^{(L)}}Loss \longleftarrow output^{(L-1)T}g$   ($output^{(L-1)}$ is the output of $(L-1)$th layer )
       $g \longleftarrow gw^{(L)T}$

Check this for a detailed explanation of the back-propagation algorithm.

Now implement the back-propagation algorithm!

In [0]:
def mlp_gradients(mlp, loss_function, x, y):
  """
    Compute the gradient of loss with respect to mlp's weights and biases
  Args:
    mlp: An object of MLP class
    loss_function: A function used as loss function of the MLP
    x: A numpy array of shape (batch_size, input_dim). The MLP's input
    y: A numpy array of shape (batch_size, num_classes). Target labels
    
  Returns:
    gradients: A list of dictionaries {'w': dw, 'b': db} corresponding to the dictionaries in mlp.parameters
        dw is the gradient of loss with respect to the weights of the layer 
        db is the gradient of loss with respect to the bias of the layer 
               
  """  

  gradients = []

  ### get the output of the network
  y_hat = mlp.activations[-1]
  num_layers = len(mlp.parameters)

  ### compute gradient of the loss with respect to network output
  g = loss_function(y, y_hat, return_grad=True)

  ### You'll need the input in the last step of backprop so let's make a new list with x in the beginning
  activations = [x] + mlp.activations 
  
  for i in reversed(range(num_layers)):
    ########################################
    #     Put your implementation here     #
    ########################################
    
    
  return gradients
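Besides the test below, a finite-difference gradient check is a handy way to debug back-propagation. The following is a minimal sketch (it assumes your feed_forward and mlp_gradients are implemented, and it only checks the first layer's weights; the eps value is an arbitrary choice):

In [0]:
def numerical_grad_check(mlp, x, y, eps=1e-5):
  """Compare analytic gradients to central finite differences for the first layer's weights."""
  mlp.feed_forward(x)
  analytic = mlp_gradients(mlp, softmax_categorical_cross_entropy, x, y)[0]['w']
  w = mlp.parameters[0]['w']
  numeric = np.zeros_like(w)
  for idx in np.ndindex(*w.shape):
    old = w[idx]
    w[idx] = old + eps
    loss_plus = softmax_categorical_cross_entropy(y, mlp.feed_forward(x))
    w[idx] = old - eps
    loss_minus = softmax_categorical_cross_entropy(y, mlp.feed_forward(x))
    w[idx] = old
    numeric[idx] = (loss_plus - loss_minus) / (2 * eps)
  return np.max(np.abs(analytic - numeric))

On a small random MLP and batch, the returned maximum difference should be very small (typically well below 1e-6); a large value usually points to a bug in the forward or backward pass. (With ReLU, occasional tiny mismatches at pre-activations sitting exactly at the kink are possible but rare.)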
In [0]:
# Test your implementation

import pickle
!wget -q https://github.com/iust-deep-learning/982/raw/master/static_files/assignments/asg01_assets/grad_test.zip
!unzip grad_test.zip
x = np.load('grad_x.npy')
y = np.load('grad_y.npy')
mlp = pickle.load(open('grad_mlp_test.pkl', 'rb'))
expected_grads = pickle.load(open('grads', 'rb'))
mlp.feed_forward(x)
grads = mlp_gradients(mlp, softmax_categorical_cross_entropy, x, y)
assert all([np.allclose(eg['w'], g['w'], atol=1e-6, rtol=1e-5) and 
            np.allclose(eg['b'], g['b'], atol=1e-6, rtol=1e-5) 
            for eg, g in zip(expected_grads, grads)])

1.5 Optimization

Now that we've computed the gradients of our MLP's parameters, we should use these gradients to optimize the parameters so that the network produces better outputs.
Gradient descent is an optimization method that iteratively moves the parameters in the opposite direction of their gradients. Below is the update rule for gradient descent:

$$ w \leftarrow w - \alpha \nabla_wLoss$$
Where $\alpha$ is the learning rate hyperparameter.
There are three main variants of gradient descent: stochastic gradient descent, mini-batch gradient descent, and batch gradient descent.
Mini-batch gradient descent is the most commonly used variant in practice, and that's what we'll use in this assignment.

Let's perform a step of gradient descent on a simple MLP!

In [0]:
x = np.random.normal(size=(16, 10))
y = np.eye(16)
lr = 0.1
### Define the mlp 
mlp = MLP(x.shape[-1])
mlp.add_layer(16)
mlp.add_layer(8)
mlp.add_layer(y.shape[-1])
### compute mlp's output
y_hat = mlp.feed_forward(x)
### print current loss
print("loss before gradient descent: ", softmax_categorical_cross_entropy(y, y_hat))
### Compute gradients of the mlp's parameters 
grads = mlp_gradients(mlp, softmax_categorical_cross_entropy, x, y)
### perform gradient descent
mlp.parameters = [{'w':p['w']-lr*g['w'], 'b':p['b']-lr*g['b']} for g, p in zip(grads, mlp.parameters)]
### compute mlp's output again after gradeint descent
y_hat = mlp.feed_forward(x)
### print loss after gradient descent
print("loss after gradient descent: ", softmax_categorical_cross_entropy(y, y_hat))

Question: Do gradient descent steps always decrease the loss? Why? (Hint: toy with the learning rate in the example above!)

Write your answers here

Instead of plain gradient descent, we'll be using an extension of it called gradient descent with momentum. Instead of updating the parameters based only on the current gradients, we also take into account the gradients from previous steps. This way, parameter updates have lower variance, and convergence is faster and smoother. $$ v \leftarrow \gamma v - \alpha \nabla_wLoss$$ $$ w \leftarrow w + v$$ Where $w$ denotes the MLP's weights and $v$ is called the velocity, which is essentially an exponentially weighted combination of all previous gradients.
Here $\gamma$ determines how fast the effects of previous gradients fade and $\alpha$ is the learning rate.
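For a single weight array, the two update lines translate directly into NumPy. Below is a toy sketch with made-up numbers, just to show the order of operations (it is not the SGD class you need to write):

In [0]:
gamma, alpha = 0.9, 0.1
w = np.array([1.0, -2.0])
v = np.zeros_like(w)            # velocity starts at zero
grad_w = np.array([0.5, 0.5])   # a pretend gradient

v = gamma * v - alpha * grad_w  # v <- gamma * v - alpha * grad
w = w + v                       # w <- w + v
print(w, v)                     # [ 0.95 -2.05] [-0.05 -0.05]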

Now let's implement the SGD class!

In [0]:
class SGD:

  def __init__(self, lr=0.01, momentum=0.9):
    """
  Args:
    lr: learning rate of the SGD optimizer
    momentum: momentum of the SGD optimizer

    Hint: velocity should be a list of dictionaries just like mlp.parameters
               
  """ 

    self.lr = lr
    self.momentum = momentum
    ### initialize velocity
    self.velocity = []
  
  def step(self, parameters, grads):

    """
    Perform a gradient descent step
  Args:
    parameters: A list of dictionaries {'w': weights , 'b': bias}. MLP's parameters. 
    grads: A list of dictionaries {'w': dw, 'b': db}. gradient of MLP's parameters. Basically the output of "mlp_gradients" function you implemented!
    
  Returns:
    Updated_parameters: A list of dictionaries {'w': weights , 'b': bias}. mlp's parameters after performing a step of gradient descent. 
               
  """

    ########################################
    #     Put your implementation here     #
    ######################################## 

    

    return Updated_parameters
    

2. Classifying Kannada Handwritten Digits

In this part of the assignment, you'll use the MLP you implemented in the first part to classify Kannada handwritten digits!
This dataset consists of 60,000 images of handwritten digits in the Kannada script.
You can check this GitHub repository for more information about the dataset. Let's download the dataset:

In [0]:
!wget -q https://github.com/iust-deep-learning/982/raw/master/static_files/assignments/asg01_assets/kannada.zip
!unzip kannada.zip
In [0]:
import pandas as pd
import matplotlib.pyplot as plt
train = pd.read_csv('train.csv')
In [0]:
train.head()

As you can see, the first column of the dataframe is the label, and the rest of the columns are the pixels. Let's put the dataset into numpy arrays. Also, we must normalize the pixel values to the [0, 1] range to help our MLP model converge.

In [0]:
x = train.values[:, 1:]/255.
y = train.values[:, 0]
plt.imshow(x[10000].reshape(28, 28))

As we are doing a multiclass classification, the labels must be in one-hot encoded format.

In [0]:
def one_hot_encoder(y):

  y = y.reshape(-1)
  num_samples = y.shape[0]
  max_label = np.max(y)
  one_hot = np.zeros((num_samples, max_label+1))
  one_hot[np.arange(num_samples),y] = 1
  
  return one_hot
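For example, calling it on a tiny label array shows the one-hot layout (an illustrative check):

In [0]:
print(one_hot_encoder(np.array([0, 2, 1])))
# [[1. 0. 0.]
#  [0. 0. 1.]
#  [0. 1. 0.]]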

Now let's transform the labels into one-hot encoded format!

In [0]:
y = one_hot_encoder(y)

We've implemented the get_mini_batches function below. This function splits the dataset into mini-batches. We need it because we'll be doing mini-batch gradient descent.

In [0]:
import math

def get_mini_batches(x, y, batch_size, shuffle=True):

  idx = list(range(len(x)))
  if shuffle:
    np.random.shuffle(idx)
  steps = math.ceil(len(x)/batch_size)
  x, y = x[idx, :], y[idx, :]
  for i in range(steps):
    yield (x[i*batch_size: (i+1)*batch_size], y[i*batch_size: (i+1)*batch_size])
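As a quick illustration of how it is used (with made-up data), the generator yields (x_batch, y_batch) pairs, and the last batch may be smaller:

In [0]:
xs = np.random.normal(size=(100, 5))
ys = np.zeros((100, 3))
print([xb.shape[0] for xb, yb in get_mini_batches(xs, ys, 32)])  # [32, 32, 32, 4]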



    

Evaluation metrics are used to measure the performance of a model after training. The choice of metric depends on factors like the nature of the task (e.g. classification or regression) and the dataset's characteristics (e.g. class imbalance). For multiclass classification with balanced classes, accuracy is a reasonable choice.

We've implemented the accuracy function in the cell below:

In [0]:
def accuracy(y, y_hat):

  return np.mean(np.argmax(y, axis=-1)==np.argmax(y_hat, axis=-1))
  

Now let's split the dataset into train and validation sets:

In [0]:
from sklearn.model_selection import train_test_split

x_train, x_val, y_train, y_val = train_test_split(x, y, stratify=y)

Everything is now ready for training our MLP! Create your MLP model in the cell below. The choice of the number of layers, their sizes, and their activation functions is up to you.

In [0]:
mlp = MLP(x_train.shape[-1])

########################################
#     Put your implementation here     #
########################################
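If you're unsure where to start, below is one possible configuration, purely as an illustration (the depth, layer sizes, and activations here are arbitrary choices, not a prescribed answer; build your own model in the cell above):

In [0]:
example_mlp = MLP(x_train.shape[-1])
example_mlp.add_layer(256, relu)
example_mlp.add_layer(128, relu)
example_mlp.add_layer(y_train.shape[-1])  # linear output layer; softmax is folded into the loss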

Let's set some hyper-parameters. Feel free to change these hyper-parameters however you see fit!

In [0]:
epochs = 10
Batch_size = 1024
sgd_lr = 0.1
sgd_momentum = 0.9

Now let's train the network!

In [0]:
from tqdm import tqdm_notebook
### Define an optimizer
optimizer = SGD(lr=sgd_lr, momentum=sgd_momentum)

train_loss, val_loss, train_accs, val_accs = [], [], [], []

for i in range(epochs):
  mini_batches = get_mini_batches(x_train, y_train, Batch_size)
  for xx, yy in tqdm_notebook(mini_batches, desc='epoch {}'.format(i+1)):

    ### forward propagation
    mlp.feed_forward(xx)
    ### compute gradients
    grads = mlp_gradients(mlp, softmax_categorical_cross_entropy, xx, yy)
    ### optimization
    mlp.parameters = optimizer.step(mlp.parameters, grads)
    
  y_hat = mlp.feed_forward(x_train)
  y_hat_val = mlp.feed_forward(x_val)
  val_loss.append(softmax_categorical_cross_entropy(y_val, y_hat_val))
  train_loss.append(softmax_categorical_cross_entropy(y_train, y_hat))
  train_acc = accuracy(y_train, y_hat)*100
  val_acc = accuracy(y_val, y_hat_val)*100
  train_accs.append(train_acc)
  val_accs.append(val_acc)
  print("training acc: {:.2f} %".format(train_acc))
  print("test acc: {:.2f} %".format(val_acc))

Let's visualize accuracy and loss for train and validation sets during training:

In [0]:
plt.plot(list(range(len(train_loss))), train_loss, label='train')
plt.plot(list(range(len(val_loss))), val_loss, label='val')
plt.xlabel('epoch')
plt.ylabel('loss')
plt.legend()
plt.show()
In [0]:
plt.plot(list(range(len(train_accs))), train_accs, label='train')
plt.plot(list(range(len(val_accs))), val_accs, label='val')
plt.xlabel('epoch')
plt.ylabel('accuracy')
plt.legend()
plt.show()

Question: Looking at loss and accuracy plots, how would you describe your model in terms of bias and variance?

Write your answers here

In [0]:
 

Submission

Congratulations! You finished the assignment & you're ready to submit your work. Please follow the instructions:

  1. Check and review your answers. Make sure all of the cell outputs are what you want.
  2. Select File > Save.
  3. Run the Make Submission cell. It may take several minutes and may ask you for your credentials.
  4. Run the Download Submission cell to obtain your submission as a zip file.
  5. Grab the downloaded file (dl_asg01__xx__xx.zip) and upload it via https://forms.gle/2dogVcZhfBvBC1aM6

Note: We need your GitHub token to create a new repository (if one doesn't already exist) to store the learned model data. The Google Drive token enables us to download the current notebook and create the submission. If you are interested, feel free to check our code.

Make Submission (Run the cell)

In [0]:
#@title
! pip install -U --quiet PyDrive > /dev/null
! wget -q https://github.com/github/hub/releases/download/v2.10.0/hub-linux-amd64-2.10.0.tgz 
  
import os
import time
import yaml
import json

from google.colab import files
from IPython.display import Javascript
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials

asg_name = 'assignment_1'
script_save = '''
require(["base/js/namespace"],function(Jupyter) {
    Jupyter.notebook.save_checkpoint();
});
'''
repo_name = 'iust-deep-learning-assignments'
submission_file_name = 'dl_asg01__%s__%s.zip'%(student_id, student_name.lower().replace(' ',  '_'))

! tar xf hub-linux-amd64-2.10.0.tgz
! cd hub-linux-amd64-2.10.0/ && chmod a+x install && ./install
! hub config --global hub.protocol https
! hub config --global user.email "$Your_Github_account_Email"
! hub config --global user.name "$student_name"
! hub api --flat -X GET /user
! hub api -F affiliation=owner -X GET /user/repos > repos.json

repos = json.load(open('repos.json'))
repo_names = [r['name'] for r in repos]
has_repository = repo_name in repo_names
if not has_repository:
  get_ipython().system_raw('! hub api -X POST -F name=%s /user/repos > repo_info.json' % repo_name)
  repo_info = json.load(open('repo_info.json')) 
  repo_url = repo_info['clone_url']
else:
  for r in repos:
    if r['name'] == repo_name:
      repo_url = r['clone_url']
  
stream = open("/root/.config/hub", "r")
token = list(yaml.load_all(stream))[0]['github.com'][0]['oauth_token']
repo_url_with_token = 'https://'+token+"@" +repo_url.split('https://')[1]

! git clone "$repo_url_with_token"
! cp -r "$ASSIGNMENT_PATH" "$repo_name"/
! cd "$repo_name" && git add -A
! cd "$repo_name" && git commit -m "Add assignment 01 results"
! cd "$repo_name" && git push -u origin master

sub_info = {
    'student_id': student_id,
    'student_name': student_name, 
    'repo_url': repo_url,
    'asg_dir_contents': os.listdir(str(ASSIGNMENT_PATH)),
    'dateime': str(time.time()),
    'asg_name': asg_name
}
json.dump(sub_info, open('info.json', 'w'))

Javascript(script_save)

auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = drive.ListFile({'q':"title='%s.ipynb'"%asg_name}).GetList()[0]['id']
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('%s.ipynb'%asg_name) 

! jupyter nbconvert --to script "$asg_name".ipynb > /dev/null
! jupyter nbconvert --to html "$asg_name".ipynb > /dev/null
! zip "$submission_file_name" "$asg_name".ipynb "$asg_name".html "$asg_name".txt info.json > /dev/null

print("##########################################")
print("Done! Submisson created, Please download using the bellow cell!")

Download Submission (Run the cell)

In [0]:
files.download(submission_file_name)