Please pay attention to these notes:
########################################
# Put your implementation here #
########################################
Assignment Page: https://iust-deep-learning.github.io/972/assignments/02_tuning
Course Forum: https://groups.google.com/forum/#!forum/dl972/
Fill your information here & run the cell
#@title Enter your information & "RUN the cell!!" { run: "auto" }
student_id = 0 #@param {type:"integer"}
student_name = "" #@param {type:"string"}
Your_Github_account_Email = "" #@param {type:"string"}
print("your student id:", student_id)
print("your name:", student_name)
from pathlib import Path
ASSIGNMENT_PATH = Path('asg02')
ASSIGNMENT_PATH.mkdir(parents=True, exist_ok=True)
PlayerUnknown's Battlegrounds (PUBG) is a popular online multiplayer survival game. In this game, players are dropped into a wide, open area, and they must fight to the death using a variety of interesting weapons and vehicles while trying not to get killed themselves. The last player or team standing wins the round. Although it's not necessary, you can easily learn about other aspects of the game by searching the web, since the game is very popular and well known.
Kaggle is a platform that hosts machine learning competitions. In most competitions, you are given training and test datasets for a specific task and compete with others to build the best-performing model.
In this assignment, you will participate in one of these competitions, which is related to PUBG. See this link for more details.
Let's download the sampled dataset (100K)
! wget -q https://iust-deep-learning.github.io/972/static_files/assignments/asg02_assets/data.tar.gz
! tar xvfz data.tar.gz
Then, load the dataset
import pandas as pd
train = pd.read_csv('train.csv')
valid = pd.read_csv('valid.csv')
train
As you can see, the training dataset contains many different features for each instance. Choose an arbitrary number of these features (at least three) that you think are most useful for training. Explain how you found them and why you think so.
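One possible way to justify a feature choice is to rank candidate columns by how strongly they correlate with the target. The sketch below is only an illustration, not the required method: `rank_features_by_correlation` is a hypothetical helper, and the demo uses synthetic data instead of the real `train` DataFrame. Pearson correlation only captures linear relationships, so treat it as a first filter rather than a final answer.

```python
import numpy as np

def rank_features_by_correlation(X, y, names):
    """Rank the columns of X by |Pearson r| with the target y (largest first)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Center features and target so that r = sum(xc * yc) / sqrt(sum(xc^2) * sum(yc^2))
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    # Constant columns have zero variance: report r = 0 instead of dividing by zero
    r = np.divide(Xc.T @ yc, denom, out=np.zeros(X.shape[1]), where=denom > 0)
    order = np.argsort(-np.abs(r))
    return [(names[i], float(r[i])) for i in order]

# Synthetic stand-in for the PUBG data: the target depends mostly on walkDistance,
# a little on kills, and not at all on randomNoise.
rng = np.random.default_rng(0)
n = 1000
walk = rng.exponential(1000.0, n)
kills = rng.poisson(1.0, n)
noise = rng.normal(size=n)
target = np.clip(walk / 4000 + 0.05 * kills + 0.05 * rng.normal(size=n), 0.0, 1.0)

ranking = rank_features_by_correlation(
    np.column_stack([walk, kills, noise]), target, ["walkDistance", "kills", "randomNoise"])
for name, r in ranking:
    print(f"{name:>12s}  r = {r:+.3f}")
```

On the real data, a call like `rank_features_by_correlation(train[cols].values, train['winPlacePerc'].values, cols)` should work, assuming the target column is named `winPlacePerc` as in the Kaggle competition and `cols` lists only numeric columns (`matchType` is a string and would need to be encoded first).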
$\color{red}{\text{Write your answer here}}$
Build and train a simple feed-forward neural network regressor that uses your selected features to predict the target (the player's final percentile rank). The number of layers, activation functions, optimizer, input representation, and hyperparameters are completely up to you. Feel free to add any new cells, functions, and classes.
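PUBG features live on very different scales (distances can reach thousands of meters, while counts such as `kills` are small integers), and feed-forward networks usually train more easily on standardized inputs. A minimal sketch of z-score standardization, assuming you compute the statistics on the training set only and reuse exactly the same statistics on validation data and inside `predict()`; `Standardizer` is a hypothetical helper, not part of Keras:

```python
import numpy as np

class Standardizer:
    """Hypothetical helper: z-score features using statistics from the training set only."""

    def fit(self, X):
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        std = X.std(axis=0)
        # Constant columns would cause division by zero, so leave them unscaled
        self.std_ = np.where(std > 0, std, 1.0)
        return self

    def transform(self, X):
        return (np.asarray(X, dtype=float) - self.mean_) / self.std_

# Toy example: two features on very different scales
train_X = np.array([[100.0, 1.0], [300.0, 3.0], [200.0, 2.0]])
valid_X = np.array([[250.0, 0.0]])
scaler = Standardizer().fit(train_X)   # statistics come from the training data only
train_Z = scaler.transform(train_X)
valid_Z = scaler.transform(valid_X)    # validation data reuses the training statistics
print(train_Z)
print(valid_Z)
```

Because `predict()` may run without your training variables in memory, one option (our suggestion, not a course requirement) is to save `mean_` and `std_` next to the model, for example with `np.save` under `ASSIGNMENT_PATH`.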
Import the dependencies
from keras.models import Sequential, load_model
from keras.layers import Dense
Model Implementation
model = Sequential()
# Go on and use whatever MLP architecture you want
# The layers and their number are totally up to you
########################################
# Put your implementation here #
########################################
Save the model to disk
# Remember to run this cell each time you update the model,
# this is one of the deliverable items of your assignment
model.save(str(ASSIGNMENT_PATH / 'potg.h5'))
To evaluate your model, we need you to fill in the following function. Remember that all features are present in the input, so you must select your chosen features, do all the required preprocessing, feed the result to your trained model, and finally return your predictions as a list.
Note: We'll run your model on a hidden test set using this function to measure its performance.
def predict(x):
    """
    Predict the placement of a player.

    Args:
        x (list[tuple()]): A list of players. Each player is a tuple(Id, groupId, matchId,
            assists, boosts, damageDealt, DBNOs, headshotKills, heals, killPlace, killPoints,
            kills, killStreaks, longestKill, matchDuration, matchType, maxPlace, numGroups,
            rankPoints, revives, rideDistance, roadKills, swimDistance, teamKills, vehicleDestroys,
            walkDistance, weaponsAcquired, winPoints)

    Returns:
        pred (list[float]): contains the placement prediction for each element in the input list.
            Predictions are in the 0-1 range, where 1 corresponds to 1st place and 0 corresponds
            to last place in the match
    """
    m = load_model(str(ASSIGNMENT_PATH / 'potg.h5'))
    pred = []
    # Do all of the preprocessing here,
    # you can use any combination of features you want.
    ########################################
    # Put your implementation here #
    ########################################
    assert isinstance(pred, list)
    assert len(pred) == len(x)
    assert all([isinstance(p, float) for p in pred])
    return pred
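A sketch of two hypothetical helpers for filling in `predict()`: one selects fields from the player tuples using the field order documented in the docstring, and one turns the model output into the list of Python floats that the final assertions expect. Keras usually returns `float32` arrays, and `numpy.float32` values are not instances of Python `float`, so the explicit `float(...)` conversion matters. Clipping to [0, 1] is our assumption, based on the documented output range.

```python
import numpy as np

# Field order as documented in the predict() docstring
FIELDS = ["Id", "groupId", "matchId", "assists", "boosts", "damageDealt", "DBNOs",
          "headshotKills", "heals", "killPlace", "killPoints", "kills", "killStreaks",
          "longestKill", "matchDuration", "matchType", "maxPlace", "numGroups",
          "rankPoints", "revives", "rideDistance", "roadKills", "swimDistance",
          "teamKills", "vehicleDestroys", "walkDistance", "weaponsAcquired", "winPoints"]

def select_features(x, selected):
    """Build a float matrix from the player tuples, keeping only the `selected` fields."""
    idx = [FIELDS.index(name) for name in selected]
    return np.array([[float(player[i]) for i in idx] for player in x], dtype=float)

def to_float_list(raw_pred):
    """Flatten the model output, clip it to [0, 1], and convert each value to a Python float."""
    flat = np.clip(np.asarray(raw_pred, dtype=float).ravel(), 0.0, 1.0)
    return [float(p) for p in flat]

# Dummy player whose i-th field simply holds the value i
player = tuple(range(len(FIELDS)))
X = select_features([player, player], ["walkDistance", "kills", "boosts"])
preds = to_float_list(np.array([[1.2], [-0.1]], dtype=np.float32))
print(X)
print(preds)
```

Inside `predict()`, the same idea would look like `pred = to_float_list(m.predict(select_features(x, my_features)))`, with any scaling from training applied in between.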
A neural network approximates some target function. During training we want to maximize accuracy (equivalently, minimize the error rate). However, the model we train may still have problems.
One problem with deep neural networks is that they can perform poorly, and this poor performance may have different causes. As you can see in the pictures below, the function the model learns might be too simple for the task (underfitting) or too complex (overfitting).
In the pictures above, the left graphs show an underfitted function that performs poorly on the task; this function has high bias. The right graphs, on the other hand, show functions with low bias and high variance: they are overly complicated and have learned noisy details of the training set. To understand this concept better, let us explain what we mean by bias and variance.
Bias is the difference between the average prediction of our model and the correct value we are trying to predict. A model with high bias pays very little attention to the training data and oversimplifies the underlying function. This leads to high error on both training and test data.
Variance is the variability of the model's prediction for a given data point, i.e., how much that prediction changes when the model is trained on different samples of the data. A model with high variance pays too much attention to the training data and does not generalize to data it hasn't seen before. As a result, such models perform very well on training data but have high error rates on test data.
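Both definitions can be estimated numerically: refit a model on many independently drawn noisy training sets, then measure how far the average prediction is from the true function (squared bias) and how much individual predictions spread around that average (variance). A small sketch using polynomial regression instead of a neural network; the target `sin(2*pi*x)`, the noise level, and the degrees are arbitrary illustrative choices:

```python
import numpy as np
from numpy.polynomial import Polynomial

def bias_variance(degree, n_train=20, n_repeats=200, noise=0.3, seed=0):
    """Estimate squared bias and variance of a degree-`degree` polynomial fit to sin(2*pi*x)."""
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)     # fixed inputs; only the label noise changes
    x_test = np.linspace(0.05, 0.95, 50)
    f_test = np.sin(2 * np.pi * x_test)          # the "correct value" at each test point
    preds = np.empty((n_repeats, x_test.size))
    for k in range(n_repeats):
        y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, noise, n_train)
        preds[k] = Polynomial.fit(x_train, y_train, degree)(x_test)
    avg_pred = preds.mean(axis=0)                 # average prediction over training sets
    bias_sq = np.mean((avg_pred - f_test) ** 2)   # squared bias, averaged over test points
    variance = np.mean(preds.var(axis=0))         # spread of predictions around their average
    return bias_sq, variance

for degree in (1, 3, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2 = {b:.4f}, variance = {v:.4f}")
```

The straight line (degree 1) should show high bias and low variance, while the degree-9 polynomial should show the opposite, mirroring the left and right graphs described above.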
In our models, we should try to balance the tradeoff between bias and variance so that the model performs well on the test set. Overfitting is the more common problem during training, and one way to recognize it is by looking at the learning curves. In the graph below, the red curve is the validation error and the blue curve is the training error at each epoch. Overfitting on the training set begins when the training error keeps declining while the validation error starts to increase.
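A small sketch of reading the overfitting point off the learning curves: take the epoch where the (optionally smoothed) validation loss is lowest, since after it the validation loss trends upward while the training loss keeps falling. `overfitting_onset` is a hypothetical helper and the curves in the demo are synthetic; on real, noisy curves this gives only an approximate answer, which is why smoothing helps. Epochs are 0-indexed here.

```python
import numpy as np

def overfitting_onset(val_loss, window=1):
    """Return the (0-indexed) epoch where the moving average of val_loss is lowest."""
    val_loss = np.asarray(val_loss, dtype=float)
    # Average each run of `window` consecutive epochs (window=1 means no smoothing)
    smoothed = np.convolve(val_loss, np.ones(window) / window, mode='valid')
    # smoothed[k] covers epochs k .. k + window - 1; report the center of that window
    return int(np.argmin(smoothed)) + (window - 1) // 2

epochs = np.arange(100)
train_loss = np.exp(-epochs / 30)                 # keeps decreasing
val_loss = 0.5 + (epochs - 50) ** 2 / 5000        # decreases, then increases after epoch 50
onset = overfitting_onset(val_loss)
print(onset, overfitting_onset(val_loss, window=5))
print("training loss still falling after onset:", train_loss[-1] < train_loss[onset])
```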
One way to overcome overfitting is regularization. There are different regularization methods, such as L2 and L1 regularization. In regularization, we add an extra term to the loss function of the neural network. This extra term can be the (squared) L2 norm of the weight matrices or their L1 norm, so the cost function looks like the following equation:
\begin{equation*} \text{Cost} = \text{Loss} + \frac{\lambda}{2m} \sum_{i} \sum_{j} w_{i,j}^{2} \end{equation*}
Here the double sum runs over all entries of the weight matrices (i.e., the squared Frobenius norm) and $m$ is the number of training examples. Instead of the L2 norm, we can use the L1 norm or a linear combination of the two. The cost functions using the L1 norm and a linear combination of the L1 and L2 norms are:
\begin{equation*} \text{Cost} = \text{Loss} + \frac{\lambda}{2m} \sum_{i} \sum_{j} \lvert w_{i,j} \rvert \end{equation*}
\begin{equation*} \text{Cost} = \text{Loss} + \alpha \left( \frac{\lambda_{l2}}{2m} \sum_{i} \sum_{j} w_{i,j}^{2} \right) + (1 - \alpha) \left( \frac{\lambda_{l1}}{2m} \sum_{i} \sum_{j} \lvert w_{i,j} \rvert \right) \end{equation*}
where $\alpha$ can be any real number between 0 and 1.
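The three cost functions translate directly into code. A minimal NumPy sketch, assuming `loss` is the already-computed data loss, `weights` is a list of weight matrices, and `m` is the number of training examples; the function names are ours:

```python
import numpy as np

def sum_sq(weights):
    # sum_i sum_j w_ij^2 over every weight matrix
    return sum(float(np.sum(W ** 2)) for W in weights)

def sum_abs(weights):
    # sum_i sum_j |w_ij| over every weight matrix
    return sum(float(np.sum(np.abs(W))) for W in weights)

def l2_cost(loss, weights, lam, m):
    return loss + lam / (2 * m) * sum_sq(weights)

def l1_cost(loss, weights, lam, m):
    return loss + lam / (2 * m) * sum_abs(weights)

def combined_cost(loss, weights, lam_l2, lam_l1, alpha, m):
    # alpha weights the L2 term, (1 - alpha) weights the L1 term
    return (loss + alpha * (lam_l2 / (2 * m) * sum_sq(weights))
            + (1 - alpha) * (lam_l1 / (2 * m) * sum_abs(weights)))

W = [np.array([[1.0, -2.0], [0.0, 3.0]])]
print(l2_cost(1.0, W, lam=0.5, m=2))                     # 1 + 0.5/4 * 14
print(l1_cost(1.0, W, lam=0.5, m=2))                     # 1 + 0.5/4 * 6
print(combined_cost(1.0, W, 0.5, 0.5, alpha=0.3, m=2))   # 1 + 0.3*1.75 + 0.7*0.75
```

Note that Keras's built-in `regularizers.l2(l)` adds `l * sum(w**2)` and `regularizers.l1(l)` adds `l * sum(|w|)` without the 1/(2m) factor, so matching the formulas above would roughly correspond to passing `l = λ/(2m)`.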
Adding the L1 norm to the cost function pushes many weights to exactly zero, which leads to sparse weight matrices. This sparsity helps overcome overfitting because it limits the possible values of the weights and prevents the learned function from becoming overly complicated.
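The sparsity claim is easy to check on a small problem. The sketch below swaps the neural network for a linear least-squares model (an assumption made to keep it short): it minimizes the L1-penalized objective with proximal gradient descent (ISTA), whose soft-thresholding step sets small weights exactly to zero, and compares the result with the closed-form L2-penalized (ridge) solution, which shrinks weights but leaves them nonzero. Only the first two of the ten features matter in the synthetic data.

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of t * |.|: shrinks toward zero and sets small entries exactly to 0
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def lasso_ista(X, y, lam, n_iter=2000):
    """Minimize (1/(2n))||y - Xw||^2 + lam * ||w||_1 with proximal gradient descent (ISTA)."""
    n = X.shape[0]
    step = n / np.linalg.norm(X, 2) ** 2    # 1 / Lipschitz constant of the smooth part
    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (X @ w - y) / n
        w = soft_threshold(w - step * grad, step * lam)
    return w

def ridge(X, y, lam):
    """Closed-form minimizer of (1/(2n))||y - Xw||^2 + (lam/2)||w||^2."""
    n, d = X.shape
    return np.linalg.solve(X.T @ X / n + lam * np.eye(d), X.T @ y / n)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
true_w = np.zeros(10)
true_w[0], true_w[1] = 3.0, -2.0
y = X @ true_w + 0.1 * rng.normal(size=100)

w_l1 = lasso_ista(X, y, lam=0.1)
w_l2 = ridge(X, y, lam=0.1)
print("L1:", np.round(w_l1, 3), "exact zeros:", int(np.sum(w_l1 == 0.0)))
print("L2:", np.round(w_l2, 3), "exact zeros:", int(np.sum(w_l2 == 0.0)))
```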
In this assignment, we want to learn how to use regularization techniques in Keras and how they affect the weight matrices. In Keras, we can apply regularization to weight matrices, biases, and activations. To use it, you specify for each layer whether you want L1, L2, or a combination of them.
from keras import regularizers
from keras.layers import Dense
Dense(64, kernel_regularizer=regularizers.l2(0.01), bias_regularizer=regularizers.l1(0.01), activity_regularizer=regularizers.l1_l2(l1=0.01, l2=0.01))
As the code above shows, regularization is easy to add to each layer. You can set a regularizer for the weight matrix, bias vector, and activations of a layer with the `kernel_regularizer`, `bias_regularizer`, and `activity_regularizer` parameters, respectively. Here we used an L2 norm with $\lambda = 0.01$ to penalize the weight matrix, an L1 norm (also with $\lambda = 0.01$) to penalize the bias vector, and a combination of L1 and L2 norms to penalize the activations.
Keras also lets us use custom regularization approaches (which may not be implemented in the framework). To implement and use a new regularization method, we define a function like the one below and pass it to the layer. The following code implements the L1 norm.
from keras import backend as K
from keras.layers import Dense

def l1_reg(weight_matrix):
    # L1 penalty with lambda = 0.01: 0.01 * sum(|w|)
    return 0.01 * K.sum(K.abs(weight_matrix))

model.add(Dense(64, input_dim=64, kernel_regularizer=l1_reg))
Questions
We would like to train a neural network that learns to classify the data shown in the following graph.
import numpy as np
import matplotlib.pyplot as plt

def get_data(nb_samples_per_class):
    # Class 1: samples drawn around (0, 0)
    mean = [0, 0]
    cov = [[.01, 0], [.014, 0.05]]
    x, y = np.random.multivariate_normal(mean, cov, nb_samples_per_class).T
    # Class 0: samples drawn around (0.4, 0.1)
    mean = [.4, .1]
    cov = [[0.01, .01], [.04, .01]]
    x1, y1 = np.random.multivariate_normal(mean, cov, nb_samples_per_class).T
    # Each row is [x1, x2, label]
    d1 = [[i, j, 1] for i, j in zip(x, y)]
    d2 = [[i, j, 0] for i, j in zip(x1, y1)]
    data = np.array(d1 + d2)
    np.random.shuffle(data)
    return data
data = get_data(100)
plt.scatter(data[:, 0], data[:, 1], c=data[:, 2].ravel(), cmap=plt.cm.Spectral)
plt.show()
Training
We trained the following model and then plotted its decision boundaries, which are shown in the following graph.
As you can see, the approximated function is very complex: it fits the training set too closely and cannot generalize well to the test set.
# For a single-input model with 2 classes (binary classification):
model = Sequential()
model.add(Dense(32, activation='relu', input_dim=2))
model.add(Dense(64, activation='relu'))
model.add(Dense(128, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='rmsprop', loss='binary_crossentropy', metrics=['accuracy'])
# Train the model, iterating on the data in batches of 32 samples
val_data = get_data(30)
callback = model.fit(data[:, :2], data[:, 2], validation_data=(val_data[:, :2], val_data[:, 2]), epochs=2000, batch_size=32, verbose=0)
def plot_decision_boundary(model, X, y):
    y = np.around(y)
    # Set min and max values and give it some padding
    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    h = .01
    # Generate a grid of points with distance h between them
    xx, yy = np.meshgrid(np.arange(x_min, x_max, h), np.arange(y_min, y_max, h))
    # Predict the function value for the whole grid
    Z = np.around(model(np.c_[xx.ravel(), yy.ravel()]))
    Z = Z.reshape(xx.shape)
    # Plot the contour and training examples
    plt.contourf(xx, yy, Z, cmap=plt.cm.Spectral)
    plt.ylabel('x2')
    plt.xlabel('x1')
    plt.scatter(X[:, 0], X[:, 1], c=y.ravel(), cmap=plt.cm.Spectral)
    plt.show()
label = model.predict(data[:, :2])
plot_decision_boundary(lambda x : model.predict(x), data[:, :2], label)
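The `model.fit()` method returns a `History` object (a Keras callback) that contains the history of the learning process. You can access the loss and accuracy of the model on the training and validation sets at each epoch of learning (depending on the Keras version, the accuracy keys may be `'accuracy'` and `'val_accuracy'` instead of `'acc'` and `'val_acc'`):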
callback.history['acc']
callback.history['loss']
callback.history['val_acc']
callback.history['val_loss']
Questions:
Note: In the following questions, whenever we mention learning curves, we mean three graphs: the accuracy of the model per epoch, the error rate of the model per epoch, and the value of the network loss per epoch, each for the training, validation, and test sets.
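A hypothetical helper for assembling these curves from the `callback.history` dictionary. It assumes that error rate means 1 - accuracy, and it looks up either the `'acc'` or the `'accuracy'` key, since the name depends on the Keras version. The `History` object only records training and validation metrics, so test-set curves have to be collected separately (for example, by evaluating the model on the test set after each epoch).

```python
import numpy as np

def learning_curves(history):
    """Return {'train': {...}, 'val': {...}} with accuracy, error_rate and loss arrays per epoch."""
    acc_key = 'acc' if 'acc' in history else 'accuracy'
    curves = {}
    for split, prefix in (('train', ''), ('val', 'val_')):
        acc = np.asarray(history[prefix + acc_key], dtype=float)
        curves[split] = {
            'accuracy': acc,
            'error_rate': 1.0 - acc,   # assumption: error rate = 1 - accuracy
            'loss': np.asarray(history[prefix + 'loss'], dtype=float),
        }
    return curves

# Example with a made-up two-epoch history; with a real run, pass callback.history
fake_history = {'acc': [0.5, 0.75], 'loss': [0.7, 0.5],
                'val_acc': [0.5, 0.6], 'val_loss': [0.69, 0.65]}
curves = learning_curves(fake_history)
print(curves['train']['error_rate'], curves['val']['error_rate'])
```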
Note: You can use the `plot_decision_boundary` function to plot the decision boundaries.
Note: If the learning curves oscillate, you can smooth them with a moving average using the following code.
def moving_avg(mlist, N=30):
    # Trailing moving average over windows of N consecutive values
    cumsum, moving_aves = [0], []
    for i, x in enumerate(mlist, 1):
        cumsum.append(cumsum[i - 1] + x)
        if i >= N:
            moving_ave = (cumsum[i] - cumsum[i - N]) / N
            moving_aves.append(moving_ave)
    return moving_aves
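An equivalent vectorized version, assuming NumPy is acceptable: convolving with a length-`N` box filter in `'valid'` mode produces one mean per full window, i.e. the same trailing averages that `moving_avg` computes with its running cumulative sum. `moving_avg_np` is our own name for it.

```python
import numpy as np

def moving_avg_np(values, N=30):
    """Mean of every run of N consecutive values (len(values) - N + 1 results)."""
    values = np.asarray(values, dtype=float)
    if values.size < N:
        # Not enough points for a single full window
        return np.array([])
    return np.convolve(values, np.ones(N) / N, mode='valid')

print(moving_avg_np([1, 2, 3, 4, 5, 6], N=3))
```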
1) Plot the learning curves and point out the approximate epoch at which the model starts to overfit the data.
########################################
# Put your implementation here #
########################################
2) Apply L1 and L2 regularization, separately, to all of the layers (weight matrices only) and plot the learning curves and decision boundaries. Test each with three different values of $\lambda$ ($\lambda \in \{0.1, 0.01, 0.0001\}$). Which values work better? Why?
########################################
# Put your implementation here #
########################################
$\color{red}{\text{Write your answer here}}$
3) Now, apply L1 and L2 to the biases and compare the results with those of the previous question (compare each $\lambda$ separately). Which one works better? Why?
########################################
# Put your implementation here #
########################################
$\color{red}{\text{Write your answer here}}$
4) Implement a linear combination of the L1 and L2 norms and test it with three different values of $\alpha$ ($\alpha \in \{0.3, 0.5, 0.7\}$).
########################################
# Put your implementation here #
########################################
5) Compare the results of questions 2 and 4 for each value of $\alpha$ separately (one plot per value of $\alpha$, containing the learning curves for L1, L2, and their linear combination), using $\lambda = 0.01$.
$\color{red}{\text{Write your answer here}}$
6) Try to prevent overfitting by adding a regularization term to each layer of the network, and then plot the decision boundaries and learning curves. Add each regularization technique separately and compare them with each other, using $\lambda = 0.01$.
########################################
# Put your implementation here #
########################################
7) Run your code for question 2 with $\lambda = 0.01$ multiple times. Which regularization technique is stable? (By stable, we mean one that prevents overfitting every time.)
########################################
# Put your implementation here #
########################################
$\color{red}{\text{Write your answer here}}$
Run the following block to import requirements
%matplotlib inline
import numpy as np
import keras
import keras.backend as K
from keras import optimizers
from keras.models import Model
from keras.layers import Input, Dense
import matplotlib.pyplot as plt
import matplotlib.animation as animation
plt.ioff()
Consider this structure: a simple model with a single node in the input layer and a single node in the output dense layer (with the `use_bias` argument set to `False`). If we set the input to the constant value 1, the output will always equal the single weight between the input node and the output node.
Using this technique, we can define an arbitrary custom function and search for its minimum using the predefined optimizers in Keras (gradient-based optimizers find a local minimum, which is not necessarily the global one).
See the following code for a better understanding:
def minimize(config):
    '''
    The wrapper function which makes
    the custom function suitable for
    the model.compile method
    '''
    def custom_loss(layer):
        '''
        Custom objective function
        f(x) = ((x-1)**2)*(x+1)*(x**2-3)*(x-4)/90
        Notice that the y_pred value is exactly
        equal to our single weight value, as
        explained before. Also notice that the
        y_true value doesn't actually play
        any role in the defined function, but
        it needs to be passed.
        '''
        def loss(y_true, y_pred):
            # in order to change the objective function, this line must be changed
            return ((y_pred-1)**2)*(y_pred+1)*(y_pred**2-3)*(y_pred-4)/90.0
        return loss

    # Creating a single-input, single-output model
    init_vals = config['init_vals']
    inp = Input(shape=(1,))
    weights = Dense(1, use_bias=False)
    out = weights(inp)
    model = Model(inputs=inp, outputs=out)
    weights.set_weights([np.array([init_vals])])
    model.compile(optimizer=config['optimizer'], loss=custom_loss(out))

    # Storing w1 (our single weight) values
    # during training for later plotting.
    w1_history = [init_vals[0]]
    for epoch in range(config['epochs']):
        # Notice the constant input value 1.
        # Also notice that the target value
        # passed to the fit method doesn't really
        # matter; however, it cannot be None
        # and needs to be passed.
        model.fit(x=np.array([[1.0]]), y=np.array([[1.0]]), epochs=1, verbose=0)
        w1_history.append(weights.get_weights()[0][0][0])
    return w1_history
Using this piece of code, we can visualize the optimizer's steps for minimizing the objective function:
def visualize(independent_variable_history):
    fig = plt.figure(figsize=(4, 4))
    X = np.linspace(-2.1, 4.1, 200)
    # @@ in order to change the objective function, this line must be changed
    Y = ((X-1)**2)*(X+1)*(X**2-3)*(X-4)/90.0

    def ani(coords):
        plt.cla()
        plt.plot(X, Y, "b")
        return plt.plot([coords[0]], [coords[1]], 'go')

    def frames():
        for x in independent_variable_history:
            # @@ in order to change the objective function, this line must be changed
            yield x, ((x-1)**2)*(x+1)*(x**2-3)*(x-4)/90.0

    from IPython.display import HTML
    return HTML(animation.FuncAnimation(fig, ani, frames=frames, interval=30).to_jshtml())
You can use this code by passing a configuration dictionary like this:
config = {
"init_vals": [-2.0],
"optimizer": optimizers.SGD(lr=0.1, decay=1e-6, momentum=0.9),
"epochs" : 200
}
independent_variable_history = minimize(config)
visualize(independent_variable_history)
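To see what the optimizer is doing without Keras, here is a NumPy sketch of the same experiment. It rebuilds the objective from its roots (x = 1 is a double root) and applies the classical momentum update v = momentum * v - lr_t * f'(x), x = x + v, with time-based decay lr_t = lr / (1 + decay * t). We believe this matches the behavior of Keras's `SGD` with one update per `fit` call, but treat it as an approximation rather than Keras's exact implementation; `sgd_momentum` is a hypothetical name.

```python
import numpy as np
from numpy.polynomial import Polynomial

# f(x) = ((x-1)**2)*(x+1)*(x**2-3)*(x-4)/90, built as a polynomial from its roots
f = Polynomial.fromroots([1.0, 1.0, -1.0, np.sqrt(3), -np.sqrt(3), 4.0]) / 90.0
df = f.deriv()  # exact derivative f'(x)

def sgd_momentum(x0, lr, momentum=0.0, decay=0.0, epochs=200):
    """Run gradient descent with classical momentum on f, returning every iterate."""
    x, v = float(x0), 0.0
    history = [x]
    for t in range(epochs):
        lr_t = lr / (1.0 + decay * t)       # time-based learning-rate decay
        v = momentum * v - lr_t * df(x)     # velocity update
        x = x + v
        history.append(float(x))
    return history

# Plain gradient descent (no momentum) from the same starting point as the config above
path = sgd_momentum(-2.0, lr=0.01, epochs=1000)
print(f"x = {path[-1]:.4f}, f(x) = {f(path[-1]):.4f}, f'(x) = {df(path[-1]):.2e}")
```

The returned list can be passed to `visualize(...)` just like the output of `minimize(config)`; changing `lr`, `momentum`, and `decay` lets you explore which local minimum the iterates settle in.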
$\color{red}{\text{Write your answer here}}$
#### FIRST LOCAL MINIMUM
config = {
"init_vals": [-2.0],
"optimizer": optimizers.SGD(lr= ? , decay= ? , momentum= ?),
"epochs" : 200
}
independent_variable_history = minimize(config)
visualize(independent_variable_history)
#### SECOND LOCAL MINIMUM
config = {
"init_vals": [-2.0],
"optimizer": optimizers.SGD(lr= ? , decay= ? , momentum= ?),
"epochs" : 200
}
independent_variable_history = minimize(config)
visualize(independent_variable_history)
$\color{red}{\text{Write your answer here}}$
$\color{red}{\text{Write your answer here}}$
$\color{red}{\text{Write your answer here}}$
Congratulations! You have finished the assignment and you're ready to submit your work. Please follow these instructions: run the cells below to create your submission file (`dl_asg02__xx__xx.zip`), download it, and submit it via https://goo.gl/forms/ShToQRtEp32n8YHq2.
Note: We need your GitHub token to create a new repository (if one doesn't already exist) to store your learned model data. The Google Drive token enables us to download the current notebook and create the submission. If you are interested, feel free to check our code.
#@title
! pip install -U --quiet PyDrive > /dev/null
! wget -q https://github.com/github/hub/releases/download/v2.10.0/hub-linux-amd64-2.10.0.tgz
import os
import time
import yaml
import json
from google.colab import files
from IPython.display import Javascript
from pydrive.auth import GoogleAuth
from pydrive.drive import GoogleDrive
from google.colab import auth
from oauth2client.client import GoogleCredentials
asg_name = 'assignment_02'
script_save = '''
require(["base/js/namespace"],function(Jupyter) {
Jupyter.notebook.save_checkpoint();
});
'''
repo_name = 'iust-deep-learning-assignments'
submission_file_name = 'dl_asg02__%s__%s.zip'%(student_id, student_name.lower().replace(' ', '_'))
! tar xf hub-linux-amd64-2.10.0.tgz
! cd hub-linux-amd64-2.10.0/ && chmod a+x install && ./install
! hub config --global hub.protocol https
! hub config --global user.email "$Your_Github_account_Email"
! hub config --global user.name "$student_name"
! hub api --flat -X GET /user
! hub api -F affiliation=owner -X GET /user/repos > repos.json
repos = json.load(open('repos.json'))
repo_names = [r['name'] for r in repos]
has_repository = repo_name in repo_names
if not has_repository:
    get_ipython().system_raw('! hub api -X POST -F name=%s /user/repos > repo_info.json' % repo_name)
    repo_info = json.load(open('repo_info.json'))
    repo_url = repo_info['clone_url']
else:
    for r in repos:
        if r['name'] == repo_name:
            repo_url = r['clone_url']
with open("/root/.config/hub", "r") as stream:
    token = list(yaml.safe_load_all(stream))[0]['github.com'][0]['oauth_token']
repo_url_with_token = 'https://'+token+"@" +repo_url.split('https://')[1]
! git clone "$repo_url_with_token"
! cp -r "$ASSIGNMENT_PATH" "$repo_name"/
! cd "$repo_name" && git add -A
! cd "$repo_name" && git commit -m "Add assignment 02 results"
! cd "$repo_name" && git push -u origin master
sub_info = {
'student_id': student_id,
'student_name': student_name,
'repo_url': repo_url,
'asg_dir_contents': os.listdir(str(ASSIGNMENT_PATH)),
'dateime': str(time.time()),
'asg_name': asg_name
}
json.dump(sub_info, open('info.json', 'w'))
Javascript(script_save)
auth.authenticate_user()
gauth = GoogleAuth()
gauth.credentials = GoogleCredentials.get_application_default()
drive = GoogleDrive(gauth)
file_id = drive.ListFile({'q':"title='%s.ipynb'"%asg_name}).GetList()[0]['id']
downloaded = drive.CreateFile({'id': file_id})
downloaded.GetContentFile('%s.ipynb'%asg_name)
! jupyter nbconvert --to script "$asg_name".ipynb > /dev/null
! jupyter nbconvert --to html "$asg_name".ipynb > /dev/null
! zip "$submission_file_name" "$asg_name".ipynb "$asg_name".html "$asg_name".txt info.json > /dev/null
print("##########################################")
print("Done! Submission created. Please download it using the cell below!")
#@title
files.download(submission_file_name)