Deep Learning Lab.¶

In this lab we will continue working with the CIFAR-10 dataset, but this time we will go deeper by stacking linear layers and non-linear activation functions on top of each other. First, I will present a re-implementation of what we had last time.

1. Implementing our own Softmax + CrossEntropyLoss function.¶

This is similar to the loss_softmax and loss_softmax_backward implementations in the previous lab. Here we also make sure this works for a batch of vectors instead of a single vector. This means the input here will be a tensor of size batchSize x inputSize:

import torch, lab_utils, random
from torchvision.datasets import CIFAR10
import torchvision.transforms as transforms
import torchvision.models as models
from torch.autograd import Variable
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
import matplotlib.pyplot as plt
from PIL import Image
import json, string
%matplotlib inline

# This class combines Softmax + Cross Entropy Loss.
# Similar to the previous lab, but this implementation works for batches of inputs and
# not just individual input vectors. Here the input is batchSize x inputSize.
class nn_CrossEntropyLoss(object): 
    # Forward pass -log softmax(input_{label})
    def forward(self, inputs, labels):
        max_val = inputs.max()  # This is to avoid variable overflows.
        exp_inputs = (inputs - max_val).exp()
        # This is different than in the previous lab. Avoiding for loops here.
        denominators = exp_inputs.sum(1).repeat(inputs.size(1), 1).t()
        self.predictions = torch.mul(exp_inputs, 1 / denominators)
        # Check what gather does. Just avoiding another for loop.
        return -self.predictions.log().gather(1, labels.view(-1, 1)).mean()
    
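    # Backward pass: gradient of the loss with respect to the inputs.
    # Note that this returns softmax(inputs) - one_hot(labels), which is the gradient
    # of the summed loss; it is not divided by batchSize even though forward returns the mean.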
    def backward(self, inputs, labels):
        grad_inputs = self.predictions.clone()
        # Ok, Here we will use a for loop (but it is avoidable too).
        for i in range(0, inputs.size(0)):
            grad_inputs[i][labels[i]] = grad_inputs[i][labels[i]] - 1
        return grad_inputs 

# Input: 4 vectors of size 10.
testInput = torch.Tensor(4, 10).normal_(0, 0.1)
# labels: 4 labels indicating the correct class for each input.
labels = torch.LongTensor([3, 4, 4, 8])

# Forward and Backward passes:
loss_softmax = nn_CrossEntropyLoss()
loss = loss_softmax.forward(testInput, labels)
gradInputs = loss_softmax.backward(testInput, labels)

Before continuing, make sure you understand every line of code in the above implementation by looking at previous lecture notes.
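The comment in backward says the for loop is avoidable. The following is a minimal sketch of one way to do that with advanced indexing, checked against pytorch's own autograd. The function names softmax_probs and cross_entropy_grad are illustrative and not part of the lab code, and the sketch assumes a pytorch version that supports requires_grad on tensors and the reduction argument of F.cross_entropy.

```python
import torch
import torch.nn.functional as F

def softmax_probs(inputs):
    # Subtract the per-row max for numerical stability, then normalize each row.
    shifted = inputs - inputs.max(dim=1, keepdim=True)[0]
    exp_inputs = shifted.exp()
    return exp_inputs / exp_inputs.sum(dim=1, keepdim=True)

def cross_entropy_grad(inputs, labels):
    # Gradient of the *summed* loss: softmax(inputs) - one_hot(labels).
    grad = softmax_probs(inputs)
    rows = torch.arange(inputs.size(0))
    grad[rows, labels] -= 1  # advanced indexing replaces the for loop
    return grad

torch.manual_seed(0)
inputs = torch.randn(4, 10, dtype=torch.float64)
labels = torch.tensor([3, 4, 4, 8])

manual_grad = cross_entropy_grad(inputs, labels)

# Reference gradient from autograd, using the summed cross-entropy loss.
inputs_ref = inputs.clone().requires_grad_(True)
F.cross_entropy(inputs_ref, labels, reduction='sum').backward()
max_diff = (manual_grad - inputs_ref.grad).abs().max().item()
print(max_diff)
```

Since each row of softmax probabilities sums to one and exactly one entry per row has 1 subtracted, every row of the gradient should sum to roughly zero, which is a quick sanity check.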

2. Implementing our own Linear layer.¶

Next we provide an implementation of a linear layer that is also meant to work on batches of vectors. Notice that in addition to computing gradWeight and gradBias, we also need gradInput here, since layers earlier in the network need this gradient for backpropagation. Making a batched implementation of this layer is easy because the only change is that we now have matrix-matrix multiplications as opposed to vector-matrix multiplications.

class nn_Linear(object):
    def __init__(self, inputSize, outputSize):
        self.weight = torch.Tensor(inputSize, outputSize).normal_(0, 0.01)
        self.gradWeight = torch.Tensor(inputSize, outputSize)
        self.bias = torch.Tensor(outputSize).zero_()
        self.gradBias = torch.Tensor(outputSize)
    
    # Forward pass, inputs is a matrix of size batchSize x inputSize
    def forward(self, inputs):
        # This one needs no change, it just becomes matrix x matrix multiplication
        # as opposed to just vector x matrix multiplication as we had before.
        return torch.matmul(inputs, self.weight) + self.bias
    
    # Backward pass: in addition to computing gradients for the weight and bias,
    # it has to compute gradients with respect to the inputs.
    def backward(self, inputs, gradOutput):
        self.gradWeight = torch.matmul(inputs.t(), gradOutput)
        self.gradBias = gradOutput.sum(0)
        return torch.matmul(gradOutput, self.weight.t())

# Input: 4 vectors of size 3072.
testInput = torch.Tensor(4, 3 * 32 * 32).normal_(0, 0.1)
dummyGradOutputs = torch.Tensor(4, 10).normal_(0, 0.1)

#Forward and Backward passes:
linear = nn_Linear(3 * 32 * 32, 10)
output = linear.forward(testInput)
gradInput = linear.backward(testInput, dummyGradOutputs)
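One way to convince yourself that the three formulas in nn_Linear.backward are right is to compare them against autograd on a small example. This is only a sketch under the assumption of a recent pytorch version (tensors with requires_grad); the sizes and variable names are made up for illustration.

```python
import torch

torch.manual_seed(0)
batch_size, input_size, output_size = 4, 6, 3

# Small float64 tensors keep the comparison precise.
inputs = torch.randn(batch_size, input_size, dtype=torch.float64, requires_grad=True)
weight = torch.randn(input_size, output_size, dtype=torch.float64, requires_grad=True)
bias = torch.zeros(output_size, dtype=torch.float64, requires_grad=True)
grad_output = torch.randn(batch_size, output_size, dtype=torch.float64)

# Let autograd propagate grad_output back through outputs = inputs @ weight + bias.
outputs = torch.matmul(inputs, weight) + bias
outputs.backward(grad_output)

# The same three formulas used in nn_Linear.backward.
with torch.no_grad():
    grad_weight = torch.matmul(inputs.t(), grad_output)  # inputSize x outputSize
    grad_bias = grad_output.sum(0)                        # outputSize
    grad_input = torch.matmul(grad_output, weight.t())    # batchSize x inputSize

print(torch.allclose(grad_weight, weight.grad))
print(torch.allclose(grad_bias, bias.grad))
print(torch.allclose(grad_input, inputs.grad))
```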

3. Implementing an "Activation" function, or non-linearity.¶

Finally we need to implement some non-linear activation function. Here we will implement ReLU which is the simplest activation function but also one of the most important as we discussed during class.

class nn_ReLU(object):
    # pytorch also has element-wise functions that do this (e.g. torch.clamp(inputs, min = 0)).
    def forward(self, inputs):
        outputs = inputs.clone()
        outputs[outputs < 0] = 0
        return outputs
    
    # Make sure the backward pass is absolutely clear.
    def backward(self, inputs, gradOutput):
        gradInputs = gradOutput.clone()
        gradInputs[inputs < 0] = 0
        return gradInputs
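To make the backward pass concrete, here is the same logic written for plain Python lists. It is a toy sketch (the function names are illustrative), but it uses the same mask as nn_ReLU: the gradient is passed through unchanged wherever the input was not negative, and zeroed elsewhere.

```python
def relu_forward(inputs):
    # Negative entries become zero; everything else passes through.
    return [x if x >= 0 else 0.0 for x in inputs]

def relu_backward(inputs, grad_output):
    # Same mask as nn_ReLU.backward: zero the gradient where the input was negative.
    return [g if x >= 0 else 0.0 for x, g in zip(inputs, grad_output)]

inputs = [-2.0, -0.5, 0.0, 1.5, 3.0]
grad_output = [0.1, 0.2, 0.3, 0.4, 0.5]

print(relu_forward(inputs))
print(relu_backward(inputs, grad_output))
```

Note that the mask depends on the inputs of the forward pass, not on gradOutput, which is why backward needs the original inputs as an argument.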

4. Implementation I: CIFAR-10 Neural network classification using our implementations.¶

Ok, now we are ready to use our three layers to build a neural network. We will use it to classify images on CIFAR-10 as in our previous lab, but additionally we will use pytorch's DataLoaders which will build batches automatically for us, and will shuffle the data for us.

# In addition to transforming the image into a tensor, we also normalize each channel:
# we subtract the mean pixel value and divide by the pixel standard deviation.
imgTransform = transforms.Compose([transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465), 
                                                        (0.2023, 0.1994, 0.2010)),
                                   transforms.Lambda(lambda inputs: inputs.view(3 * 32 * 32))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform, download = True)
valset = CIFAR10(root='./data', train = False, transform = imgTransform, download = True)

trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 128, 
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 128, 
                                        shuffle = False, num_workers = 0)
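Conceptually, the batching and shuffling that a DataLoader performs can be sketched in a few lines of plain Python. This is a simplified illustration, not pytorch's implementation: iterate_batches is a hypothetical helper that only produces index lists, whereas the real DataLoader also loads the samples, stacks them into tensors, and can use worker processes.

```python
import random

def iterate_batches(num_examples, batch_size, shuffle, seed=None):
    # Yield lists of example indices, one list per batch.
    indices = list(range(num_examples))
    if shuffle:
        random.Random(seed).shuffle(indices)
    for start in range(0, num_examples, batch_size):
        yield indices[start:start + batch_size]

batches = list(iterate_batches(num_examples=10, batch_size=4, shuffle=True, seed=0))
print([len(batch) for batch in batches])  # the last batch may be smaller
```

With the 50,000 CIFAR-10 training images and a batch size of 128, the same logic gives 391 batches per epoch, the last one being smaller than the rest.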

Now that the training and validation splits of the dataset are loaded, let's train.

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm

learningRate = 1e-4  # Single learning rate for this lab.

# Definition of our network.
linear1 = nn_Linear(3 * 32 * 32, 1024)
relu = nn_ReLU()
linear2 = nn_Linear(1024, 10)
criterion = nn_CrossEntropyLoss()

# Training loop.
for epoch in range(0, 10):
    correct = 0.0
    cum_loss = 0.0
    counter = 0
    
    # Make a pass over the training data.
    t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
    for (i, (inputs, labels)) in enumerate(t):
    
        # Forward pass:
        a = linear1.forward(inputs)
        b = relu.forward(a)
        c = linear2.forward(b)
        cum_loss += float(criterion.forward(c, labels))
        max_scores, max_labels = c.max(1)
        correct += int((max_labels == labels).sum())
        
        # Backward pass:
        grads_c = criterion.backward(c, labels)
        grads_b = linear2.backward(b, grads_c)
        grads_a = relu.backward(a, grads_b)
        linear1.backward(inputs, grads_a)
        
        # Weight and bias updates.
        linear1.weight = linear1.weight - learningRate * linear1.gradWeight
        linear1.bias = linear1.bias - learningRate * linear1.gradBias
        linear2.weight = linear2.weight - learningRate * linear2.gradWeight
        linear2.bias = linear2.bias - learningRate * linear2.gradBias
        
        # logging information.
        counter += inputs.size(0)
        t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)
    
    # Make a pass over the validation data.
    correct = 0.0
    cum_loss = 0.0
    counter = 0
    t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
    for (i, (inputs, labels)) in enumerate(t):
        
        # Forward pass:
        a = linear1.forward(inputs)
        b = relu.forward(a)
        c = linear2.forward(b)
        cum_loss += float(criterion.forward(c, labels))
        max_scores, max_labels = c.max(1)
        correct += int((max_labels == labels).sum())
        
        # logging information.
        counter += inputs.size(0)
        t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

5. Implementation II: CIFAR-10 neural network classification using pytorch's nn functions.¶

Pytorch already comes with an impressive number of operations used to implement deep neural networks. Here we will use the same ones that we have already implemented and show how similar and easy it is to use pytorch's implementations. One more thing about pytorch: we wrap the tensors that go into the network (inputs and labels) in torch.autograd.Variable objects. Since pytorch 0.4, Variables and tensors are the same type, so in recent versions this wrapping is harmless but no longer needed.

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm

learningRate = 1e-2  # Single learning rate for this lab.

# Definition of our network.
network = nn.Sequential(
    nn.Linear(3072, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)
#Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

def train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10):
    # Training loop.
    for epoch in range(0, n_epochs):
        correct = 0.0
        cum_loss = 0.0
        counter = 0

        # Make a pass over the training data.
        t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
        network.train()  # This is important to call before training!
        for (i, (inputs, labels)) in enumerate(t):

            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # Backward pass:
            optimizer.zero_grad()
            # Loss is a variable, and calling backward on a Variable will
            # compute all the gradients that lead to that Variable taking on its
            # current value.
            loss.backward() 

            # Weight and bias updates.
            optimizer.step()

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

        # Make a pass over the validation data.
        correct = 0.0
        cum_loss = 0.0
        counter = 0
        t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
        network.eval()  # This is important to call before evaluating!
        for (i, (inputs, labels)) in enumerate(t):

            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)  
            
# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10)
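To get a feeling for the size of this two-layer network, we can count its trainable parameters by hand. The helper below is just illustrative arithmetic (linear_param_count is not a pytorch function): each linear layer has one weight per input-output pair plus one bias per output.

```python
def linear_param_count(input_size, output_size):
    # One weight per (input, output) pair plus one bias per output.
    return input_size * output_size + output_size

layer_sizes = [(3072, 1024), (1024, 10)]
total = sum(linear_param_count(i, o) for i, o in layer_sizes)
print(total)
```

Almost all of the roughly 3.16 million parameters sit in the first layer, because it connects every one of the 3072 input values to every one of the 1024 hidden units.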

5. Implementation III: CIFAR-10 neural network classification using pytorch's autograd magic!¶

Objects of type torch.autograd.Variable have two attributes, .data and .grad. The first one, .data, contains the value of the variable at any given point, and .grad contains the gradient with respect to this variable once a backward call involving it has been invoked. Most torch operations that can be applied to tensors can also be applied to tensors wrapped in torch.autograd.Variables, and the output of an operation involving variables is also a torch.autograd.Variable (as opposed to just a tensor). Another difference is that pytorch records the operations applied to each torch.autograd.Variable in a graph structure, so that gradients can be computed when backward() is called on any variable in the graph. This very powerful technique is often called "automatic differentiation". It means that as long as we wrap tensors in variables and use pytorch operators, we do not really need to implement backward passes.
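The idea of recording operations in a graph and replaying them backwards can be illustrated with a toy scalar version. The Value class below is a hypothetical teaching sketch and has nothing to do with how pytorch is actually implemented; it only supports addition and multiplication, but it has the same .data and .grad attributes and the same backward() call.

```python
class Value:
    """A scalar that remembers how it was computed (toy mini-autograd)."""

    def __init__(self, data, parents=(), backward_fn=None):
        self.data = data
        self.grad = 0.0
        self.parents = parents
        self.backward_fn = backward_fn  # pushes self.grad to the parents

    def __add__(self, other):
        out = Value(self.data + other.data, (self, other))
        def backward_fn():
            self.grad += out.grad
            other.grad += out.grad
        out.backward_fn = backward_fn
        return out

    def __mul__(self, other):
        out = Value(self.data * other.data, (self, other))
        def backward_fn():
            self.grad += other.data * out.grad
            other.grad += self.data * out.grad
        out.backward_fn = backward_fn
        return out

    def backward(self):
        # Visit the recorded graph in reverse topological order.
        order, seen = [], set()
        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node.parents:
                    visit(parent)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            if node.backward_fn is not None:
                node.backward_fn()

x, y = Value(2.0), Value(3.0)
z = x * y + x   # dz/dx = y + 1, dz/dy = x
z.backward()
print(x.grad, y.grad)
```

Each operation stores a small closure that knows how to pass the gradient of its output to its inputs, and backward() simply calls those closures starting from the output.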

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm
from torch.autograd import Variable
import torch.nn as nn 
import torch.nn.functional as F
import torch.optim as optim

learningRate = 1e-2  # Single learning rate for this lab.

class MyAutogradModel(nn.Module):
    def __init__(self):
        super(MyAutogradModel, self).__init__()
        # See documentation for nn.Parameter here:
        # https://github.com/pytorch/pytorch/blob/master/torch/nn/parameter.py
        self.weight1 = nn.Parameter(torch.Tensor(3072, 1024).normal_(0, 0.01))
        self.bias1 = nn.Parameter(torch.Tensor(1024).zero_())
        self.weight2 = nn.Parameter(torch.Tensor(1024, 10).normal_(0, 0.01))
        self.bias2 = nn.Parameter(torch.Tensor(10).zero_())
        
    # No need to implement backward when using torch.autograd.Variable and pytorch functions.
    # Think of the possibilities!
    def forward(self, inputs):
        x = F.relu(torch.matmul(inputs, self.weight1) + self.bias1)
        x = torch.matmul(x, self.weight2) + self.bias2
        return x
        
# Definition of our network.
network = MyAutogradModel()

#Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10)

6. Convolutional Neural Networks (using Pytorch nn's)¶

In this section we will use convolutional layers in addition to linear layers. Convolutional layers work on 2D inputs (with channels), so we will modify our data loaders so that they return each image as a 3 x 32 x 32 tensor instead of the flattened vector versions that we have been using thus far.

# Same transformations as before but we do not vectorize the images.
imgTransform = transforms.Compose([transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465), 
                                                        (0.2023, 0.1994, 0.2010))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform)
valset = CIFAR10(root='./data', train = False, transform = imgTransform)

trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 128, 
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 128, 
                                        shuffle = False, num_workers = 0)

Once the data is loaded, we proceed to define and train our neural network. Notice how we only need to change the definition of the model and not the way it is trained. This is just another of the many advantages of training with a framework built on well-engineered practices.

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm

learningRate = 1e-2  # Single learning rate for this lab.

# LeNet (named after Yann LeCun) is taken from his 1998 paper
# on digit classification http://yann.lecun.com/exdb/lenet/
# This was also a network with just two convolutional layers.
class LeNet(nn.Module):
    def __init__(self):
        super(LeNet, self).__init__()
        # Convolutional layers.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        # Linear layers.
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        
        # This flattens the output of the previous layer into a vector.
        out = out.view(out.size(0), -1) 
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
        
        
# Definition of our network.
network = LeNet()

#Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 20)
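The input size of fc1, 16*5*5, comes from following the spatial size of a 32x32 image through the two convolution and pooling stages. The sketch below works through that arithmetic using the standard output-size formula; conv_output_size is an illustrative helper, and it assumes stride 1 and no padding for the convolutions, which are the nn.Conv2d defaults used above.

```python
def conv_output_size(size, kernel, stride=1, padding=0):
    # Standard formula for convolution and pooling output width/height.
    return (size + 2 * padding - kernel) // stride + 1

size = 32                                           # CIFAR-10 images are 32x32
size = conv_output_size(size, kernel=5)             # conv1: 32 -> 28
size = conv_output_size(size, kernel=2, stride=2)   # max pool: 28 -> 14
size = conv_output_size(size, kernel=5)             # conv2: 14 -> 10
size = conv_output_size(size, kernel=2, stride=2)   # max pool: 10 -> 5
flattened = 16 * size * size                        # 16 channels out of conv2
print(size, flattened)
```

If you change kernel sizes, add padding, or add layers in your own network, the first linear layer has to be resized accordingly.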

The last model achieves considerably higher accuracy than the roughly 40% we were obtaining in our previous lab. It also seems to be still improving, so training it for more epochs, using a different learning rate, or reducing the learning rate after the first 20 epochs could improve the accuracy further. We could try all these things. We should also, from time to time, test our model on a few inputs and see how good it is becoming.

classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']
un_normalize = lab_utils.UnNormalize((0.4914, 0.4822, 0.4465), 
                                     (0.2023, 0.1994, 0.2010))

network.eval()  # Important!

# Now predict the category using this trained classifier
for i in range(0, 5):
    img_id = random.randint(0, len(valset) - 1)  # randint includes both endpoints.
    print('Image %d' % img_id)
    img, _ = valset[img_id]
    predictions = F.softmax(network(Variable(img.unsqueeze(0))), dim = 1)
    predictions = predictions.data

    # Show the results of the classifier.
    lab_utils.show_image(lab_utils.tensor2pil(un_normalize(img)).resize((128, 128)));
    max_score, max_label = predictions.max(1)
    print('Image predicted as %s with confidence %.2f' % (classes[max_label[0]], max_score[0]))

    # Print out detailed predictions.
    for (j, pred) in enumerate(predictions.squeeze().tolist()):
        print('y_hat[%s] = %.2f' % (classes[j], pred))
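The code for lab_utils.UnNormalize is not shown in this lab. Presumably it inverts transforms.Normalize so that the image can be displayed with its original colors. Under that assumption, the per-pixel arithmetic would look like the sketch below, where normalize and un_normalize_pixel are illustrative names.

```python
MEAN = (0.4914, 0.4822, 0.4465)
STD = (0.2023, 0.1994, 0.2010)

def normalize(pixel):
    # pixel is an (R, G, B) tuple with values in [0, 1].
    return tuple((v - m) / s for v, m, s in zip(pixel, MEAN, STD))

def un_normalize_pixel(pixel):
    # Inverse of normalize: multiply by the std, then add the mean back.
    return tuple(v * s + m for v, m, s in zip(pixel, MEAN, STD))

original = (0.25, 0.50, 0.75)
restored = un_normalize_pixel(normalize(original))
print(restored)
```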

7. Pytorch's pretrained Convolutional Neural Networks.¶

Pytorch has several Convnet models pretrained on the Imagenet Large Scale Visual Recognition Challenge (ILSVRC) dataset. The ILSVRC training set contains more than 1 million images, and the number of labels is 1000. Training a Convnet on this dataset often takes weeks on arrays of GPUs. Let's load one such network, with 18 layers of depth, and try it on some images. Look below at how impressive this neural network is, with so many layers and groups of layers; however, most layers are still ReLU, Conv2d, and BatchNorm2d, with one MaxPool2d near the input, and one AvgPool2d and one Linear at the end. There are also Resnet versions of depth 34, 50, 101, and 152.

resnet = models.resnet18(pretrained = True)
print(resnet)

# 1. Define the appropriate image pre-processing function.
preprocessFn = transforms.Compose([transforms.Resize(256), 
                                   transforms.CenterCrop(224), 
                                   transforms.ToTensor(), 
                                   transforms.Normalize(mean = [0.485, 0.456, 0.406], 
                                                        std=[0.229, 0.224, 0.225])])

# 2. Load the imagenet class names.
imagenetClasses = {int(idx): entry[1] for (idx, entry) in json.load(open('imagenet_class_index.json')).items()}

# 3. Forward a test image of the toaster.
# Never forget to set the model in evaluation mode so that layers such as
# Dropout and BatchNorm behave deterministically.
resnet.eval()
# Try your own image here. This is a picture of my toaster at home.
image = Image.open('test_image.jpg').convert('RGB')
# unsqueeze(0) adds a dummy batch dimension which is required for all models in pytorch.
inputVar =  Variable(preprocessFn(image).unsqueeze(0))
predictions = resnet(inputVar)

# 4. Decode the top 10 classes predicted for this image.
# We need to apply softmax because the model outputs the last linear layer activations and not softmax scores.
probs, indices = (-F.softmax(predictions, dim = 1)).data.sort()
probs = (-probs).numpy()[0][:10]; indices = indices.numpy()[0][:10]
preds = [imagenetClasses[idx] + ': ' + str(prob) for (prob, idx) in zip(probs, indices)]

# 5. Show image and predictions
plt.title('\n'.join(preds))
plt.imshow(image);
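The decoding step above turns raw scores into probabilities with softmax and then keeps the highest-ranked classes. The same logic in plain Python looks roughly like this; the scores and class names are made up for illustration, and top_k is a hypothetical helper rather than a pytorch call.

```python
import math

def softmax(scores):
    # Subtract the max before exponentiating to avoid overflow.
    max_score = max(scores)
    exps = [math.exp(s - max_score) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k(scores, names, k):
    # Pair each probability with its class name and keep the k largest.
    probs = softmax(scores)
    ranked = sorted(zip(probs, names), key=lambda pair: pair[0], reverse=True)
    return ranked[:k]

# Made-up scores and class names, for illustration only.
scores = [2.0, 0.5, 4.0, -1.0]
names = ['toaster', 'oven', 'waffle_iron', 'cat']
for prob, name in top_k(scores, names, k=2):
    print('%s: %.3f' % (name, prob))
```

Softmax does not change the ranking of the classes, only the scale of the numbers, so it is needed here for readable confidences rather than for choosing the top classes.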

8. Fine-tuning AlexNet on CIFAR-10¶

We will now use a pretrained network known as Alexnet on CIFAR-10 data. There is one problem: Alexnet takes images at 224x224 resolution, and CIFAR-10 images are 32x32. So we will scale up the CIFAR-10 images so that they work with Alexnet.

# Same transformations as before but we do not vectorize the images.
# Additionally we are scaling up images to 224x224 in order to use Alexnet.
imgTransform = transforms.Compose([transforms.Resize((224, 224)),
                                   transforms.ToTensor(),
                                   transforms.Normalize((0.4914, 0.4822, 0.4465), 
                                                        (0.2023, 0.1994, 0.2010))])
trainset = CIFAR10(root='./data', train = True, transform = imgTransform)
valset = CIFAR10(root='./data', train = False, transform = imgTransform)

trainLoader = torch.utils.data.DataLoader(trainset, batch_size = 64, 
                                          shuffle = True, num_workers = 0)
valLoader = torch.utils.data.DataLoader(valset, batch_size = 64, 
                                        shuffle = False, num_workers = 0)

The code below will be extremely slow on a CPU: hours per epoch, and maybe a week to finish all epochs. For this part you will need GPU nodes in the cloud (AWS, Google Cloud) or your own GPU. Also, GPUs have limited memory, so a batch size of 128 might not fit, which is why the data loaders above use a batch size of 64.

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm

learningRate = 1e-3  # Single learning rate for this lab.

# Definition of our network.
network = models.alexnet(pretrained = True)
# Also notice I'm replacing the classifier, which originally has 3 linear layers,
# with a classifier that is just a single layer.
network.classifier = nn.Linear(9216, 10)  # CIFAR-10 has 10 classes not 1000.

#Definition of our loss.
criterion = nn.CrossEntropyLoss()

# Definition of optimization strategy.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

def train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 10, use_gpu = False):
    if use_gpu:
        network = network.cuda()
        criterion = criterion.cuda()
        
    # Training loop.
    for epoch in range(0, n_epochs):
        correct = 0.0
        cum_loss = 0.0
        counter = 0

        # Make a pass over the training data.
        t = tqdm(trainLoader, desc = 'Training epoch %d' % epoch)
        network.train()  # This is important to call before training!
        for (i, (inputs, labels)) in enumerate(t):

            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)
            
            if use_gpu:
                inputs = inputs.cuda()
                labels = labels.cuda()

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # Backward pass:
            optimizer.zero_grad()
            # Loss is a variable, and calling backward on a Variable will
            # compute all the gradients that lead to that Variable taking on its
            # current value.
            loss.backward() 

            # Weight and bias updates.
            optimizer.step()

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)

        # Make a pass over the validation data.
        correct = 0.0
        cum_loss = 0.0
        counter = 0
        t = tqdm(valLoader, desc = 'Validation epoch %d' % epoch)
        network.eval()  # This is important to call before evaluating!
        for (i, (inputs, labels)) in enumerate(t):

            # Wrap inputs, and targets into torch.autograd.Variable types.
            inputs = Variable(inputs)
            labels = Variable(labels)
            
            if use_gpu:
                inputs = inputs.cuda()
                labels = labels.cuda()

            # Forward pass:
            outputs = network(inputs)
            loss = criterion(outputs, labels)

            # logging information.
            cum_loss += loss.item()
            max_scores, max_labels = outputs.data.max(1)
            correct += (max_labels == labels.data).sum().item()
            counter += inputs.size(0)
            t.set_postfix(loss = cum_loss / (1 + i), accuracy = 100 * correct / counter)
            
# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 5, use_gpu = True)
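The 9216 inputs of the new classifier layer correspond to 256 feature maps of size 6x6 coming out of Alexnet's convolutional part for a 224x224 image. The arithmetic below is a sketch of where that number comes from. The kernel, stride, and padding values are my recollection of torchvision's Alexnet definition and should be treated as an assumption; print(network) shows the actual values.

```python
def output_size(size, kernel, stride=1, padding=0):
    # Standard formula for convolution and pooling output width/height.
    return (size + 2 * padding - kernel) // stride + 1

# (kernel, stride, padding) for each spatial layer; assumed to match torchvision's Alexnet.
layers = [
    (11, 4, 2),  # conv1
    (3, 2, 0),   # max pool
    (5, 1, 2),   # conv2
    (3, 2, 0),   # max pool
    (3, 1, 1),   # conv3
    (3, 1, 1),   # conv4
    (3, 1, 1),   # conv5
    (3, 2, 0),   # max pool
]
size = 224
for kernel, stride, padding in layers:
    size = output_size(size, kernel, stride, padding)
final_channels = 256
print(size, final_channels * size * size)
```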

Lab Questions (10pts)¶

1) [2pts] In section 3 of this lab we implemented the ReLU activation function, and used it to train a two-layer neural network. Here please implement Sigmoid, and TanH:

$$\text{Sigmoid}(x) = \frac{1}{1 + e^{-x}} = \frac{e^x}{e^x + 1}$$

$$\text{Tanh}(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$$

# Sigmoid of x.
class nn_Sigmoid:
    def forward(self, x):
        # Forward pass.
        pass
    
    def backward(self, x, gradOutput):
        # Backward pass
        pass
        
# Hyperbolic tangent.
class nn_Tanh:
    def forward(self, x):
        # Forward pass.
        pass
    
    def backward(self, x, gradOutput):
        # Backward pass
        pass

2) [1pts] Our ReLU function sets values to zero when they are less than zero. ReLU is still the most widely used activation function today, but a variant called LeakyReLU has been proposed in which negative inputs are scaled by a small slope instead of being set to zero. Here is the definition:

$$ \text{LeakyReLU}(x) = \begin{cases} \beta x & x < 0 \\ x & x \geq 0 \end{cases}$$

where $\beta$ is usually a small value, e.g. $\beta = 0.3$.

# Leaky ReLU of x.
class nn_LeakyReLU:
    def __init__(self, beta = 0.3):
        self.beta = beta
    
    def forward(self, x):
        # Forward pass.
        pass
    
    def backward(self, x, gradOutput):
        # Backward pass
        pass

3) [3pts] Propose a new convolutional neural network that obtains at least 66% accuracy on the CIFAR-10 validation set. Show here the code for your network, a plot showing the training accuracy and validation accuracy, and another plot with the training loss and validation loss (similar plots as in our previous lab). Included below is the LeNet implementation, which you can use as a starting point.

from tqdm import tqdm as tqdm
# Try this if the above gives trouble: from tqdm import tqdm_notebook as tqdm

learningRate = 1e-2  # Feel free to change this.

# You can use LeNet as the starting point.
# You can do things such as adding more layers,
# adding more filters to the existing layers, 
# adding things such as BatchNormalization, Dropout, etc.
# anything you want, but add references if you consult something online.
class MyNetwork(nn.Module):
    def __init__(self):
        super(MyNetwork, self).__init__()
        # Convolutional layers.
        self.conv1 = nn.Conv2d(3, 6, 5)
        self.conv2 = nn.Conv2d(6, 16, 5)
        
        # Linear layers.
        self.fc1 = nn.Linear(16*5*5, 120)
        self.fc2 = nn.Linear(120, 84)
        self.fc3 = nn.Linear(84, 10)

    def forward(self, x):
        out = F.relu(self.conv1(x))
        out = F.max_pool2d(out, 2)
        out = F.relu(self.conv2(out))
        out = F.max_pool2d(out, 2)
        
        # This flattens the output of the previous layer into a vector.
        out = out.view(out.size(0), -1) 
        out = F.relu(self.fc1(out))
        out = F.relu(self.fc2(out))
        out = self.fc3(out)
        return out
        
        
# Definition of our network.
network = MyNetwork()

# Feel free to use a different loss here.
criterion = nn.CrossEntropyLoss()

# Feel free to change the optimizer, or the optimizer parameters, e.g. momentum, weight_decay, etc.
optimizer = optim.SGD(network.parameters(), lr = learningRate)

# Train the previously defined model.
train_model(network, criterion, optimizer, trainLoader, valLoader, n_epochs = 20)

4) [4pts] Train a Convolutional Neural Network on the Dogs vs Cats Kaggle competition dataset https://www.kaggle.com/c/dogs-vs-cats. The training data has 25,000 images, which I have already separated into training (20,000 images) and validation (5,000 images) splits. So please download the training and validation splits from the following dropbox link instead: cats_dogs.zip, or CS link: cat_dogs.zip. You will have to write your own dataset class inheriting from torch.utils.data.Dataset, and a model that trains on this dataset. As usual, include plots.

Optional¶

1) [3pts] For Q4 you get extra points if you use Resnet as in Section 7 but replace the fc layer at the end so that the model only predicts two variables (cat and dog). You will then have to re-train Resnet on this dataset. The idea is to use a model that has already been pre-trained on a large task (ILSVRC), and re-train it (often called fine-tuning) on a smaller dataset. Present your code for the model, training output, plots, and example classifications on a few validation set images. Note: If you provide a model that does this in Q4, you directly get awarded 7pts in Q4, but for clarity provide the solution here instead if you plan to do this. Keep in mind that re-training Resnet on 20,000 images will probably still require GPU computing and significant computing time, so start this early.

2) [2pts] A simpler (less time consuming) approach to using a pre-trained CNN is to use it as a feature extractor. In this strategy we would use the Resnet network to compute "features" of the images, and then train a simple softmax classifier on top of those features. We could for instance remove the "fc" layer from the model, and use the 512-dimensional output of the network as our "features" for each image. Then we train a softmax classifier using these 512-dimensional vectors as inputs. Train such a classifier here for the dogs vs cats task, and present the model, plots, and a few example classifications on the validation set. Note: In this task, since we only run the forward pass of Resnet once for each image, we might be able to get away with doing this optional part without a GPU.

If you find any errors or omissions in this material please contact me at vicente@virginia.edu