
Chapter 4: CNN

The issue with vanilla NN

We use np.roll to shift the pixels of a Trouser image from left to right, so that the object is no longer centered
Beyond a shift of 2 pixels, the probability of the correct class drops significantly, so we can't rely on our previous model to generalize to translated images
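A minimal sketch of this shift experiment, assuming a 28x28 grayscale image like a Fashion-MNIST Trouser; a dummy array stands in for the actual image here:

```python
import numpy as np

# Stand-in for a 28x28 Trouser image: a rough vertical shape
image = np.zeros((28, 28))
image[5:25, 10:18] = 1.0

# np.roll moves every column 2 pixels to the right (wrapping around
# the edge), so the object is no longer centered
shifted = np.roll(image, shift=2, axis=1)

print(shifted.shape)  # (28, 28)
```

Feeding such shifted images to the trained vanilla NN is what exposes the drop in accuracy.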

Building blocks of CNN

Convolution

A CNN filter is a matrix of weights, initialized randomly at first
Different filters detect different feature patterns in the image, each of which can be activated
If we convolve a 4x4 grayscale image with 10 different 2x2 filters, the output shape is 3x3x10: there are as many output channels as filters
If we use a color image with 3 channels, e.g. 28x28x3, each filter also has 3 channels
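The shape arithmetic above can be checked directly with PyTorch's nn.Conv2d (random dummy tensors, no real image assumed):

```python
import torch
import torch.nn as nn

# 10 filters of size 2x2 on a single 4x4 grayscale image -> 3x3x10
conv = nn.Conv2d(in_channels=1, out_channels=10, kernel_size=2)
x = torch.randn(1, 1, 4, 4)          # (N, C, H, W)
print(conv(x).shape)                 # torch.Size([1, 10, 3, 3])

# With a 3-channel color image, each filter itself has 3 channels
conv_rgb = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=2)
print(conv_rgb.weight.shape)         # torch.Size([10, 3, 2, 2])
```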

Padding

Add an external border of zeros so that the image keeps its size after convolving
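A short sketch of zero padding in PyTorch: with a 3x3 kernel, padding=1 adds the one-pixel border of zeros that preserves the 28x28 size (the channel count 8 is an arbitrary choice for illustration):

```python
import torch
import torch.nn as nn

# padding=1 adds a border of zeros; output keeps the 28x28 spatial size
conv = nn.Conv2d(1, 8, kernel_size=3, padding=1)
x = torch.randn(1, 1, 28, 28)
print(conv(x).shape)  # torch.Size([1, 8, 28, 28])
```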

Pooling

Aggregates data into a smaller matrix; the most common variants are max, mean and sum
[Figure: 2x2 max pooling with a stride of 2, and its output]
Pooling abstracts a region, making the model more robust to change (translating a row of pixels to the right might not change the output)
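A concrete stand-in for the pooling example: 2x2 max pooling with stride 2 keeps the largest value of each non-overlapping 2x2 block (the input values are arbitrary):

```python
import torch
import torch.nn as nn

x = torch.tensor([[1., 3., 2., 1.],
                  [4., 2., 0., 1.],
                  [5., 1., 2., 6.],
                  [0., 2., 3., 4.]]).view(1, 1, 4, 4)

# Each 2x2 block is reduced to its maximum value
pool = nn.MaxPool2d(kernel_size=2, stride=2)
print(pool(x).view(2, 2))
# tensor([[4., 2.],
#         [5., 6.]])
```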

Flattening

Convolution and pooling help obtain an image representation with a much lower dimension than the original
This representation can then be treated as a vanilla NN input, as we did in the previous chapter
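A small shape check of this step, assuming the dimensions produced by the model below (two conv+pool stages reduce a 28x28 image to 128 feature maps of size 5x5); Flatten turns that into one vector the linear layers can consume:

```python
import torch
import torch.nn as nn

# 128 feature maps of size 5x5, flattened into a single vector per image
x = torch.randn(1, 128, 5, 5)
flat = nn.Flatten()(x)
print(flat.shape)  # torch.Size([1, 3200]), i.e. 128 * 5 * 5
```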

Implementation

🔑
PyTorch expects the input tensor to have shape (N, C, H, W), with N the number of images, C the number of channels and H, W the image dimensions
In our Dataset class, the view is
Python
x = x.view(-1, 1, 28, 28)
Our model becomes:
Python
nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3),
    nn.MaxPool2d(2),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(128 * 5 * 5, 256),
    nn.ReLU(),
    nn.Linear(256, 10),
).to(device)
The output on the image translation problem has improved significantly
But there is still room for improvement beyond a translation of 4 pixels
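The translation check above can be sketched with torch.roll; this uses an untrained copy of the model on random tensors just to show the mechanics, whereas the real experiment compares the trained model's predictions on original and shifted images:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(1, 64, kernel_size=3), nn.MaxPool2d(2), nn.ReLU(),
    nn.Conv2d(64, 128, kernel_size=3), nn.MaxPool2d(2), nn.ReLU(),
    nn.Flatten(), nn.Linear(128 * 5 * 5, 256), nn.ReLU(),
    nn.Linear(256, 10),
)

x = torch.randn(4, 1, 28, 28)                 # a batch of dummy images
shifted = torch.roll(x, shifts=4, dims=-1)    # translate 4 pixels right

# Both versions go through the same network; with a trained model one
# would compare the argmax of the two outputs
print(model(x).shape, model(shifted).shape)   # both (4, 10)
```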