
Chapter 5: Transfer Learning

Overview

Transfer learning consists of fine-tuning a model that was pre-trained on a huge generic dataset, using a specific dataset of interest.
We leverage the knowledge gained from one task to solve another, similar task.
High-level flow:
Normalize the input images with the same mean and standard deviation that were used for the pre-trained model
Fetch the architecture and weights, and load the pre-trained model
Truncate the last few layers of the model and freeze the remaining weights, as we don't want to train them again
Connect the truncated model to randomly initialized layers, with the output size of the last layer matching the number of classes to detect
Update the trainable weights over several epochs to fit the model
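The flow above can be sketched on a tiny stand-in network, so that nothing has to be downloaded. This is only an illustration under assumptions: `backbone` is a hypothetical placeholder for a real pre-trained model, and the layer sizes, number of classes and random batch are arbitrary. Only the freeze-then-replace pattern matters.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pre-trained feature extractor
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
)

# Freeze the "pre-trained" weights so that they are never updated
for param in backbone.parameters():
    param.requires_grad = False

# New, randomly initialized head: output size = number of classes (2 is an assumption)
n_classes = 2
head = nn.Linear(8, n_classes)
model = nn.Sequential(backbone, head)

# Only the trainable parameters are given to the optimizer
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)

# One training step on a random batch of 4 RGB images of 32 x 32 pixels
x = torch.randn(4, 3, 32, 32)
y = torch.tensor([0, 1, 0, 1])
frozen_before = backbone[0].weight.clone()
loss = nn.functional.cross_entropy(model(x), y)
optimizer.zero_grad()
loss.backward()
optimizer.step()

# The frozen convolution is unchanged after the update
print(torch.equal(backbone[0].weight, frozen_before))
```

Only the weight and bias of `head` end up in `trainable`, which is the whole point of freezing the backbone.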

VGG16

VGG stands for Visual Geometry Group; 16 is the number of weight layers in the model (13 convolutional and 3 fully connected)
Use torchsummary to get a clean overview of the architecture
Python
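!pip install torchsummary
import torch
from torchsummary import summary
from torchvision import models

device = "cuda" if torch.cuda.is_available() else "cpu"
# Newer torchvision versions prefer weights=models.VGG16_Weights.DEFAULT over pretrained=True
model = models.vgg16(pretrained=True).to(device)
# input_size=(channels, H, W); H and W can be changed
summary(model, input_size=(3, 224, 224), device=device)
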
Output
Download the cats and dogs dataset from Kaggle
You will need to create a Kaggle API token in your Kaggle account; a kaggle.json file is then downloaded automatically
Upload the kaggle.json file into Colab when prompted by files.upload()
Python
from google.colab import files
files.upload()  # select kaggle.json when prompted
!pip install -q kaggle
!mkdir -p ~/.kaggle
!cp kaggle.json ~/.kaggle/
!ls ~/.kaggle
!chmod 600 ~/.kaggle/kaggle.json
!kaggle datasets download -d tongpython/cat-and-dog
!unzip -q cat-and-dog.zip

Create our Dataset class
Python
from glob import glob
from random import shuffle

import cv2
import torch
from torch.utils.data import Dataset
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"

class CatsDogsDataset(Dataset):
    def __init__(self, folder):
        cats = glob(f"{folder}/cats/*.jpg")
        dogs = glob(f"{folder}/dogs/*.jpg")
        self.fpaths = cats[:500] + dogs[:500]  # keep 500 images per class
        shuffle(self.fpaths)
        # File names start with "dog" or "cat": True (1) for dog, False (0) for cat
        self.targets = [
            fpath.split("/")[-1].startswith("dog") for fpath in self.fpaths
        ]
        self.normalize = transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        )

    def __len__(self):
        return len(self.targets)

    def __getitem__(self, idx):
        f = self.fpaths[idx]
        target = self.targets[idx]
        img = cv2.imread(f)[:, :, ::-1]  # BGR -> RGB
        img = cv2.resize(img, (224, 224))
        img = torch.tensor(img / 255)  # scale to [0, 1]
        img = img.permute(2, 0, 1)  # (H, W, C) -> (C, H, W)
        img = self.normalize(img)
        return img.float().to(device), torch.tensor([target]).float().to(device)

targets (y) are binary: 1 for dog, 0 for cat
normalize is a standard operation: it uses the ImageNet channel means and standard deviations that torchvision's pre-trained models were trained with
images need to be
converted from BGR to RGB
resized to the pre-trained network's input size
scaled between 0 and 1
normalized like the pre-trained network
the mean and std values are therefore always the same (see the PyTorch source code)
we return both the image and its target
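The preprocessing steps listed above can be checked on a synthetic image, without OpenCV or torchvision. This is a minimal sketch: `preprocess` is a hypothetical helper that takes an RGB uint8 tensor in (H, W, C) layout, resizing is left out, and the normalization is written by hand as (x - mean) / std per channel, which is what transforms.Normalize computes.

```python
import torch

# ImageNet channel statistics, the same values as in the dataset class
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img_hwc_uint8: torch.Tensor) -> torch.Tensor:
    """Hypothetical helper: RGB uint8 image (H, W, C) -> normalized float tensor (C, H, W)."""
    img = img_hwc_uint8.float() / 255  # scale to [0, 1]
    img = img.permute(2, 0, 1)         # (H, W, C) -> (C, H, W)
    return (img - MEAN) / STD          # per-channel normalization

# A white 224 x 224 image: every pixel is 255, i.e. 1.0 after scaling
white = torch.full((224, 224, 3), 255, dtype=torch.uint8)
out = preprocess(white)
print(out.shape)     # torch.Size([3, 224, 224])
print(out[:, 0, 0])  # (1 - mean) / std for each channel
```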
Create our model function
Python
from torch import nn, optim
from torchvision import models

def get_model():
    model = models.vgg16(pretrained=True)
    # Freeze every pre-trained parameter
    for param in model.parameters():
        param.requires_grad = False
    # The layers assigned below are new, so they are trainable by default
    model.avgpool = nn.AdaptiveAvgPool2d(output_size=(1, 1))
    model.classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(512, 128),
        nn.ReLU(),
        nn.Dropout(0.2),  # plain Dropout: the input is flattened, not a feature map
        nn.Linear(128, 1),
        nn.Sigmoid(),
    )
    loss_fn = nn.BCELoss()
    optimizer = optim.Adam(model.parameters(), lr=1e-3)
    return model.to(device), loss_fn, optimizer

Freeze all pre-trained parameters so that they are not updated, then overwrite avgpool and the final classifier
Adaptive average pooling is an average pooling layer with a twist: instead of defining a kernel size, we define the output feature map size, so that the output always has the same size and the network can accept images of any dimensions.
E.g. with output_size=(1, 1), if the input dimension is 512 x k x k, the kernel size will be k x k and the output 512 x 1 x 1
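The fixed output size of adaptive pooling can be verified on random feature maps. This is a small sketch; the two spatial sizes are arbitrary choices for illustration.

```python
import torch
from torch import nn

pool = nn.AdaptiveAvgPool2d(output_size=(1, 1))

# Feature maps of different spatial sizes all collapse to 512 x 1 x 1
small = torch.randn(1, 512, 7, 7)
large = torch.randn(1, 512, 10, 10)
print(pool(small).shape)  # torch.Size([1, 512, 1, 1])
print(pool(large).shape)  # torch.Size([1, 512, 1, 1])

# With output_size=(1, 1) the "kernel" spans the whole k x k map: a global average
is_global_mean = torch.allclose(
    pool(large), large.mean(dim=(2, 3), keepdim=True), atol=1e-6
)
print(is_global_mean)  # True
```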
Most of the training script from Chapter 3 remains valid, with a few updates:
Add a threshold to get_accuracy
Python
@torch.no_grad()
def get_accuracy(X, y, model):
    model.eval()
    y_hat = model(X)
    is_correct = (y_hat > .5) == y  # threshold the sigmoid output at 0.5
    return is_correct.cpu().numpy().tolist()

We are able to get 98% accuracy
With VGG11 and VGG19, we observe slightly worse and slightly better performance, respectively
However, we can't just keep adding layers to make the network deeper, because
Vanishing gradients will arise
There are more parameters to update
The information is modified too much by the time it reaches the deep layers
ResNet comes to the rescue: its skip connections let the network decide when to learn and when to simply pass the input through

ResNet

When building deep networks, two problems arise:
The last layers, close to the output, have no clue what the original image was
The gradients of the first layers are close to zero
Using residual blocks, we can propagate the original input forward, so that the network can focus on extracting features instead of trying to rebuild the input
Implementation
Python
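class ResLayer(nn.Module):
    def __init__(self, n_i, n_o, kernel_size, stride=1):
        super().__init__()
        # "Same" padding for odd kernel sizes, so the output keeps the input's height and width
        padding = kernel_size // 2
        self.conv = nn.Sequential(
            nn.Conv2d(n_i, n_o, kernel_size, stride, padding=padding),
            nn.ReLU(),
        )

    def forward(self, x):
        # The sum requires matching shapes: n_i == n_o and stride == 1
        return self.conv(x) + x
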
Architecture of ResNet18
18 layers in total, with skip connections every 2 layers
97% accuracy with only 1000 images
Other popular pre-trained models are Inception, MobileNet, DenseNet, and SqueezeNet
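To see why a skip connection needs matching shapes, a residual layer can be run on a random tensor. This sketch restates the layer in a simplified form, assuming stride 1, an odd kernel with padding kernel_size // 2, and equal input and output channels; the tensor sizes are arbitrary.

```python
import torch
from torch import nn

class ResLayer(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2  # keeps height and width for odd kernels
        self.conv = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size, stride=1, padding=padding),
            nn.ReLU(),
        )

    def forward(self, x):
        return self.conv(x) + x  # skip connection: add the input back

layer = ResLayer(channels=16, kernel_size=5)
x = torch.randn(2, 16, 28, 28)
out = layer(x)
print(out.shape == x.shape)  # True

# If the convolution contributes nothing (all-zero weights), the block is the identity
with torch.no_grad():
    for p in layer.parameters():
        p.zero_()
identity_out = layer(x)
print(torch.equal(identity_out, x))  # True
```

The second check illustrates the idea behind residual learning: a block that has nothing useful to add can fall back to passing its input through unchanged.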

Multi-regression: facial keypoint detection

Challenges:
Image sizes vary, so when we resize an image we need to scale its keypoints as well
After dividing by the image width and height, keypoint coordinates are always between 0 and 1, so we can use a sigmoid at the end of the network
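The scaling described above fits in a few lines of plain Python. The helper names and the keypoint values are made up for illustration; the 224 x 224 target size matches the resize used later for the network input.

```python
def normalize_keypoints(xs, ys, width, height):
    """Divide x by the image width and y by the image height: values land in [0, 1]."""
    return [x / width for x in xs], [y / height for y in ys]

def rescale_keypoints(xs_norm, ys_norm, width, height):
    """Map normalized keypoints back to pixel coordinates of an image of the given size."""
    return [x * width for x in xs_norm], [y * height for y in ys_norm]

# Made-up keypoints on a 600 x 900 (width x height) image
xs, ys = [120.0, 300.0], [90.0, 450.0]
xs_n, ys_n = normalize_keypoints(xs, ys, width=600, height=900)
print(xs_n, ys_n)  # [0.2, 0.5] [0.1, 0.5]

# The same points on the image resized to 224 x 224
xs_224, ys_224 = rescale_keypoints(xs_n, ys_n, width=224, height=224)
print(xs_224, ys_224)
```

Because the normalized values do not depend on the image size, the same targets stay valid whatever resolution the image is resized to.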
Download keypoint data
Python
import pandas as pd

!git clone https://github.com/udacity/P1_Facial_Keypoints.git
# No cd needed: the paths below are relative to the current directory,
# and a !cd would not persist across commands anyway
train_dir = 'P1_Facial_Keypoints/data/training/'
train_df = pd.read_csv("P1_Facial_Keypoints/data/training_frames_keypoints.csv")
train_df.head()

column "0" is keypoint_1_x, column "1" is keypoint_1_y, and so on
Dataset class
Python
import os
from copy import deepcopy

import cv2
import numpy as np
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class KeypointDataset(Dataset):
    def __init__(self, df, img_dir):
        super().__init__()
        self.img_dir = img_dir
        self.normalize = transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        )
        self.df = df

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = deepcopy(self.df.iloc[idx])
        img_path = os.path.join(self.img_dir, row.iloc[0])
        img = cv2.imread(img_path) / 255
        # Normalize keypoints with the ORIGINAL image size, before resizing:
        # x by the width (shape[1]), y by the height (shape[0])
        kp_xy = row.iloc[1:].tolist()
        kp_x = (np.array(kp_xy[0::2], dtype=float) / img.shape[1]).tolist()
        kp_y = (np.array(kp_xy[1::2], dtype=float) / img.shape[0]).tolist()
        kp = torch.tensor(kp_x + kp_y).float().to(device)
        img = self.preprocess_img(img)
        return img, kp

    def preprocess_img(self, img):
        img = cv2.resize(img, (224, 224))
        img = torch.tensor(img).permute(2, 0, 1)  # (H, W, C) -> (C, H, W)
        img = self.normalize(img).float()
        return img.to(device)

    def load_img(self, idx):
        """For debug and visualization purposes only."""
        img_file = self.df.iloc[idx, 0]
        img_path = os.path.join(self.img_dir, img_file)
        img = cv2.imread(img_path)
        # Convert to RGB before dividing: cv2.cvtColor does not accept float64 input
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB) / 255
        img = cv2.resize(img, (224, 224))
        return img

df is the DataFrame of image paths and keypoints
All inputs and targets need to be converted to tensors
Scale the image by dividing by 255, then normalize it with the standard mean and std of the pre-trained models
Create loaders
Python
from sklearn.model_selection import train_test_split
from torch.utils.data import DataLoader

def get_data(df, img_dir):
    train_df, test_df = train_test_split(df, test_size=0.2)
    train_dataset = KeypointDataset(train_df.reset_index(drop=True), img_dir)
    test_dataset = KeypointDataset(test_df.reset_index(drop=True), img_dir)
    train_dl = DataLoader(train_dataset, batch_size=32)
    test_dl = DataLoader(test_dataset, batch_size=32)
    return train_dl, test_dl

Split the training data into train and test sets, so that the separate validation dataset can be kept for later
The model needs a few tweaks as well
Python
def get_model():
    model = models.vgg16(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False  # "requires", with an s
    model.avgpool = nn.Sequential(
        nn.Conv2d(512, 512, 3),
        nn.MaxPool2d(2),
        nn.Flatten(),
    )
    model.classifier = nn.Sequential(
        nn.Linear(2048, 512),
        nn.ReLU(),
        nn.Dropout(0.5),
        nn.Linear(512, 136),  # 68 keypoints x 2 coordinates
        nn.Sigmoid(),
    )
    loss_fn = nn.L1Loss()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    return model.to(device), loss_fn, optimizer

The new avgpool block starts with a convolution that has as many output channels as input channels (512), with a kernel size of 3
⇒ Add the Stanford rule for computing output dimensions in CNNs
The training procedure stays the same
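The 2048 in nn.Linear(2048, 512) can be derived with the usual output-size rule for convolutions and pooling, out = floor((W - F + 2P) / S) + 1, which is presumably the rule the note above refers to. A minimal sketch, assuming a 224 x 224 input, so that the VGG16 feature extractor outputs 512 x 7 x 7; `conv_out` is a hypothetical helper name.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output size along one dimension: floor((W - F + 2P) / S) + 1."""
    return (size - kernel + 2 * padding) // stride + 1

size = 7                                   # VGG16 features: 512 x 7 x 7 for a 224 x 224 input
size = conv_out(size, kernel=3)            # nn.Conv2d(512, 512, 3): 7 -> 5
size = conv_out(size, kernel=2, stride=2)  # nn.MaxPool2d(2): 5 -> 2
flat = 512 * size * size                   # nn.Flatten(): 512 x 2 x 2
print(flat)  # 2048
```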
Inference
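Python
import matplotlib.pyplot as plt

# Assumes test_dl is the test DataLoader returned by get_data
test_dataset = test_dl.dataset
model.eval()  # disable dropout for inference
ix = 0
im = test_dataset.load_img(ix)
x, _ = test_dataset[ix]
kp = model(x[None]).flatten().detach().cpu()
plt.figure(figsize=(10, 10))
plt.subplot(221)
plt.title('Original image')
plt.imshow(im)
plt.grid(False)
plt.subplot(222)
plt.title('Image with facial keypoints')
plt.imshow(im)
# First 68 values are x coordinates, last 68 are y coordinates
plt.scatter(kp[:68] * 224, kp[68:] * 224, c='r')
plt.grid(False)
plt.show()
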
x[None] adds a leading dimension to the image, simulating a batch with a single element
Keypoints need to be rescaled to the width and height of the displayed image (224 here)
detach removes the tensor from the gradient graph
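Both tensor tricks can be checked in isolation. A small sketch with placeholder tensors:

```python
import torch

# x[None] adds a leading batch dimension, like x.unsqueeze(0)
x = torch.zeros(3, 224, 224)
batch = x[None]
print(batch.shape)  # torch.Size([1, 3, 224, 224])

# detach returns a tensor that is no longer tracked by autograd
w = torch.ones(2, requires_grad=True)
y = w * 2
y_plain = y.detach()
print(y.requires_grad, y_plain.requires_grad)  # True False
```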

Multi-task learning: age estimation + gender classification

How can we predict two different attributes for the same image at the same time?
Our new plan is to
Use a pre-trained model and freeze all its layers except the last ones
Branch the last layer into two heads, and use a continuous (regression) loss for age and a binary cross-entropy loss for gender
Add the two losses and backpropagate
Dataset
Python
class AgeGenderDataset(Dataset):
    def __init__(self, df):
        super().__init__()
        self.df = df
        self.normalize = transforms.Normalize(
            mean=[0.485, 0.456, 0.406],
            std=[0.229, 0.224, 0.225],
        )

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        age = row.age
        gender = row.gender == "Male"
        f = row.file
        img = cv2.imread(f)
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        return img, age, gender

    def collate_fn(self, batch):
        list_img, list_age, list_gender = [], [], []
        for img, age, gender in batch:
            img = self.img_preprocess(img)
            age = float(age / 80)  # scale ages to roughly [0, 1]
            gender = float(gender)
            list_img.append(img)
            list_age.append(age)
            list_gender.append(gender)
        img = torch.cat(list_img).to(device)
        age = torch.tensor(list_age).to(device).float()
        gender = torch.tensor(list_gender).to(device).float()
        return img, age, gender

    def img_preprocess(self, img):
        img = cv2.resize(img, (224, 224))
        img = torch.tensor(img).permute(2, 0, 1)  # (H, W, C) -> (C, H, W)
        img = self.normalize(img / 255)
        return img[None]  # (1, C, H, W)

__getitem__ returns the feature img and the targets age and gender
All preprocessing is done in collate_fn, which the DataLoader calls on each batch, instead of being done per item in __getitem__
img_preprocess resizes the image, moves the channels first, scales by 255, normalizes with the pre-trained mean and std, and adds a leading dimension: img[None].
Preprocessed images have dimension (1, C, H, W); the list is then concatenated, so that the resulting tensor has dimension (N, C, H, W)
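The batch-level preprocessing can be tried without any image files. This is a toy sketch under assumptions: `ToyDataset` is hypothetical, it returns random images of varying size, and resizing is done with torch's interpolate instead of OpenCV. Since the raw images have different sizes, they could not simply be stacked; the custom collate_fn brings them to a common shape first.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyDataset(Dataset):
    """Hypothetical dataset: raw (H, W, C) uint8 images of varying size, plus age and gender."""

    def __len__(self):
        return 10

    def __getitem__(self, idx):
        size = 20 + idx  # every image has a different size
        img = torch.randint(0, 256, (size, size, 3), dtype=torch.uint8)
        return img, 20 + idx, idx % 2 == 0

    def collate_fn(self, batch):
        # batch is a list of (img, age, gender) tuples, one per __getitem__ call
        imgs, ages, genders = [], [], []
        for img, age, gender in batch:
            img = img.permute(2, 0, 1).float() / 255  # (C, H, W), scaled to [0, 1]
            img = torch.nn.functional.interpolate(img[None], size=(32, 32))  # (1, C, 32, 32)
            imgs.append(img)
            ages.append(age / 80)
            genders.append(float(gender))
        return torch.cat(imgs), torch.tensor(ages), torch.tensor(genders)

ds = ToyDataset()
dl = DataLoader(ds, batch_size=4, collate_fn=ds.collate_fn)
imgs, ages, genders = next(iter(dl))
print(imgs.shape, ages.shape, genders.shape)
# torch.Size([4, 3, 32, 32]) torch.Size([4]) torch.Size([4])
```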
DataLoader
Python
train_ds = AgeGenderDataset(train_df)
val_ds = AgeGenderDataset(val_df)
train_dl = DataLoader(
    train_ds, batch_size=32, shuffle=True, collate_fn=train_ds.collate_fn
)
val_dl = DataLoader(
    val_ds, batch_size=32, shuffle=True, collate_fn=val_ds.collate_fn
)

The DataLoader calls collate_fn, which is defined as a method of the dataset class for convenience.
Check your implementation with
Python
a, b, c = next(iter(train_dl))
print(a.shape, b.shape, c.shape)
# torch.Size([32, 3, 224, 224]) torch.Size([32]) torch.Size([32])

Model
Python
from torch import nn
from torch.optim import Adam
from torchvision import models

def get_model():
    model = models.vgg16(pretrained=True)
    for param in model.parameters():
        param.requires_grad = False  # "requires", with an s
    model.avgpool = nn.Sequential(
        nn.Conv2d(512, 512, 3),
        nn.MaxPool2d(2),
        nn.ReLU(),
        nn.Flatten(),
    )
    model.classifier = AgeGenderClassifier()
    loss_gender = nn.BCELoss()
    loss_age = nn.L1Loss()
    optimizer = Adam(model.parameters(), lr=1e-3)
    return model.to(device), (loss_age, loss_gender), optimizer

Again, freeze all parameters by setting requires_grad to False
Overwrite avgpool with a convolutional layer, followed by max pooling, a ReLU and a flatten operator
Overwrite classifier with a custom age gender module
Two losses are defined and returned as a tuple: a continuous one for age (L1Loss) and a binary one for gender (BCELoss)
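One pitfall is worth checking when freezing parameters: the attribute is spelled requires_grad. Assigning to a misspelled name such as require_grad raises no error, because Python simply attaches a new attribute to the tensor, and nothing gets frozen. A small sketch on an arbitrary linear layer:

```python
from torch import nn

layer = nn.Linear(4, 2)

# Typo: this only attaches a new, unused attribute to each parameter
for p in layer.parameters():
    p.require_grad = False
still_trainable = all(p.requires_grad for p in layer.parameters())
print(still_trainable)  # True: nothing was frozen

# Correct spelling: the parameters are now excluded from gradient computation
for p in layer.parameters():
    p.requires_grad = False
frozen = not any(p.requires_grad for p in layer.parameters())
print(frozen)  # True
```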
AgeGenderClassifier
Python
class AgeGenderClassifier(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared layers
        self.intermediate = nn.Sequential(
            nn.Linear(2048, 512),
            nn.ReLU(),
            nn.Dropout(0.2),
            nn.Linear(512, 128),
            nn.ReLU(),
            nn.Dropout(0.2),
        )
        # One head per task
        self.age_regressor = nn.Sequential(
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )
        self.gender_classifier = nn.Sequential(
            nn.Linear(128, 1),
            nn.Sigmoid(),
        )

    def forward(self, x):
        x = self.intermediate(x)
        age = self.age_regressor(x)
        gender = self.gender_classifier(x)
        return age, gender

Unlike in the previous get_model function, nothing is overwritten here: the submodule names are chosen freely and the submodules are called in forward
The final layers diverge between age and gender: forward takes x as input and returns both age and gender
Training method
Python
def train_batch(data, model, loss_fns, optimizer):
    model.train()
    optimizer.zero_grad()
    img, age, gender = data
    age_pred, gender_pred = model(img)
    loss_age_fn, loss_gender_fn = loss_fns
    # squeeze: predictions have shape (N, 1), targets have shape (N,)
    loss_age = loss_age_fn(age_pred.squeeze(), age)
    loss_gender = loss_gender_fn(gender_pred.squeeze(), gender)
    loss_total = loss_age + loss_gender
    loss_total.backward()
    optimizer.step()
    return loss_total.item()

Feed each loss function with its own predictions and targets (age and gender)
The loss that we back-propagate on is the sum of both losses
Gender accuracy is close to 84%, and age predictions are off by about 6 years
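That a single backward pass on the summed loss trains both heads and the shared layers can be checked on a toy model. This is only a sketch: the tiny `trunk` and the two heads are stand-ins for the real network, and the data is random.

```python
import torch
from torch import nn

torch.manual_seed(0)
trunk = nn.Linear(8, 4)  # stand-in for the shared layers
age_head = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())
gender_head = nn.Sequential(nn.Linear(4, 1), nn.Sigmoid())

x = torch.randn(5, 8)
age = torch.rand(5)  # ages already scaled to [0, 1]
gender = torch.tensor([1.0, 0.0, 1.0, 1.0, 0.0])

features = torch.relu(trunk(x))
loss_age = nn.L1Loss()(age_head(features).squeeze(), age)
loss_gender = nn.BCELoss()(gender_head(features).squeeze(), gender)
loss_total = loss_age + loss_gender
loss_total.backward()

# Every part of the model received gradients from the single backward pass
has_grads = all(
    p.grad is not None
    for module in (trunk, age_head, gender_head)
    for p in module.parameters()
)
print(has_grads)  # True
```

Summing the losses with equal weights is the simplest choice; it implicitly assumes that both losses are on comparable scales.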
Inference
Python
!wget https://www.dropbox.com/s/6kzr8l68e9kpjkf/5_9.JPG
img = cv2.imread('/content/5_9.JPG')
img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # same conversion as in __getitem__
x = train_ds.img_preprocess(img).to(device)
model.eval()  # disable dropout for inference
age, gender = model(x)
pred_gender = gender.to("cpu").detach().numpy()
pred_age = age.to("cpu").detach().numpy()
plt.imshow(img)
gender_label = {1: "Male", 0: "Female"}[int(pred_gender[0][0] > .5)]
age_years = int(pred_age[0][0] * 80)  # undo the division by 80
print(f"Predicted gender: {gender_label}, predicted age: {age_years}")

Predictions must be sent to the CPU, detached (no longer tracked for backpropagation) and turned into NumPy arrays for display purposes
My personal mistakes during implementation
Forgot to call super().__init__() in my custom module
Got the shape of the batch in collate_fn wrong: the batch is a list of elements, one per __getitem__ call
float() must be called after to(device)
squeeze() is needed on the predictions, so that their shape (N, 1) matches the shape (N) of the targets
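The effect of a missing squeeze can be seen with plain tensor arithmetic. In this sketch the mean absolute error is computed by hand rather than with a loss module, and the values are chosen so that the predictions are perfect.

```python
import torch

pred = torch.tensor([[0.0], [1.0], [2.0]])  # model output, shape (3, 1)
target = torch.tensor([0.0, 1.0, 2.0])      # targets, shape (3,)

# Without squeeze, (3, 1) - (3,) broadcasts to (3, 3):
# every prediction is compared with every target
diff = pred - target
print(diff.shape)  # torch.Size([3, 3])
wrong = diff.abs().mean()

# With squeeze, predictions and targets are compared element by element
right = (pred.squeeze() - target).abs().mean()
print(wrong.item(), right.item())  # about 0.889 and 0.0
```

The predictions match the targets exactly, yet the broadcast version reports a non-zero error, which is the kind of silent bug that squeeze() prevents.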