CNN: Training

Convolutional neural networks (CNNs) are widely used in computer vision. Convolution layers extract image features, and pooling layers downsample the feature maps, keeping the most salient features.

Generally, the first convolution layer learns low-level features such as edges, the second convolution layer learns basic shape features, and the remaining convolution layers learn progressively more abstract features.

Local invariance: the same features can be extracted from an image after transformations such as translation, rotation, and resizing.

Pooling layers not only preserve this local invariance but also filter out noise, which helps reduce overfitting.
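
As a minimal sketch (assuming PyTorch; the layer sizes are arbitrary), a convolution layer extracts local features and a pooling layer downsamples them:

```python
import torch
import torch.nn as nn

# Convolution extracts local features; max pooling downsamples the
# feature map, keeping the strongest response in each window.
block = nn.Sequential(
    nn.Conv2d(in_channels=3, out_channels=16, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.MaxPool2d(kernel_size=2, stride=2),  # halves the spatial resolution
)

x = torch.randn(1, 3, 32, 32)  # one 32x32 RGB image
y = block(x)
print(y.shape)                 # torch.Size([1, 16, 16, 16])
```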

Batch normalization is critical in CNN training to mitigate distribution drift between layers. It standardizes activations within each mini-batch, which allows larger learning rates and generally speeds up training. Technically, it subtracts the batch mean, divides by the square root of the batch variance (the batch standard deviation), and then applies a learnable scale (gamma) and shift (beta).
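
The computation is easy to write out. A simplified training-mode sketch (assuming PyTorch, and omitting the running statistics that real BN layers track for inference):

```python
import torch

def batch_norm(x, gamma, beta, eps=1e-5):
    # Normalize each channel over the batch and spatial dimensions,
    # then apply the learnable scale (gamma) and shift (beta).
    mean = x.mean(dim=(0, 2, 3), keepdim=True)
    var = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
    x_hat = (x - mean) / torch.sqrt(var + eps)
    return gamma * x_hat + beta

x = torch.randn(8, 16, 32, 32)   # a batch of 16-channel feature maps
gamma = torch.ones(1, 16, 1, 1)  # learnable scale, initialized to 1
beta = torch.zeros(1, 16, 1, 1)  # learnable shift, initialized to 0
y = batch_norm(x, gamma, beta)   # zero mean, unit variance per channel
```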

The paper cited below investigates the improvements from several training tricks for image classification with convolutional neural networks such as ResNet (minimal sketches of several of the tricks follow the list):

  1. Linear scaling learning rate

  2. Learning rate warmup

  3. Zero gamma in batch normalization

  4. No bias decay

  5. Architecture: stride, number of residual blocks, kernel size, pooling size

  6. Cosine learning rate decay

  7. Label smoothing

  8. Knowledge distillation

  9. Mixup training
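
For the learning-rate tricks (1, 2, and 6), a common recipe is to scale the reference rate linearly with batch size, warm up from zero over the first few epochs, and then decay along a cosine curve. A minimal standalone sketch, not the paper's exact code (the constants 0.1 and 256 follow the paper's reference setup; the epoch counts are illustrative):

```python
import math

def learning_rate(epoch, total_epochs=120, warmup_epochs=5,
                  base_lr=0.1, batch_size=256):
    # Trick 1: linear scaling -- the reference lr 0.1 is tuned for
    # batch size 256, so scale it proportionally for other batch sizes.
    lr = base_lr * batch_size / 256
    if epoch < warmup_epochs:
        # Trick 2: warmup -- grow the lr linearly up to the scaled value.
        return lr * (epoch + 1) / warmup_epochs
    # Trick 6: cosine decay -- anneal smoothly from the scaled lr down to 0.
    progress = (epoch - warmup_epochs) / (total_epochs - warmup_epochs)
    return 0.5 * lr * (1 + math.cos(math.pi * progress))
```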
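
Tricks 3 and 4 are small changes to initialization and regularization: zero the scale (gamma) of the last BN layer in each residual block so every block starts out as an identity mapping, and apply weight decay only to weights, not to biases or BN parameters. A hedged PyTorch sketch; the attribute name `bn2` is a placeholder that depends on how the residual block is defined:

```python
import torch.nn as nn

def zero_init_last_bn(model):
    # Trick 3: zero gamma of the block's last BN so the residual branch
    # initially contributes nothing (the block acts as an identity).
    for m in model.modules():
        if hasattr(m, "bn2") and isinstance(m.bn2, nn.BatchNorm2d):
            nn.init.zeros_(m.bn2.weight)  # "bn2" is a placeholder name

def param_groups(model, weight_decay=1e-4):
    # Trick 4: no bias decay -- regularize only conv/linear weights;
    # biases and BN gamma/beta stay unregularized.
    decay, no_decay = [], []
    for name, p in model.named_parameters():
        if p.ndim == 1 or name.endswith(".bias"):
            no_decay.append(p)
        else:
            decay.append(p)
    return [{"params": decay, "weight_decay": weight_decay},
            {"params": no_decay, "weight_decay": 0.0}]
```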
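
Tricks 7 and 9 both soften the training targets: label smoothing mixes the one-hot label with a uniform distribution over classes, and mixup trains on convex combinations of image pairs and their labels. A minimal sketch, assuming PyTorch and one-hot-encoded labels for mixup:

```python
import torch
import torch.nn.functional as F

def label_smoothing_loss(logits, target, eps=0.1):
    # Trick 7: put (1 - eps) of the probability mass on the true class
    # and spread eps uniformly over all classes.
    log_probs = F.log_softmax(logits, dim=-1)
    nll = -log_probs.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    return ((1 - eps) * nll - eps * log_probs.mean(dim=-1)).mean()

def mixup(x, y_onehot, alpha=0.2):
    # Trick 9: blend each image (and its label) with a shuffled partner,
    # with mixing weight drawn from a Beta(alpha, alpha) distribution.
    lam = torch.distributions.Beta(alpha, alpha).sample()
    perm = torch.randperm(x.size(0))
    x_mixed = lam * x + (1 - lam) * x[perm]
    y_mixed = lam * y_onehot + (1 - lam) * y_onehot[perm]
    return x_mixed, y_mixed
```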

A combination of these tricks also yielded improvements in transfer learning, for both object detection and semantic segmentation.

He T, Zhang Zh, Zhang H, Zhang Z, Xie J, Li M (2019): Bag of Tricks for Image Classification with Convolutional Neural Networks. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), June 2019.