Improving ResNet: the ResNeXt and SEResNeXt architectures.
Tags: CV | feature extractor
ResNeXt
Main idea
The main idea is to use grouped convolutions (GC) in the bottleneck block. This yields more diverse filters that are less correlated with each other in the grouped convolution layer (Figure A); besides, GC trains more efficiently because the groups can be processed in parallel on the GPU. Also, for fixed input and output channel counts, increasing the number of groups decreases the number of parameters in the GC layer (a parameter-count sketch follows Figure A).
Figure A. GC in AlexNet (k=2): colored and grayscale filters.
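To make the parameter claim concrete, here is a minimal sketch (PyTorch assumed, since the linked implementations use it): the weight tensor of a grouped conv has shape (out_channels, in_channels / groups, k, k), so for fixed channel counts the parameter count falls by a factor of groups.

```python
import torch.nn as nn

# Same in/out channels and kernel size; only the number of groups changes.
for groups in (1, 2, 32):
    conv = nn.Conv2d(128, 128, kernel_size=3, padding=1,
                     groups=groups, bias=False)
    n_params = sum(p.numel() for p in conv.parameters())
    print(f"groups={groups:2d}: {n_params} params")  # 147456, 73728, 4608
```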
GC divides the input feature map into chunks along the channel dimension and convolves each chunk separately with its own filters / group_num filters. Figure B shows (a) a classic convolution and (b) a GC; a code sketch of this equivalence follows the figure.
Figure B.
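A short sketch of the equivalence illustrated in Figure B (PyTorch assumed): a grouped conv produces the same output as manually chunking the input channels and convolving each chunk with its own slice of the filters.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(1, 8, 16, 16)                       # 8 input channels
gc = nn.Conv2d(8, 8, kernel_size=3, padding=1, groups=2, bias=False)

# Manual version: split the channels into 2 chunks of 4 and convolve each
# chunk with its own 8 / 2 = 4 filters (the matching slice of gc.weight).
x1, x2 = torch.chunk(x, chunks=2, dim=1)
w1, w2 = torch.chunk(gc.weight, chunks=2, dim=0)    # weight: (8, 4, 3, 3)
manual = torch.cat([F.conv2d(x1, w1, padding=1),
                    F.conv2d(x2, w2, padding=1)], dim=1)

assert torch.allclose(gc(x), manual, atol=1e-5)     # same result
```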
Bottleneck and architecture
Figure C shows the ResNeXt bottleneck with GC (all three depicted variants are equivalent) and, at the bottom, the ResNeXt architecture; 32 is the cardinality (the number of groups) and 4 is the width of one group.
Thus in bottleneck variant (c), with input_channels=128, groups=32, and filters=128, the input channels are divided into 32 groups of width 4, and each 4-channel group is convolved with 128 / 32 = 4 filters (a code sketch of this bottleneck follows the figure caption).
Figure C. Bottleneck variants (top) and the architecture of ResNeXt (bottom).
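As a minimal sketch (PyTorch assumed; class and argument names are mine, and stride/downsampling details are omitted), the variant-(c) bottleneck from Figure C with the 32x4d configuration of the first stage:

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """ResNeXt bottleneck (c): 1x1 reduce -> 3x3 grouped conv -> 1x1 expand."""
    def __init__(self, in_ch=256, width=128, out_ch=256, cardinality=32):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, width, 1, bias=False),     # 1x1: 256 -> 128
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, width, 3, padding=1,       # 3x3 GC: 32 groups
                      groups=cardinality, bias=False),  # of width 128/32 = 4
            nn.BatchNorm2d(width), nn.ReLU(inplace=True),
            nn.Conv2d(width, out_ch, 1, bias=False),    # 1x1: 128 -> 256
            nn.BatchNorm2d(out_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.block(x) + x)             # identity shortcut

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBottleneck()(x).shape)  # torch.Size([1, 256, 56, 56])
```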
SEResNeXt
Essence: the SE block
The single main new feature in SEResNeXt is the SE (squeeze-and-excitation) block, which consists of three main parts: squeeze, excitation, and scale. The idea of the block is to reweight the input feature maps through an element-wise product between learned coefficients and the feature maps, increasing the importance of the maps characteristic of the depicted class and decreasing the rest; in other words, SE is needed to recalibrate the feature maps. My example:
There are two classes: parrot and eagle. Let's agree that bright colors are characteristic of parrots, while darker colors (black, white, etc.) are characteristic of eagles. Among the feature maps at the input of the SE block, those responsible for "red color" and "green color" have the largest values (i.e., the largest descriptors from the squeeze part). The SE block thus decides that the object is a parrot and, via its coefficients, amplifies the "parrot-like" feature maps and suppresses the "eagle-like" ones.
The SE blocks add about 2.5 million new parameters (for SEResNeXt-50, for example) and consist of the following parts (see the code sketch after Figure D):
- Squeeze: obtains a single descriptor per feature map; global average pooling is used for this;
- Excitation: extracts the coefficients; r is the reduction ratio (it shrinks the number of neurons in the hidden layer), 16 by default;
- Scale: element-wise multiplication between the coefficients and the feature maps.
Figure D. Top: the SE bottleneck for SEResNeXt (left) and for Inception-family models (right); bottom: another visualisation of the block.
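A minimal sketch of the three parts in code (PyTorch assumed; this mirrors the standard SE design from the paper, not any particular repo):

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, channels, r=16):                 # r: reduction ratio
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)          # global average pooling
        self.excitation = nn.Sequential(                # two FC layers with an
            nn.Linear(channels, channels // r),         # r-fold bottleneck
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),                               # coefficients in (0, 1)
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        coef = self.squeeze(x).view(b, c)               # per-channel descriptors
        coef = self.excitation(coef).view(b, c, 1, 1)
        return x * coef                                 # scale: reweight the maps

x = torch.randn(2, 256, 56, 56)
print(SEBlock(256)(x).shape)  # torch.Size([2, 256, 56, 56])
```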
Figure E shows visualisations of the activations (coefficients) at different layers of SEResNeXt on ImageNet, broken down by class.
Figure E.
SEResNeXt architecture
The architecture of this model is shown in Figure F.
Figure F. Architecture of SEResNeXt.
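To show where the SE block sits in this architecture, here is a sketch (PyTorch assumed) that plugs the SEBlock into the ResNeXtBottleneck sketched above: the residual branch is recalibrated just before it is added to the identity shortcut.

```python
import torch

class SEResNeXtBottleneck(ResNeXtBottleneck):       # reuses the sketch above
    def __init__(self, in_ch=256, width=128, out_ch=256, cardinality=32, r=16):
        super().__init__(in_ch, width, out_ch, cardinality)
        self.se = SEBlock(out_ch, r)                # SEBlock from the SE sketch

    def forward(self, x):
        # Recalibrate the residual branch with SE, then add the identity.
        return self.relu(self.se(self.block(x)) + x)

print(SEResNeXtBottleneck()(torch.randn(1, 256, 56, 56)).shape)
```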
Links
ResNeXt
- A nice basic explanation;
- About grouped convolution;
- PyTorch implementation;
- Another implementation;
- Original paper.