Improving ResNet: the ResNeXt and SEResNeXt architectures.

Tags: CV | feature extractor


ResNeXt

Main idea

The main idea is to use grouped convolutions (GC) in the bottleneck. This yields more diverse, less mutually correlated filters in the grouped convolution layer (Figure A); besides, GC trains more efficiently because the groups can be processed in parallel across GPUs. Also, as the number of groups increases, the number of parameters in the GC layer decreases.

Figure A. Grouped convolutions in AlexNet (k=2): colored and grayscale filters.

GC splits the input feature map into chunks along the channel dimension and convolves each chunk separately with filters / group_num filters. Figure B shows (a) a classic convolution and (b) a grouped convolution; the sketch after the figure demonstrates the parameter savings.

Figure B. (a) Classic convolution; (b) grouped convolution.
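
A minimal PyTorch sketch of the parameter savings; the sizes (128 channels in and out, 3×3 kernels, 32 groups) are illustrative and not tied to any particular model:

```python
import torch
import torch.nn as nn

# Classic convolution vs grouped convolution with the same in/out channels.
dense = nn.Conv2d(128, 128, kernel_size=3, padding=1)
grouped = nn.Conv2d(128, 128, kernel_size=3, padding=1, groups=32)

x = torch.randn(1, 128, 56, 56)
assert dense(x).shape == grouped(x).shape        # same output shape

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(dense), params(grouped))            # 147584 vs 4736: weights shrink by the group count
```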

Bottleneck and architecture

Figure C shows the bottleneck of ResNeXt with GC (all three depicted variants are equivalent) and, at the bottom, the architecture of ResNeXt; 32 is the cardinality (the number of groups) and 4 is the width of one group.

Thus in the third bottleneck variant, with input_channels=128, groups=32 and filters=128, the input channels are divided into 32 groups of 4 channels each, and then each 4-channel group is convolved with its own 128 / 32 = 4 filters (see the sketch after Figure C).

Figure C. Bottleneck (top) and architecture of ResNeXt.
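
A minimal PyTorch sketch of this bottleneck, assuming the 32×4d setting from Figure C (256 input channels, 128 grouped-conv channels, cardinality 32):

```python
import torch
import torch.nn as nn

class ResNeXtBottleneck(nn.Module):
    """Sketch of the ResNeXt bottleneck: 1x1 reduce -> 3x3 grouped conv ->
    1x1 expand, with an identity shortcut."""
    def __init__(self, in_ch=256, mid_ch=128, cardinality=32):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            # 32 groups of 128/32 = 4 channels, each convolved with its own 4 filters
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, 1, bias=False),
            nn.BatchNorm2d(in_ch),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(x + self.branch(x))  # residual connection

x = torch.randn(1, 256, 56, 56)
print(ResNeXtBottleneck()(x).shape)  # torch.Size([1, 256, 56, 56])
```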

SEResNeXt

Essence: the SE block

The main new feature in SEResNeXt is the SE (squeeze-and-excitation) block, which consists of 3 main parts: squeeze, excitation, scale. The idea of the block is to reweight the input feature maps by an element-wise product between coefficients and the feature maps, increasing the importance of the feature maps most characteristic of one class and decreasing it for the others; in other words, the SE block recalibrates the feature maps. My example:

There are 2 classes: parrot and eagle. Let’s agree that bright colors are characteristic of parrots, and darker colors (black, white, etc.) of eagles. Among the feature maps at the input of the SE block, those responsible for “red color” and “green color” have the largest values (or descriptors from the squeeze part). Thus the SE block “decides” the object is a parrot, and via its coefficients it amplifies the “parrot-like” feature maps and suppresses the “eagle-like” ones.

The SE block adds new parameters (about 2.5 million for the SEResNeXt-50 example) and consists of the following parts (a minimal sketch follows the list):

  1. Squeeze, needed to get descriptors from the feature maps; global average pooling is used for this;
  2. Excitation, required to extract the coefficients; r is the reduction ratio (it sets the number of neurons in the hidden layer), 16 by default;
  3. Scale, an element-wise multiplication between the coefficients and the feature maps.
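
A minimal PyTorch sketch of these three parts, assuming the default reduction ratio r=16 and the standard two-layer excitation MLP:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Sketch of the squeeze-and-excitation block."""
    def __init__(self, channels, r=16):
        super().__init__()
        self.squeeze = nn.AdaptiveAvgPool2d(1)   # 1. squeeze: global average pooling
        self.excitation = nn.Sequential(         # 2. excitation: bottleneck MLP -> sigmoid coefficients
            nn.Linear(channels, channels // r),
            nn.ReLU(inplace=True),
            nn.Linear(channels // r, channels),
            nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.squeeze(x).view(b, c)           # per-channel descriptors
        w = self.excitation(w).view(b, c, 1, 1)  # per-channel coefficients in (0, 1)
        return x * w                             # 3. scale: element-wise reweighting

x = torch.randn(2, 256, 14, 14)
print(SEBlock(256)(x).shape)  # torch.Size([2, 256, 14, 14])
```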

Figure D. Top: the SE bottleneck for SEResNeXt (left) and for Inception-family models (right); bottom: another visualisation of the block.

Visualisations of the activations (coefficients) at different layers of SEResNeXt on ImageNet, broken down by class, are shown in Figure E.

Figure E. SE block activations (coefficients) by layer and class.

SEResNeXt architecture

The architecture of this model is shown in Figure F; a sketch combining the two blocks above follows the figure.

Figure F. Architecture of SEResNeXt.
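
A minimal sketch putting the pieces together, assuming the SE block recalibrates the residual branch right before the addition (as in Figure D); sizes reuse the earlier sketches:

```python
import torch
import torch.nn as nn

class SEResNeXtBottleneck(nn.Module):
    """Sketch of the SEResNeXt bottleneck: a ResNeXt residual branch
    recalibrated by an SE block before the residual addition."""
    def __init__(self, in_ch=256, mid_ch=128, cardinality=32, r=16):
        super().__init__()
        self.branch = nn.Sequential(
            nn.Conv2d(in_ch, mid_ch, 1, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, mid_ch, 3, padding=1, groups=cardinality, bias=False),
            nn.BatchNorm2d(mid_ch), nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, in_ch, 1, bias=False),
            nn.BatchNorm2d(in_ch),
        )
        self.se = nn.Sequential(                 # excitation MLP of the SE block
            nn.Linear(in_ch, in_ch // r), nn.ReLU(inplace=True),
            nn.Linear(in_ch // r, in_ch), nn.Sigmoid(),
        )
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        out = self.branch(x)
        w = self.se(out.mean(dim=(2, 3)))        # squeeze + excitation
        out = out * w.view(*w.shape, 1, 1)       # scale
        return self.relu(x + out)                # residual addition

x = torch.randn(1, 256, 14, 14)
print(SEResNeXtBottleneck()(x).shape)  # torch.Size([1, 256, 14, 14])
```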

Links


ResNeXt

  1. Nice base explanation;
  2. About grouped convolution;
  3. PyTorch implementation;
  4. Another implementation;
  5. Original paper.

SEResNeXt

  1. Introduction;
  2. SE block explanation;
  3. Original paper;
  4. Nice PyTorch implementation.