Tuesday, February 23, 2021

Convolutional Neural Network (CNN) architectures: overview

In the 1980s, Yann LeCun proposed the convolutional neural network, also called ConvNet or CNN. In 1989, he used it for the first time with the backpropagation algorithm to train a network for handwritten digit recognition.

The Convolutional Neural Network is the most commonly used deep learning architecture for image recognition. It is based on the convolution operation, a mathematical operation that extracts features from images.
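
To make the convolution operation concrete, below is a minimal sketch in Python/NumPy of a "valid" 2D convolution (strictly speaking, the cross-correlation that CNN libraries implement). The 5 × 5 image and the vertical-edge kernel are illustrative assumptions, not values from any particular network:

import numpy as np

def conv2d(image, kernel):
    # Slide the kernel over the image and take the elementwise
    # product-sum at each position ("valid" mode: no padding).
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.arange(25, dtype=float).reshape(5, 5)   # toy 5x5 "image"
kernel = np.array([[1., 0., -1.],
                   [1., 0., -1.],
                   [1., 0., -1.]])                 # vertical-edge detector
print(conv2d(image, kernel))                       # 3x3 feature map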

There are many ConvNet architectures, such as LeNet, AlexNet, VGGNet, ResNet, DenseNet, and U-Net.

[Figure: Deep Learning Models]

LeNet

LeNet was proposed by Yann LeCun in 1998. It consists of convolutional layers, pooling layers, and fully connected layers; before the data is passed to the fully connected layers, a flattening step is applied.
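
As an illustration, here is a minimal LeNet-5-style sketch in PyTorch (a sketch for demonstration, not the original 1998 implementation). It assumes 32 × 32 grayscale inputs, tanh activations, average pooling, and 10 output classes, as in the classic digit-recognition setting:

import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5), nn.Tanh(),   # 32x32 -> 28x28
            nn.AvgPool2d(2),                             # -> 14x14
            nn.Conv2d(6, 16, kernel_size=5), nn.Tanh(),  # -> 10x10
            nn.AvgPool2d(2),                             # -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),                        # flattening before the FC layers
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, 84), nn.Tanh(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNet5()(torch.randn(1, 1, 32, 32)).shape)   # torch.Size([1, 10])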

Advantages
  • Easy to understand and use,
  • Works well on character recognition images.
Disadvantages
  • Low performance, because the network is not very deep,
  • It was designed specifically for optical character recognition (OCR), so it performs poorly on color images.

AlexNet

AlexNet was created by Alex Krizhevsky in 2012 to solve the problem of image classification. It is deeper than LeNet and has about 60 million parameters. AlexNet (with eight layers) consists of five convolutional layers and three fully connected layers (two fully connected hidden layers and one fully connected output layer). After convolutional layers, max pooling with a stride of 2 is applied. The output values are flattened before being passed to the fully connected layers. AlexNet uses the ReLU activation function.
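
The following is a condensed AlexNet-style sketch in PyTorch (a single-GPU simplification, not Krizhevsky's original two-GPU implementation): five convolutional layers with ReLU, overlapping 3 × 3 max pooling with stride 2, and three fully connected layers with dropout 0.5, assuming 227 × 227 RGB inputs and 1000 classes:

import torch
import torch.nn as nn

class AlexNet(nn.Module):
    def __init__(self, num_classes=1000):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),          # overlapping pooling
            nn.Conv2d(96, 256, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
            nn.Conv2d(256, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 384, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(384, 256, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(kernel_size=3, stride=2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Dropout(0.5), nn.Linear(256 * 6 * 6, 4096), nn.ReLU(),
            nn.Dropout(0.5), nn.Linear(4096, 4096), nn.ReLU(),
            nn.Linear(4096, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(AlexNet()(torch.randn(1, 3, 227, 227)).shape)   # torch.Size([1, 1000])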

Advantages
  • Its several hidden layers make it perform better than LeNet on color images,
  • It allows training on multiple GPUs,
  • Overlapping pooling helps to avoid the overfitting problem,
  • A dropout layer (with rate 0.5) further alleviates the overfitting problem.
Disadvantages
  • It does not solve the vanishing gradient problem, because the weights are initialized from a normal distribution,
  • Compared to VGGNet and DenseNet, AlexNet is shallow,
  • It uses 5 × 5 convolutional filters, while 3 × 3 filters are more efficient.

VGG

VGG was proposed in 2014 by the Visual Geometry Group (VGG). It consists of convolutional layers, pooling layers, and fully connected layers, organized into blocks, a design that became widely used in networks proposed after VGG, such as ResNet and DenseNet. This architecture was among the top performers in the ILSVRC 2014 classification task. There are different versions of VGG, such as VGG11, VGG16, and VGG19.
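
To illustrate the block-based design, here is a VGG-style sketch in PyTorch (the helper name vgg_block and the exact block configuration are assumptions for illustration): each block stacks 3 × 3 convolutions with ReLU and ends with 2 × 2 max pooling, halving the spatial resolution:

import torch
import torch.nn as nn

def vgg_block(num_convs, in_channels, out_channels):
    # A VGG block: num_convs 3x3 conv+ReLU pairs, then 2x2 max pooling.
    layers = []
    for _ in range(num_convs):
        layers += [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1),
                   nn.ReLU()]
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

# A VGG11-like convolutional stack: width doubles from block to block.
features = nn.Sequential(
    vgg_block(1, 3, 64),
    vgg_block(1, 64, 128),
    vgg_block(2, 128, 256),
    vgg_block(2, 256, 512),
    vgg_block(2, 512, 512),
)
print(features(torch.randn(1, 3, 224, 224)).shape)   # torch.Size([1, 512, 7, 7])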

Advantages
  • It is deeper than AlexNet,
  • It uses small kernels (3 × 3),
  • Small kernels combined with a large number of layers allow it to learn more complex features.
Disadvantages
  • Training from scratch takes a lot of time,
  • This model does not solve the vanishing gradient problem.

ResNet

Adding hidden layers can increase a model's capacity, but very deep models suffer from overfitting and, above all, the vanishing gradient problem. ResNet aims to handle complex tasks while solving the overfitting and vanishing gradient problems, thanks to its residual (skip) connections, which let gradients flow directly to earlier layers, as sketched below.
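
Here is a minimal residual block sketch in PyTorch (a simplified illustration of the idea, not the full ResNet implementation): the block's output F(x) is added back to its input x through the skip connection, so the layers only have to learn a residual:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # skip connection: add the input back

block = ResidualBlock(64)
print(block(torch.randn(1, 64, 56, 56)).shape)   # torch.Size([1, 64, 56, 56])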

Advantages
  • Great depth, which allows it to solve complex tasks,
  • Higher accuracy on various tasks,
  • Faster training,
  • Ability to overcome the overfitting and vanishing gradient problems.
Disadvantages
  • Training from scratch takes a lot of time,
  • A large number of parameters.

DenseNet

Densely Connected Networks (DenseNet) was proposed by Huang et al. in 2017. It consists of convolutional layers, pooling layers, dense blocks, and transition layers. In DenseNet, each layer within a dense block is directly connected to all the layers that follow it. The architecture is characterized by feature reuse and diversified features, thanks to the concatenation of feature maps.
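
Here is a minimal dense block sketch in PyTorch (an illustrative simplification, not the paper's implementation): each layer receives the concatenation of all previous feature maps, so the channel count grows by growth_rate per layer:

import torch
import torch.nn as nn

class DenseBlock(nn.Module):
    def __init__(self, num_layers, in_channels, growth_rate):
        super().__init__()
        self.layers = nn.ModuleList()
        for i in range(num_layers):
            channels = in_channels + i * growth_rate   # channels seen so far
            self.layers.append(nn.Sequential(
                nn.BatchNorm2d(channels), nn.ReLU(),
                nn.Conv2d(channels, growth_rate, kernel_size=3, padding=1, bias=False),
            ))

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)   # feature reuse via concatenation
        return x

block = DenseBlock(num_layers=4, in_channels=64, growth_rate=32)
print(block(torch.randn(1, 64, 28, 28)).shape)   # torch.Size([1, 192, 28, 28]), 64 + 4*32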

Advantages
  • A limited number of parameters,
  • Reduces the vanishing gradient problem,
  • Diversified features, thanks to the connections between all the layers,
  • Feature reuse,
  • Great depth, which allows it to solve complex tasks.
Disadvantages
  • The large number of connections between layers makes the network more prone to overfitting,
  • DenseNet requires a lot of memory, as it concatenates all feature maps.

Conclusions

  • Over the years, various architectures have been developed to remedy the problems encountered, such as overfitting and the vanishing gradient,
  • Newer models are deeper and more efficient, both in execution time and in handling complex tasks,
  • Newer models retain the block-based design and the connections between layers introduced by earlier architectures.