Convolutional neural networks are among the most discussed and most important topics in artificial intelligence and machine learning today. Most deep learning models are built on artificial neural networks, which is why these models are also called deep neural networks.
The term “deep” refers to the number of hidden layers in the neural network. Traditional neural networks had only 2 or 3 hidden layers, whereas today's deep neural networks can have as many as 150.
What is a convolutional neural network?
One of the most common deep neural network models is the convolutional neural network, called CNN or ConvNet for short. Neural networks rose to prominence in 2012, when Alex Krizhevsky won the ImageNet competition (often described as the annual Olympics of computer vision) by using a neural network.
Krizhevsky reduced the classification error from 26% to 15%. This reduction was very impressive at the time and was considered a great success. Since then, several companies have put deep learning at the core of their products. Facebook uses a neural network to tag images automatically, and Google uses the technology for its image search. Companies such as Amazon, Instagram, and Pinterest also use convolutional neural networks to offer suitable suggestions to their users. The most common use of these networks, however, is image processing.
Why do we need to use a convolutional neural network?
Image classification is the task of taking an image as input and producing as output either its class (dog, car, house, etc.) or the probability that it belongs to each class. For us humans, this happens naturally and without conscious effort; from the time we are born until we become adults, we learn this task gradually and instinctively. We can recognize the objects around us with almost no mistakes. More precisely, whenever we look at our surroundings, we recognize the objects in view and assign a label to each of them. Recognizing and naming the objects in a scene is not nearly as easy for a computer!
What are the input and output in a convolutional neural network or CNN?
When a computer receives an image as input, it sees it as an array of numbers. The size of the array depends on the image size (in pixels). For example, if we give the computer a color image in JPG format with a size of 480×480 pixels, the corresponding array will have 480x480x3 cells (the 3 refers to the three RGB color channels). Each cell holds a number between 0 and 255 that gives the pixel intensity at that position. Although these numbers seem meaningless to us, they are the only information we have for classifying images with a convolutional neural network. The main idea is that we give the computer an array of numbers like the one described, and the computer produces an output such as: this image is 80% likely to be a cat, 15% likely to be a dog, and 5% likely to be a bird.
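To make the input and output concrete, here is a minimal sketch in Python with NumPy. It does not load a real JPG; the random pixel values and the names image and class_probabilities are assumptions made purely for illustration.

```python
import numpy as np

# Hypothetical stand-in for a decoded 480x480 color image: random pixel values.
rng = np.random.default_rng(seed=0)
image = rng.integers(low=0, high=256, size=(480, 480, 3), dtype=np.uint8)

print(image.shape)  # (480, 480, 3): height, width, RGB channels
print(image.size)   # 691200 cells in total (480 * 480 * 3)
print(image.min() >= 0 and image.max() <= 255)  # every cell is in 0..255

# Illustrative output of a classifier: one probability per class, summing to 1.
class_probabilities = {"cat": 0.80, "dog": 0.15, "bird": 0.05}
```

Everything the network learns about the picture has to come from those 691,200 numbers.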
What is the working method of a convolutional neural network?
So far, we have looked at the input and output of a neural network. Now let’s think about how to solve the problem. What we want from the computer is to look at images, learn the distinguishing features of a specific object such as a book, and recognize whether or not there is a book in a given image. We humans do this unconsciously when recognizing objects. For example, when we see a dog, we first pay attention to its smaller parts such as ears, paws, and legs, and by matching them against the patterns in our minds we understand that we are seeing a dog. Similarly, to recognize a complex image like that of a dog, a computer first detects the simpler features of the image, such as edges and curves. A neural network has several layers; each layer detects certain features, and in the last layer the image as a whole is recognized. This is the general process by which a convolutional neural network works; now let’s go into more detail.
What is the structure of a convolutional neural network?
As we mentioned, in a convolutional neural network the computer takes an image as input; the image then passes through a network made up of several convolutional and non-linear layers. Each of these layers performs an operation, and at the end the output is a single class or the probabilities of several different classes. The hard part of the story is the middle layers and how they work! Next, we will examine the most important layers.
Functional concepts of the first layer in a convolutional neural network
Let’s take a look at what a convolutional neural network (CNN) does. The first layer is a convolutional layer, which slides a set of small filters over the image. Each filter can be considered a feature identifier. A feature here means something like a straight line, a simple color, or a curve. Suppose the first filter has dimensions 7x7x3 and is a curve detector. This filter is a numerical matrix whose elements have higher values in the places where the curve runs. Now we place this filter on the part of the image we want to examine. After that, we multiply the numbers in the corresponding cells of the filter and the image one by one and add the results together.
If the result is a large number, it indicates that this area of the image contains a curve similar to the curve of the filter. If the result is small, the area does not resemble the filter.
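The multiply-and-add step can be sketched in a few lines of Python with NumPy. For brevity this sketch uses a single-channel 7x7 filter rather than the 7x7x3 filter from the text; the weights, the pixel values, and the names curve_filter, curved_patch, and flat_patch are all hypothetical.

```python
import numpy as np

def filter_response(patch, kernel):
    """Multiply patch and kernel cell by cell, then add up the products."""
    return float(np.sum(patch * kernel))

# Hypothetical 7x7 curve detector: large weights along a curved path, 0 elsewhere.
curve_cells = [(0, 3), (1, 3), (2, 4), (3, 4), (4, 5), (5, 5), (6, 6)]
curve_filter = np.zeros((7, 7))
for row, col in curve_cells:
    curve_filter[row, col] = 30.0

# Image patch that contains the same curve (bright pixels along the path).
curved_patch = np.zeros((7, 7))
for row, col in curve_cells:
    curved_patch[row, col] = 50.0

# Image patch that contains a horizontal line instead of the curve.
flat_patch = np.zeros((7, 7))
flat_patch[3, :] = 50.0

print(filter_response(curved_patch, curve_filter))  # 10500.0 (7 cells * 30 * 50)
print(filter_response(flat_patch, curve_filter))    # 1500.0 (only one cell overlaps)
```

The patch that matches the filter scores seven times higher than the one that does not, which is exactly the signal the next layer works with.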
Deeper layers of a convolutional neural network
In addition to the layer described above, a neural network contains other layers with different tasks and functions. In general, these internal layers introduce non-linearities and preserve the dimensions of the data as it moves through the network. The last layer in the convolutional neural network is also of particular importance.
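As a rough illustration of those two roles, the sketch below uses ReLU as the non-linearity and zero padding to keep the output the same size as the input. Both are common choices but are assumptions here, since the text does not name specific techniques; relu and conv2d_same are hypothetical helper names.

```python
import numpy as np

def relu(x):
    """Non-linearity: negative values become 0, positive values pass through."""
    return np.maximum(x, 0)

def conv2d_same(image, kernel):
    """Slide an odd-sized kernel over a 2-D image, zero-padding the border
    so that the output has the same height and width as the input."""
    k_rows, k_cols = kernel.shape
    pad_r, pad_c = k_rows // 2, k_cols // 2
    padded = np.pad(image, ((pad_r, pad_r), (pad_c, pad_c)))  # pads with zeros
    out = np.zeros(image.shape, dtype=float)
    for r in range(image.shape[0]):
        for c in range(image.shape[1]):
            out[r, c] = np.sum(padded[r:r + k_rows, c:c + k_cols] * kernel)
    return out

feature_map = conv2d_same(np.ones((5, 5)), np.ones((3, 3)))
print(feature_map.shape)                         # (5, 5): same size as the input
print(relu(np.array([[-2.0, 1.0], [3.0, -0.5]])))
```

Without the padding, each 3x3 convolution would shrink a 5x5 input to 3x3, and a deep stack of layers would quickly run out of pixels.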
The last layer in the convolutional neural network
The last layer of a convolutional neural network receives the output of the preceding layers as its input. Its own output is an N-dimensional vector, where N is the number of available classes. For example, if your network identifies digits, the number of classes is ten, because there are ten digits. Each component of this N-dimensional vector represents the probability of one class. The last layer looks at the high-level features produced by the preceding layers and measures how strongly those features correspond to each class; the stronger the match, the higher the probability of that class.
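A minimal sketch of such a last layer for the ten-digit example follows. It assumes a fully connected layer followed by a softmax, which is a common way to turn scores into probabilities but is not named in the text; the feature vector, the weights, and the feature count of 64 are random or arbitrary placeholders.

```python
import numpy as np

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    shifted = scores - np.max(scores)  # subtract the max for numerical stability
    exp_scores = np.exp(shifted)
    return exp_scores / exp_scores.sum()

rng = np.random.default_rng(seed=0)
num_classes = 10                        # one class per digit, 0 to 9
features = rng.normal(size=64)          # hypothetical high-level features
weights = rng.normal(size=(num_classes, 64)) * 0.1  # placeholder weights
bias = np.zeros(num_classes)

scores = weights @ features + bias      # how well the features match each class
probabilities = softmax(scores)         # N-dimensional output vector
predicted_digit = int(np.argmax(probabilities))

print(probabilities.shape)              # (10,)
print(predicted_digit)
```

In a trained network the weights would not be random, so the largest probability would point to the digit actually shown in the image.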
How does a convolutional neural network learn?
So far you have learned a lot about convolutional neural networks, but you probably still have many questions, and new ones may have formed in your mind. Questions like how filters are made or… The computer assigns appropriate values to the filters during a training process, using a technique called backpropagation. When we humans are born, we have no understanding of the objects around us. Over time, we see different objects, the people around us tell us their names, and we learn them. Computers learn in a similar way: at the beginning, the numbers in the filter matrix are random. Over time, as the computer is shown many different labeled images, the numbers in the filter are corrected until the network reaches acceptable performance.
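The idea of starting from random numbers and correcting them step by step can be sketched as follows. This is a deliberately simplified stand-in for backpropagation: it trains a single 3x3 filter with plain gradient descent on a squared error, whereas a real network applies the same principle through many layers. The target filter, the random patches, and the learning rate are all assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Hypothetical filter we want the computer to discover (a vertical edge detector).
target_filter = np.array([[1.0, 0.0, -1.0]] * 3)

# Training data: random 3x3 patches and the response the target filter gives them.
patches = rng.normal(size=(200, 3, 3))
targets = np.sum(patches * target_filter, axis=(1, 2))

def mean_squared_error(current_filter):
    predictions = np.sum(patches * current_filter, axis=(1, 2))
    return float(np.mean((predictions - targets) ** 2))

learned_filter = rng.normal(size=(3, 3))  # the filter starts out random
learning_rate = 0.05
initial_loss = mean_squared_error(learned_filter)

for step in range(500):
    predictions = np.sum(patches * learned_filter, axis=(1, 2))
    errors = predictions - targets
    # Gradient of the mean squared error with respect to each filter cell.
    gradient = 2.0 * np.mean(errors[:, None, None] * patches, axis=0)
    learned_filter -= learning_rate * gradient  # small correction each step

final_loss = mean_squared_error(learned_filter)
print(initial_loss, final_loss)
print(np.round(learned_filter, 3))
```

After enough corrections the learned numbers end up very close to the target filter, even though nobody typed them in by hand.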