The field of computer vision is shifting from classical statistical methods to deep learning neural networks. There are still challenging open problems in computer vision, but deep learning methods are achieving state-of-the-art results on a number of specific tasks. What makes deep learning appealing is not only the performance of these models; it is also the fact that a single model can learn meaning directly from images and perform vision tasks, removing the need for specialized, hand-engineered pipelines.
In this article, we examine 8 interesting applications of deep learning in computer vision, focusing on problems that matter to end users.
1. Image Classification
Image classification involves assigning a label to an entire image or photograph. Some examples of image classification include:
- Labeling an X-ray image as cancerous or not
- Classifying handwritten digits
- Assigning a name to a photograph of a face
A popular example of image classification used as a benchmark is the MNIST dataset. There are also image classification benchmarks that involve photographs of objects. Two popular examples are the CIFAR-10 and CIFAR-100 datasets, whose images are classified into 10 and 100 classes, respectively.
The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) is an annual competition in which teams compete for the best performance on a range of computer vision tasks on data drawn from the ImageNet database. Many important advances in image classification have come from papers published on the results of this challenge.
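As a rough illustration of what an image classification model looks like in code, the sketch below trains a small convolutional network on MNIST with PyTorch. The architecture, batch size, and learning rate are arbitrary choices made for illustration, not a reference implementation.

```python
# Minimal sketch of an MNIST-style image classifier (assumes torch/torchvision).
import torch
import torch.nn as nn
from torch.utils.data import DataLoader
from torchvision import datasets, transforms

train_set = datasets.MNIST(root="data", train=True, download=True,
                           transform=transforms.ToTensor())
loader = DataLoader(train_set, batch_size=64, shuffle=True)

# A small CNN: two conv blocks followed by a linear layer over 10 digit classes.
model = nn.Sequential(
    nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(32 * 7 * 7, 10),
)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for images, labels in loader:          # one pass over the data, for brevity
    optimizer.zero_grad()
    loss = loss_fn(model(images), labels)
    loss.backward()
    optimizer.step()
```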
2. Image Classification with Localization
Image classification with localization involves assigning a class label to an image and indicating the location of the object with a bounding box (drawing a box around the object). This is a more challenging version of image classification. Some examples of image classification with localization:
- Labeling an X-ray image as cancerous or not and drawing a box around the cancerous region
- Classifying photographs of animals and drawing a box around the animal in each photograph
A classic dataset for image classification with localization is the PASCAL Visual Object Classes (VOC) dataset.
The task may also include drawing a box around multiple instances of the same object in the same image; this is known as "object detection". The ILSVRC2016 dataset for image classification with localization is a popular dataset consisting of 150,000 photographs and 1,000 object categories.
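To make the difference from plain classification concrete, here is a minimal sketch (PyTorch, with hypothetical layer sizes) of a network with a shared backbone and two heads: one predicting the class label and one regressing a single bounding box, assumed here to be parameterized as (x, y, w, h).

```python
# Sketch: one shared backbone, two heads (class scores + bounding box).
import torch
import torch.nn as nn

class ClassifyAndLocalize(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        # Tiny convolutional backbone; a real model would use a much deeper network.
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.class_head = nn.Linear(64, num_classes)  # label for the whole image
        self.box_head = nn.Linear(64, 4)              # (x, y, w, h) of one box

    def forward(self, x):
        features = self.backbone(x)
        return self.class_head(features), self.box_head(features)

model = ClassifyAndLocalize()
scores, box = model(torch.randn(1, 3, 224, 224))  # dummy RGB image
```

In training, the two heads would typically be optimized jointly, with a cross-entropy loss on the class scores and a regression loss on the box coordinates.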
3. Object Detection
Object detection extends image classification with localization to images that may contain multiple objects, each of which must be localized and classified. This is more challenging than simple image classification or image classification with localization because there are often several objects of different types in the same image. Techniques developed for image classification with localization are often used and demonstrated for object detection. The PASCAL Visual Object Classes (VOC) dataset is a common dataset for object detection.
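As one way to experiment with object detection, the sketch below runs a pretrained Faster R-CNN detector from torchvision on a single image. The 0.5 score threshold is an arbitrary choice, and depending on the torchvision version the weights argument may be `pretrained=True` instead of `weights="DEFAULT"`.

```python
# Sketch: running a pretrained detector from torchvision on one image.
import torch
from torchvision.models.detection import fasterrcnn_resnet50_fpn

# Newer torchvision uses weights="DEFAULT"; older releases used pretrained=True.
model = fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = torch.rand(3, 480, 640)          # stand-in for a real RGB image in [0, 1]
with torch.no_grad():
    predictions = model([image])         # the model expects a list of images

# Each prediction holds bounding boxes, class labels, and confidence scores.
for box, label, score in zip(predictions[0]["boxes"],
                             predictions[0]["labels"],
                             predictions[0]["scores"]):
    if score > 0.5:
        print(label.item(), [round(v, 1) for v in box.tolist()], round(score.item(), 2))
```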
4. Style Transfer
Style transfer, or neural style transfer, is the task of learning the style of one or more images and applying that style to a new image. This task can be thought of as a kind of photo filter or transformation that may not have an objective evaluation. A popular example is applying the style of specific famous works of art (such as those of Pablo Picasso or Vincent van Gogh) to new photographs. Datasets usually include famous works of art that are publicly available, together with standard computer vision datasets.
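A core ingredient of the common neural style transfer formulation is a style loss computed from Gram matrices of convolutional feature maps. The sketch below shows that piece in isolation, with random tensors standing in for features that would normally come from a pretrained network such as VGG.

```python
# Sketch: Gram-matrix style loss, the heart of many neural style transfer methods.
import torch

def gram_matrix(features: torch.Tensor) -> torch.Tensor:
    # features: (batch, channels, height, width) activations from a conv layer.
    b, c, h, w = features.shape
    flat = features.view(b, c, h * w)
    # Channel-by-channel correlations, normalized by the number of elements.
    return flat @ flat.transpose(1, 2) / (c * h * w)

def style_loss(generated_feats, style_feats):
    # Mean squared difference between the Gram matrices of the two images.
    return torch.mean((gram_matrix(generated_feats) - gram_matrix(style_feats)) ** 2)

# In practice the features come from a pretrained network (e.g. VGG), and the
# generated image is optimized to minimize a weighted sum of style and content losses.
loss = style_loss(torch.randn(1, 64, 56, 56), torch.randn(1, 64, 56, 56))
```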
5. Image Colorization
Image colorization, or neural colorization, involves converting a grayscale image into a full-color image. This task can be thought of as a kind of photo filter or transformation that may not have an objective evaluation. Examples include the colorization of old black-and-white photographs and films. Datasets typically involve taking existing photo datasets and creating grayscale versions of the photos that the models must learn to colorize.
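The sketch below illustrates how such a training set can be derived from an existing color dataset: each photo is converted to grayscale and used as the input, with the original color image as the target. CIFAR-10 is used here only as a stand-in for any color photo collection.

```python
# Sketch: building (grayscale input, color target) training pairs for colorization.
from torchvision import datasets, transforms

to_tensor = transforms.ToTensor()
to_gray = transforms.Grayscale(num_output_channels=1)

# CIFAR-10 stands in for "any existing color photo dataset".
color_set = datasets.CIFAR10(root="data", train=True, download=True)

def make_pair(pil_image):
    target = to_tensor(pil_image)            # 3-channel color image (the target)
    inputs = to_tensor(to_gray(pil_image))   # 1-channel grayscale version (the input)
    return inputs, target

x, y = make_pair(color_set[0][0])
# A colorization model then learns a mapping from x (1xHxW) to y (3xHxW),
# typically with an encoder-decoder network and a pixel-wise loss.
```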
6. Image Reconstruction
Image reconstruction is the task of filling in missing or damaged parts of an image. This task can be thought of as a kind of photo filter or transformation that may not have an objective evaluation. Examples include restoring damaged old black-and-white photographs and films. Datasets usually involve taking existing photo datasets and creating damaged versions of the photos that the models must learn to reconstruct.
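As a sketch of that setup, the code below "damages" images by zeroing out a random patch and trains a tiny convolutional autoencoder to reconstruct the originals. The patch size, network depth, and use of a plain MSE loss are illustrative assumptions.

```python
# Sketch: simulating "damaged" images by masking a patch, plus a tiny autoencoder.
import torch
import torch.nn as nn

def mask_random_patch(images: torch.Tensor, size: int = 8) -> torch.Tensor:
    # images: (batch, channels, H, W); zero out a square patch at a random location.
    damaged = images.clone()
    _, _, h, w = images.shape
    top = torch.randint(0, h - size, (1,)).item()
    left = torch.randint(0, w - size, (1,)).item()
    damaged[:, :, top:top + size, left:left + size] = 0.0
    return damaged

autoencoder = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
    nn.Conv2d(32, 3, 3, padding=1), nn.Sigmoid(),
)

clean = torch.rand(4, 3, 32, 32)              # stand-in for real training photos
damaged = mask_random_patch(clean)
loss = nn.functional.mse_loss(autoencoder(damaged), clean)  # reconstruct the original
```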
7. Image Super-Resolution
Image super-resolution is the task of generating a new version of an image with higher resolution and more detail than the original. Datasets usually consist of existing photo datasets from which low-resolution versions are created, and the models must learn to produce the original high-resolution versions.
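The sketch below shows one way such training pairs can be built, by downscaling and re-upscaling images to create blurry low-resolution inputs, and a small SRCNN-style network trained to recover the originals. The layer sizes follow the spirit of SRCNN but are otherwise arbitrary.

```python
# Sketch: making low-resolution inputs by downscaling, plus an SRCNN-style model.
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_low_res(images: torch.Tensor, factor: int = 4) -> torch.Tensor:
    # Downscale then upscale back, so input and target share the same size.
    small = F.interpolate(images, scale_factor=1 / factor,
                          mode="bicubic", align_corners=False)
    return F.interpolate(small, size=images.shape[-2:],
                         mode="bicubic", align_corners=False)

# Three conv layers in the spirit of SRCNN: feature extraction, mapping, reconstruction.
srcnn = nn.Sequential(
    nn.Conv2d(3, 64, 9, padding=4), nn.ReLU(),
    nn.Conv2d(64, 32, 1), nn.ReLU(),
    nn.Conv2d(32, 3, 5, padding=2),
)

high_res = torch.rand(4, 3, 128, 128)          # stand-in for real photos
low_res = make_low_res(high_res)
loss = F.mse_loss(srcnn(low_res), high_res)    # learn to recover the lost detail
```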
8. Image Synthesis
Image synthesis is the task of making targeted modifications to existing images or generating entirely new images. This is a very broad field that is developing rapidly. It may include small modifications to images and video, such as:
- Changing the style of an object in a scene
- Adding an object to a scene
- Adding a face to a scene
It may also involve generating entirely new images, such as the examples below (a small generator sketch follows this list):
- Generating faces
- Generating rooms
- Generating clothing
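Many of these synthesis tasks are approached with generative adversarial networks (GANs). Below is a minimal sketch of a DCGAN-style generator that maps a random noise vector to a 64x64 RGB image; the latent size and channel counts are illustrative, and a full GAN would also need a discriminator and an adversarial training loop.

```python
# Sketch: a DCGAN-style generator mapping random noise to a 64x64 RGB image.
import torch
import torch.nn as nn

generator = nn.Sequential(
    # Project a 100-dim noise vector up to an image through transposed convolutions.
    nn.ConvTranspose2d(100, 256, 4, stride=1, padding=0), nn.BatchNorm2d(256), nn.ReLU(),  # 4x4
    nn.ConvTranspose2d(256, 128, 4, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),  # 8x8
    nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),    # 16x16
    nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),     # 32x32
    nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Tanh(),                          # 64x64
)

noise = torch.randn(1, 100, 1, 1)      # one random latent vector
fake_image = generator(noise)          # shape: (1, 3, 64, 64)
# In a full GAN, a discriminator is trained to tell real photos (e.g. faces,
# rooms, clothing) from generated ones, and the generator learns to fool it.
```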
There are other important and interesting tasks that we did not cover here because they are not purely computer vision problems.
Examples of image-to-text and text-to-image tasks include:
Image captioning: generating a textual description of an image.
Image description: generating a textual description of each object in an image.
Text-to-image synthesis: generating images from a textual description.