The goal of this challenge was to recognize traffic lights in images taken by drivers using the Nexar application. For each image, the classifier had to determine whether a traffic light is present and, if so, whether it is red or green. In particular, only traffic lights facing the direction of driving should be detected.
The challenge focused on convolutional neural networks, a very common method for image recognition with deep neural networks. Smaller models scored higher, and the minimum accuracy required to win was 95%.
Nexar provided 18,659 labeled images as training data. Each image is labeled with one of three classes: no traffic light, red light, and green light.
Software and hardware
We used Caffe to train the models, mainly because of its large variety of pre-trained models. Python, NumPy, and Jupyter Notebook were used to analyze the results. Amazon GPU instances were used to train the models.
The final classifier achieved 94.95% accuracy on the Nexar test set with a model size of 7.84 MB. Getting to that accuracy involved a lot of trial and error. Some of the attempts had logic behind them, and others were just guesswork.
We started by fine-tuning a model with the GoogLeNet architecture that was pre-trained on ImageNet. This achieved an accuracy of over 90%.
Most recently published networks are very deep and have many parameters. Because smaller models scored higher, we looked for a more compact network. SqueezeNet seemed to be a good fit and also had a model pre-trained on ImageNet.
The network keeps itself small by using mostly 1x1 convolutional filters with some 3x3 filters, and by reducing the number of input channels to the 3x3 filters.
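To make the size saving concrete, here is a rough parameter count comparing a plain 3x3 convolution with a SqueezeNet-style block (a 1x1 "squeeze" layer feeding parallel 1x1 and 3x3 "expand" layers). The channel sizes below are illustrative assumptions, not values taken from the model we trained.

```python
def conv_params(in_ch, out_ch, k):
    """Number of weights plus biases in a k x k convolution."""
    return in_ch * out_ch * k * k + out_ch


def fire_params(in_ch, squeeze, expand1x1, expand3x3):
    """Parameters of a squeeze layer followed by 1x1 and 3x3 expand layers."""
    squeeze_layer = conv_params(in_ch, squeeze, 1)
    expand_small = conv_params(squeeze, expand1x1, 1)
    # The 3x3 filters only see `squeeze` input channels, not `in_ch`.
    expand_large = conv_params(squeeze, expand3x3, 3)
    return squeeze_layer + expand_small + expand_large


# Assumed sizes: 128 input channels, 128 output channels in both cases.
plain = conv_params(128, 128, 3)
fire = fire_params(128, squeeze=16, expand1x1=64, expand3x3=64)
print(plain, fire, round(plain / fire, 1))
```

With these assumed sizes the squeeze-and-expand block needs roughly a twelfth of the parameters of the plain 3x3 layer while producing the same number of output channels.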
After some trial and error in adjusting the learning rate, we were able to fine-tune the pre-trained model to 92% accuracy.
During training, SqueezeNet performed random cropping of the input images by default, and we did not change this. This type of data augmentation helps the network generalize better. Similarly, when generating predictions, we took several crops of the input image and averaged the results. We used 5 crops: 4 from the corners and 1 from the center (using Caffe code).
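The crop-and-average step at prediction time can be sketched in NumPy as follows. This is not the Caffe code we used; `predict_fn` is a hypothetical callable that returns class probabilities for one crop, and the crop size of 227 in the usage lines is an assumption.

```python
import numpy as np


def five_crops(image, size):
    """Return the four corner crops and the center crop of an H x W x C image."""
    h, w = image.shape[:2]
    top, left = (h - size) // 2, (w - size) // 2
    offsets = [
        (0, 0),                 # top-left
        (0, w - size),          # top-right
        (h - size, 0),          # bottom-left
        (h - size, w - size),   # bottom-right
        (top, left),            # center
    ]
    return [image[y:y + size, x:x + size] for y, x in offsets]


def predict_averaged(image, size, predict_fn):
    """Average the class probabilities predicted for the five crops."""
    probs = [predict_fn(crop) for crop in five_crops(image, size)]
    return np.mean(probs, axis=0)


# Usage with a stand-in model that always returns the same probabilities.
image = np.zeros((256, 256, 3), dtype=np.uint8)
dummy_model = lambda crop: np.array([0.2, 0.5, 0.3])
averaged = predict_averaged(image, 227, dummy_model)
```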
Rotating and cropping images gave very little improvement: accuracy rose from 92% to 92.46%.
Additional training with a low learning rate
All models started to overfit after a certain point. This can be seen by watching for the point where the validation loss starts to rise. At that point we stopped the training, because the model probably would not generalize any further. We then tried resuming training from the point where the model started to overfit, with a learning rate 10 times lower than the original. This usually improved the accuracy by about 0.5%.
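As a rough illustration of this procedure, the sketch below finds the evaluation step with the best validation loss once the loss has stopped improving, and derives the learning rate for the resumed run. The `patience` parameter, the loss values, and the base learning rate are all assumptions for the example; in practice we judged the turning point by looking at the curves.

```python
def overfit_point(val_losses, patience=3):
    """Index of the best validation loss once it has failed to improve
    for `patience` consecutive evaluations, or None if still improving."""
    best_idx = 0
    for i, loss in enumerate(val_losses):
        if loss < val_losses[best_idx]:
            best_idx = i
        elif i - best_idx >= patience:
            return best_idx
    return None


# Hypothetical validation losses, one per evaluation step.
val_losses = [1.00, 0.80, 0.60, 0.55, 0.58, 0.61, 0.66]
resume_from = overfit_point(val_losses)

base_lr = 0.001             # assumed original learning rate
resume_lr = base_lr / 10    # resume from the snapshot with a 10x lower rate
```

Training would then be restarted from the snapshot saved at `resume_from` with `resume_lr` as the new learning rate.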
Additional training data
Initially, we divided the data into three sets: training (64%), validation (16%) and testing (20%). After a few days, we decided that giving up 36% of the data might be too much. As a result, the training and validation sets were merged and the test set was used to check the results.
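A minimal sketch of such a split over the 18,659 image indices is shown below. The shuffling, the seed, and the rounding are assumptions; the text above only fixes the percentages.

```python
import numpy as np


def split_indices(n, seed=0):
    """Shuffle n indices and split them 64% / 16% / 20%."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_test = int(round(0.20 * n))
    n_val = int(round(0.16 * n))
    test = order[:n_test]
    val = order[n_test:n_test + n_val]
    train = order[n_test + n_val:]   # the remaining ~64%
    return train, val, test


train, val, test = split_indices(18659)

# Merging training and validation leaves only the test set held out.
merged_train = np.concatenate([train, val])
```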
Fix mistakes in training data
When analyzing the classifier's errors on the validation set, we noticed some glaring mistakes in the labels. For example, the model confidently said the light was green while the label said the light was red. We decided to fix these errors in the training set. The reasoning was that such errors confuse the model and make it harder for it to generalize. Even if the final test set contains labeling errors, a model that generalizes better has a better chance of achieving high accuracy across all images. We manually relabeled 709 images that one of the models got wrong. With the help of a Python script this took about an hour, and it reduced the number of errors to 337.
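One way to shortlist images for manual review is to look for cases where the model disagrees with the label while being very confident. The sketch below is not the script we used; the function name, the 0.9 confidence cutoff, and the class order (0 = no light, 1 = red, 2 = green) are assumptions.

```python
import numpy as np


def suspicious_labels(probs, labels, min_confidence=0.9):
    """Indices of images where the model confidently disagrees with the
    label, most confident first."""
    probs = np.asarray(probs)
    labels = np.asarray(labels)
    predicted = probs.argmax(axis=1)
    confidence = probs.max(axis=1)
    mask = (predicted != labels) & (confidence >= min_confidence)
    idx = np.flatnonzero(mask)
    return idx[np.argsort(-confidence[idx])]


# Hypothetical predictions for four images and their given labels.
probs = [
    [0.05, 0.02, 0.93],   # model says green, label says red
    [0.10, 0.80, 0.10],   # agrees with the label
    [0.97, 0.02, 0.01],   # model says no light, label says green
    [0.40, 0.35, 0.25],   # wrong, but not confident
]
labels = [1, 1, 2, 1]
to_review = suspicious_labels(probs, labels)
```

Each flagged image still has to be checked by eye, since a confident model can also simply be wrong.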
Balancing the dataset
The data were not balanced: 19% of the images had no traffic light, 53% had a red light and 28% had a green light. We tried to balance the dataset by oversampling the less common classes, but this did not help.
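Oversampling here means repeating images of the rarer classes until every class is as large as the most common one. A minimal NumPy sketch, assuming random sampling with replacement and integer class labels (neither detail is specified above):

```python
import numpy as np


def oversample_indices(labels, seed=0):
    """Return shuffled indices in which every class appears as often as
    the most frequent class, by resampling the rarer classes."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    chosen = []
    for cls, count in zip(classes, counts):
        members = np.flatnonzero(labels == cls)
        # Draw the missing examples from the class's own members.
        extra = rng.choice(members, size=target - count, replace=True)
        chosen.append(np.concatenate([members, extra]))
    return rng.permutation(np.concatenate(chosen))


# Toy label set with the same 19% / 53% / 28% proportions.
labels = np.array([0] * 19 + [1] * 53 + [2] * 28)
balanced = oversample_indices(labels)
```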
Separation of day and night
We found that traffic light detection is very different during the day and at night. We thought that maybe we could help the model by separating day and night images. By considering the average intensity of the pixels, it was very simple to separate day and night images. We tried two approaches, neither of which worked:
Training two separate models, one for day images and one for night images
Training the network to predict 6 classes instead of 3, by also predicting whether it is day or night
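Both approaches rely on the same day/night test. The sketch below separates images by mean pixel intensity and builds the 6-class label for the second approach; the threshold of 60 and the label encoding are assumptions chosen for illustration.

```python
import numpy as np


def is_night(image, threshold=60.0):
    """Treat an image as a night image if its mean pixel value is low.
    The threshold is an assumed value for 8-bit images."""
    return float(np.mean(image)) < threshold


def six_class_label(light_label, night):
    """Combine the 3 light classes (0, 1, 2) with day/night into 6 classes:
    0-2 for day images, 3-5 for night images."""
    return light_label + 3 * int(night)


dark = np.full((4, 4, 3), 20, dtype=np.uint8)
bright = np.full((4, 4, 3), 180, dtype=np.uint8)
night_red = six_class_label(1, is_night(dark))
day_green = six_class_label(2, is_night(bright))
```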
Classifier training for hard cases
We selected the 30% of images on which the classifier was less than 97% confident, and tried training a classifier on these images alone. There was no improvement.
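Selecting the hard cases amounts to filtering on the highest predicted probability. A minimal sketch, assuming the classifier's outputs are available as an array of per-class probabilities:

```python
import numpy as np


def hard_case_indices(probs, threshold=0.97):
    """Indices of images whose top predicted probability is below threshold."""
    confidence = np.asarray(probs).max(axis=1)
    return np.flatnonzero(confidence < threshold)


# Hypothetical outputs for three images.
probs = [
    [0.99, 0.005, 0.005],   # confident, kept out of the hard set
    [0.60, 0.30, 0.10],     # hard case
    [0.02, 0.96, 0.02],     # just under the threshold, hard case
]
hard = hard_case_indices(probs)
```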