One of the more controversial features of the iPhone X is FaceID, the face-recognition unlocking method that replaced TouchID.
After building a bezel-less phone, Apple needed a new way to unlock it quickly and easily. While some competitors were content to move the fingerprint sensor to a new position, Apple chose to rethink how a phone is unlocked: the user simply looks at the screen. Thanks to an advanced and very small front-facing camera system, the iPhone X can create a 3D map of the user’s face.
In addition, an infrared camera captures an image of the user’s face, which makes recognition more robust to changes in ambient light and color. Using deep learning, the smartphone can learn the user’s face in great detail and recognize the user every time the phone is picked up. Surprisingly, Apple claims that this method is even more secure than TouchID, with an error rate of 1 in 1,000,000.
We focused on how this process could be built with deep learning and how each step could be optimized. In this article, we show how a FaceID-like algorithm can be implemented using Keras. The final tests were done using a Kinect, a very popular RGB-D (color plus depth) camera whose output is very similar to that of the iPhone X’s front-facing cameras.
FaceID concept
The neural networks behind FaceID perform a complex task. The first step is to analyze in detail how FaceID works on the iPhone X. In TouchID, the user first has to register their fingerprint by repeatedly touching the sensor. After about 15 different samples, the smartphone completes the registration process and TouchID is ready to work. Similarly, in FaceID, the user must register their face. The process is very simple: the user just looks at the phone and then slowly turns their head in a circle.
As a result, the face is registered in different poses, and the lock screen is ready to work. This surprisingly fast registration method tells us a lot about the underlying learning algorithms. For example, the neural networks in FaceID are not simply performing classification. Classification here would mean training a neural network to decide whether the person looking at the iPhone is the actual user.
Such a network would use training data to predict “correct” or “incorrect”. Although this approach works in many deep learning applications, it is not effective here. First, the neural network would have to be trained from scratch on the new data obtained from the user’s face, which requires time, energy, and training data from other faces to serve as negative examples. In addition, this approach would not let Apple train a more complex network offline in advance. Instead, FaceID is most likely built on a Siamese convolutional neural network (explained in the next section), which Apple trains offline so that a new face can be registered without retraining.
Faces and numbers in neural networks (Siamese neural network)
A Siamese neural network consists of two identical neural networks that share all their weights. This architecture can learn to compute a distance between inputs of a certain type, such as images. The inputs are passed through the Siamese network, which maps each of them to a point in an n-dimensional space. The network is then trained so that points from the same class end up as close to each other as possible, while points from different classes end up as far apart as possible.
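Because the two branches share all their weights, a Siamese network amounts to applying one embedding function to both inputs and comparing the results. The sketch below illustrates only that idea: the tiny linear layer `SHARED_WEIGHTS` and the function names are hypothetical stand-ins for a real convolutional network, not part of FaceID or of our Keras model.

```python
import math

# Hypothetical shared weights: a tiny 2x3 linear layer standing in for a deep CNN.
SHARED_WEIGHTS = [
    [0.5, -0.2, 0.1],
    [0.3, 0.8, -0.5],
]

def embed(x, weights=SHARED_WEIGHTS):
    """Map an input vector to an embedding using the shared weights."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def siamese_distance(x_a, x_b):
    """Pass both inputs through the same network, then measure Euclidean distance."""
    return math.dist(embed(x_a), embed(x_b))

a = [1.0, 0.0, 2.0]
b = [1.0, 0.1, 2.0]   # almost identical to a
c = [-3.0, 4.0, 0.5]  # very different from a

print(siamese_distance(a, b))  # small distance
print(siamese_distance(a, c))  # much larger distance
```

Since both inputs go through the same `embed` function, the distance is symmetric and an input is always at distance zero from itself.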
In the long run, the network learns to extract the most important features from the data and compress them into an array, creating a meaningful map. To build intuition, imagine how you would describe different dog breeds using a small vector of numbers, so that similar dogs have vectors that are close together. You would probably use one number to encode the dog’s color, another for its size, another for the shape of its ears, and so on.
In this way, dogs that resemble each other will have similar vectors. A Siamese neural network can learn to do this for you, similar to what an autoencoder does.
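The dog analogy can be made concrete with a few hand-written vectors. The encoding and all the numbers below are invented purely for illustration; a Siamese network would learn its own features instead of using ones we pick by hand.

```python
import math

# Hypothetical hand-made encoding: [coat darkness, size, ear floppiness], each in 0..1.
dogs = {
    "labrador":         [0.3, 0.7, 0.9],
    "golden retriever": [0.4, 0.7, 0.9],
    "chihuahua":        [0.5, 0.1, 0.1],
}

def dog_distance(name_a, name_b):
    """Euclidean distance between the description vectors of two breeds."""
    return math.dist(dogs[name_a], dogs[name_b])

print(dog_distance("labrador", "golden retriever"))  # similar breeds: small distance
print(dog_distance("labrador", "chihuahua"))         # dissimilar breeds: large distance
```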
Notice how the neural network architecture learns the similarity between figures and automatically groups them in two dimensions. A similar technique is applied to faces.
Using this technique, a large number of faces can be used to train the network to recognize which faces are most similar. With enough budget and computing power (as Apple has), you can also include harder examples to make the neural network robust to things like twins, adversarial attacks (masks), etc.
The final advantage of using deep learning in image recognition
The network can recognize different users without any further training: after taking a few photos during the initial setup, it only has to compute where the user’s face lies in the latent face map. In addition, FaceID can adapt to changes in your appearance: both sudden changes (e.g., glasses, hats, makeup) and gradual changes (e.g., facial hair). This is done by adding reference face vectors to this map, which are calculated based on your new appearance.
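As a rough sketch of how adding reference vectors could work, the class below keeps a list of enrolled embeddings and appends a new one when the user’s look changes. Everything here is an assumption made for illustration: the `FaceStore` name, the three-dimensional toy embeddings, the threshold value, and especially the `identity_confirmed` flag, which stands for some other proof of identity (a passcode, for instance). Apple’s actual update rule is not described in this article.

```python
import math

class FaceStore:
    """Reference embeddings for one enrolled user (hypothetical helper)."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.references = []

    def enroll(self, embedding):
        """Store a reference embedding captured during setup."""
        self.references.append(list(embedding))

    def matches(self, embedding):
        """True if the embedding is close enough to any stored reference."""
        if not self.references:
            return False
        closest = min(math.dist(embedding, ref) for ref in self.references)
        return closest < self.threshold

    def adapt(self, embedding, identity_confirmed):
        """Add a reference for a new look, but only once identity is confirmed."""
        if identity_confirmed:
            self.enroll(embedding)

store = FaceStore(threshold=0.4)   # illustrative threshold
store.enroll([0.10, 0.20, 0.30])   # toy 3-D embedding from setup

new_look = [0.60, 0.20, 0.30]      # e.g. the same user with new glasses
before = store.matches(new_look)   # too far from the only reference
store.adapt(new_look, identity_confirmed=True)
after = store.matches([0.62, 0.20, 0.30])  # close to the new reference
print(before, after)
```

Requiring a separate confirmation before adapting is a design choice in this sketch: without it, any face that failed to match could add itself to the map.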
Implementing FaceID in Keras
As with all deep learning projects, the first thing we need is data. Building our own dataset would require the time and cooperation of many people, which can be very challenging. For this reason, we relied on an RGB-D face dataset available on the Internet. In this dataset, people appear in different poses and orientations, as happens when using an iPhone.
First, we built a convolutional neural network based on the SqueezeNet architecture. The network is trained to minimize the distance between images of the same person and maximize the distance between images of different people. After training, the network can map faces to 128-dimensional arrays.
As a result, images of one person are clustered together and lie far from the images of other people. This means that for unlocking, the network only needs to calculate the distance between the images stored in the face registration phase and the image it receives when unlocking. If the distance is below a certain threshold (the lower, the more secure), the device is unlocked.
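The training objective described above (pull images of the same person together, push images of different people apart) is commonly expressed as a contrastive loss. The version below is a minimal sketch of that general technique, not necessarily the exact loss used for our model; the `margin` value and the two-dimensional toy embeddings are assumptions chosen for readability.

```python
import math

def contrastive_loss(emb_a, emb_b, same_person, margin=1.0):
    """Contrastive loss for a single pair of embeddings.

    Same person: penalize any distance between the embeddings.
    Different people: penalize only pairs closer than `margin`.
    """
    d = math.dist(emb_a, emb_b)
    if same_person:
        return d ** 2
    return max(0.0, margin - d) ** 2

origin = [0.0, 0.0]
near = [0.1, 0.0]
far = [2.0, 0.0]

print(contrastive_loss(origin, near, same_person=True))   # low: same person, close
print(contrastive_loss(origin, far, same_person=True))    # high: same person, far apart
print(contrastive_loss(origin, near, same_person=False))  # high: different people, too close
print(contrastive_loss(origin, far, same_person=False))   # zero: already beyond the margin
```

Minimizing this loss over many pairs is what produces the clustered embedding space in which a simple distance threshold is enough to decide whether to unlock.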
Systems based on deep learning make decisions and carry out specific tasks using neural network algorithms loosely inspired by human thought patterns.
The features computed by the layers of a deep learning system are not hand-designed by engineers; rather, they are learned from data, and it is the variety of that data that drives the improvement of these algorithms.
FaceID simulation test
Now we check how this model works by simulating a typical FaceID cycle: first, registering the user’s face; then the unlocking step, attempted either by the user (which should succeed) or by other people (who should not be able to unlock the device).
We start with face registration: we take a series of photos of a person from the dataset and simulate the face registration step.
RGB-D images from different people, on the other hand, produce an average distance of 1.1 from the registered face.
Therefore, a threshold of around 0.4 should be enough to prevent others from unlocking the device.
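To see how such a threshold behaves, one can count how many attempts fall below it. The distances in this sketch are hypothetical and are not measurements from our test: the impostor values were simply chosen so that their mean is 1.1, and the owner values are invented.

```python
def acceptance_rate(distances, threshold):
    """Fraction of attempts whose distance is below the threshold (device unlocks)."""
    return sum(d < threshold for d in distances) / len(distances)

# Hypothetical distances to the enrolled embeddings, for illustration only.
owner_distances = [0.22, 0.31, 0.28, 0.35, 0.26]
impostor_distances = [0.95, 1.10, 1.25, 1.05, 1.15]  # mean is 1.1

THRESHOLD = 0.4

owner_rate = acceptance_rate(owner_distances, THRESHOLD)
impostor_rate = acceptance_rate(impostor_distances, THRESHOLD)
print(owner_rate)     # share of owner attempts that unlock the device
print(impostor_rate)  # share of impostor attempts that unlock the device
```

Raising the threshold makes unlocking more forgiving for the owner but lets more impostors through, which is why a lower value is the more secure choice.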