Score a Face with Deep Learning

Charlie_the_wanderer
14 min read · Feb 3, 2021

http://charliethewanderer1.ddns.net/faceeva/

This project is available to anyone. You can try it on the website linked above.

Don’t take it seriously! The model may produce unreasonable results since it was trained on only 500 face images of Koreans in their 20s and 30s.

I started this project last September, and it took about three weeks to finish. To practice development and deployment with what I had learned from Coursera, I chose a topic related to image processing.

In the beginning, I had a couple of ideas, including diagnosing hair loss from images of the forehead and the top of the head. However, while looking into gathering a dataset for it, I found it was really tricky; it was hard to even find the data. For example, to build that model, I would have had to collect images of each part of someone’s head, which raises privacy concerns, and I would also have needed labels indicating whether each image showed hair loss or not. Since I’m not a doctor and have no professional knowledge about hair loss, I couldn’t produce the target labels from the images myself. Likewise, in most cases it was almost impossible to collect relevant data for a given topic, so I settled on one that was much easier to handle in terms of gathering data and generating target labels: scoring faces with deep learning.

Model Development

The way I collected data was plain manual labor: I downloaded face images from Google and gave each one a score from 1 (the worst-looking face) to 5 (the best-looking face).

I knew some people might find the topic itself morally inappropriate, given that it judges people by their appearance alone. However, my main purpose was not to build a model that judges people but to implement a deep learning algorithm in something people can easily observe.

Since I assumed smartphone selfies would be the input images for this model, I tried to collect high-definition images as much as possible for training. I also built a balanced dataset by collecting 50 images per label for both males and females. However, even in this process, there were a lot of things to consider.

First of all, a human does not evaluate someone’s face by the face alone. When we evaluate, we unconsciously take other things into account, like the clothes someone is wearing, hairstyle, height, and even reputation if he or she is a celebrity.
Let’s say there are two people who look identical, but one is a celebrity and the other is a person on the street. With whom would you be more likely to feel an affinity? Even though the two look exactly the same in terms of appearance, we end up feeling differently about them.

Secondly, there was no way to prevent the grader’s own perspective from leaking into the model during the training phase, since the model has to be trained on a dataset labeled by that grader.

Therefore, I had to stay aware of these factors and tried to be as objective as possible to minimize the effect of external ones.

I also had to preprocess the images for training. Since they contained not only people’s faces but also other things like clothes and background, I used Python’s OpenCV library for face detection and cropped the face out of each image with the code below.

import cv2

def get_face_location(image_path):

    # Load the Haar cascade for frontal faces
    face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

    # Read the input image
    img = cv2.imread(image_path)

    # Convert into grayscale
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect faces (scaleFactor=1.03, minNeighbors=6)
    faces = face_cascade.detectMultiScale(gray_img, 1.03, 6)

    # Check whether there is exactly one face
    detecting_result = 2  # 2 is normal
    if len(faces) == 0:
        detecting_result = 0  # "There's no face detected"
    elif len(faces) > 1:
        detecting_result = 1  # "There's more than one face detected"

    # Suppress other boxes: keep only the largest one
    max_box_coordinates = (0, 0, 0, 0)
    for (x, y, w, h) in faces:
        if w * h >= max_box_coordinates[2] * max_box_coordinates[3]:
            max_box_coordinates = (x, y, w, h)

    return (max_box_coordinates, img, detecting_result)
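
A quick usage sketch of this helper (the file names below are just placeholders, not files from my project):

# 'sample.jpg' and 'face_crop.jpg' are placeholder paths
(x, y, w, h), img, detecting_result = get_face_location('sample.jpg')
if detecting_result != 0:  # at least one face was found
    face_img = img[y:y+h, x:x+w]
    cv2.imwrite('face_crop.jpg', face_img)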

I used the Haar cascade algorithm from OpenCV. Besides the Haar cascade, there are many other algorithms, such as MTCNN and SSD, that make use of convolutional neural networks.
For further information on the Haar cascade, you can check the documentation linked below.

Each algorithm has its pros and cons. The Haar cascade is relatively traditional and doesn’t use recent deep learning techniques, so when it runs into anomalous images, such as a masked or tilted face, its accuracy drops below that of more recent algorithms like MTCNN and SSD. However, since it doesn’t require as much computational power as those recent ones, it can handle far more images in a limited time and can run even on a micro-computer such as a Raspberry Pi or a smartphone.

In my project, a user was expected to take a selfie and send it to the model, so I could assume the user would understand that a frontal face shot is required. Also, since I was going to use a Raspberry Pi as the server, computing power was limited. Therefore, I put efficiency before generalization to anomalies, so that the server could handle many users’ requests as fast as possible.

After building the cropping process, I started structuring the model.
While thinking about training with the prepared datasets and labels, one problem related to gender came up: should I train the model without distinguishing gender, or not? It was hard to call either approach right or wrong. After some careful thought, I decided to mimic the way humans do it. When we evaluate someone’s face, we first figure out whether the person is a man or a woman, and only then start to think about his or her appearance. In other words, since the impression differs completely depending on whether we think we are looking at a woman or a man, we distinguish gender first. I also thought it would be more interesting for users if the model output the predicted gender as well as the face score, so I built a gender-classification model and spliced it to the scoring models, as in the picture below.

Pipeline

I implemented it with Google TensorFlow. At first, I wrote the code from scratch, as below, and trained the model, but I couldn’t get decent accuracy on either the training set or the dev set. Understandably, the reason was that I had only 500 training images.

import tensorflow as tf

def get_model():
    inputs = tf.keras.layers.Input(shape=(350, 350, 3))
    h = tf.keras.layers.Conv2D(64, (3, 3), activation='relu')(inputs)
    h = tf.keras.layers.MaxPooling2D((3, 3))(h)
    h = tf.keras.layers.BatchNormalization()(h)
    h = tf.keras.layers.Conv2D(128, (3, 3), activation='relu')(h)
    h = tf.keras.layers.MaxPooling2D((3, 3))(h)
    h = tf.keras.layers.Conv2D(128, (3, 3), activation='relu')(h)
    h = tf.keras.layers.MaxPooling2D((3, 3))(h)
    h = tf.keras.layers.Dropout(0.15)(h)
    h = tf.keras.layers.Flatten()(h)
    h = tf.keras.layers.Dense(128, activation='relu')(h)
    h = tf.keras.layers.Dense(256, activation='relu')(h)
    outputs = tf.keras.layers.Dense(5, activation='softmax')(h)

    model = tf.keras.Model(inputs=inputs, outputs=outputs)
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.summary()

    return model

There are a couple of ways to cope with a shortage of data like in my case.
Firstly, simply collecting more data. It sounds a bit obvious, but in some cases it is the only way, or the fastest way, to solve the problem.

Secondly, you can synthesize data. This is easy to do in TensorFlow thanks to the ImageDataGenerator class, which generates new images from the originals by applying brightness changes, tilts, crops, and flips. However, it’s not a panacea; think of it as just a tool that helps the model generalize better to unseen data.

Finally, if you can use transfer learning, make use of it. Transfer learning is like applying knowledge you already have to a problem you haven’t solved yet. Even though the ultimate goals of two deep learning models may differ, the tasks performed in their lower layers can be similar.
Let’s say there are two image-processing models: mine, which scores people’s faces, and another that distinguishes dogs from cats. Although the goals are different, tasks like detecting the edges of objects and brightness happen in the lower layers of both. That is, they share basic tasks in part of their layers regardless of their topics.

Therefore, I can take an already-trained model, put my own output layer on top of it, and then train the spliced model with the data I have. This is transfer learning. The closer the fetched model’s task is to your own, the more effective it will be.

I decided to use MobileNetV2 from TensorFlow Hub.
If you want to know how to use it or what it is, you can visit the link below.

A simple description of MobileNetV2: ‘MobileNet V2 is a family of neural network architectures for efficient on-device image classification and related tasks.’

In other words, it was designed to run with high efficiency on small devices such as a Raspberry Pi or a smartphone. It has also already been trained on the ILSVRC-2012-CLS dataset for image classification, so I thought it would suit my task well.

import tensorflow as tf
import tensorflow_hub as hub

# Fetch the mobilenet_v2 feature extractor and add a 5-class head
def get_mobilenet_model():
    mobilenet_v2 = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                                  output_shape=[1280], trainable=False)

    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
        mobilenet_v2,
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(5, activation='softmax')
    ])

    model.build((None,) + (224, 224, 3))
    model.summary()

    return model

If you use transfer learning like this, you don’t have to write much code yourself, because all you need to do is load the model you want to use and put an output layer for your task on top of it.

The hub module loaded here is the feature_vector version of MobileNetV2, which outputs a 1280-dimensional feature vector instead of the original classification head, and passing ‘trainable=False’ to hub.KerasLayer freezes its pre-trained weights. Then, since the final goal was to classify a face into five labels, I put a Dense layer with five units on top of MobileNetV2.

Likewise, I built a model for classifying gender as well.
The only difference from the code above is that it performs binary classification, so the final Dense layer has a single unit and a sigmoid activation function.

# Fetch the mobilenet_v2 feature extractor for binary (gender) classification
def get_mobilenet_model_gender():
    mobilenet_v2 = hub.KerasLayer("https://tfhub.dev/google/tf2-preview/mobilenet_v2/feature_vector/4",
                                  output_shape=[1280], trainable=False)

    model = tf.keras.Sequential([
        tf.keras.layers.InputLayer(input_shape=(224, 224, 3)),
        mobilenet_v2,
        tf.keras.layers.Dropout(0.2),
        tf.keras.layers.Dense(1, activation='sigmoid')
    ])

    model.build((None,) + (224, 224, 3))
    model.summary()

    return model

Now the preparation for training was almost finished.

from tensorflow.keras.preprocessing.image import ImageDataGenerator

directory_man_train = 'dataset/cropped_train_set/man'
directory_man_val = 'dataset/cropped_val_set/man'

# Augment the training images on the fly
datagen = ImageDataGenerator(rotation_range=20,
                             horizontal_flip=True,
                             brightness_range=[0.6, 1.3],
                             shear_range=0.15,
                             rescale=(1. / 255.))

train_generator_man = datagen.flow_from_directory(directory_man_train,
                                                  target_size=(224, 224),
                                                  batch_size=1,
                                                  class_mode='sparse')

# The validation images are only rescaled, not augmented
val_datagen = ImageDataGenerator(rescale=(1. / 255.))

val_generator_man = val_datagen.flow_from_directory(directory_man_val,
                                                    target_size=(224, 224),
                                                    batch_size=1,
                                                    class_mode='sparse')

The code above synthesizes new images from the 500-image dataset. Since I was short on data, this step was necessary to prevent overfitting. TensorFlow’s ImageDataGenerator automatically generates new images in memory by applying brightness changes, rotations, shears, flips, and so on to the originals. You can see what it does in the pictures below.

There are 12 images, but they are actually all the same image.
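
With the generators and the transfer-learning model ready, training itself is just a compile-and-fit step. Here is a minimal sketch of how those pieces fit together; the epoch count is an illustrative assumption, not the exact value I used:

# Illustrative training sketch (assumes the imports, model, and generators defined above)
model_man = get_mobilenet_model()
model_man.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

# Save the best model in SavedModel format for later conversion
checkpoint = tf.keras.callbacks.ModelCheckpoint('savedmodel/man_checkpoint',
                                                monitor='val_accuracy',
                                                save_best_only=True)

model_man.fit(train_generator_man,
              validation_data=val_generator_man,
              epochs=30,
              callbacks=[checkpoint])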

Since I didn’t have much data, training finished in under ten minutes on a CPU. However, the results were quite impressive considering the size of the training set: validation accuracy was above 80%, and, even more surprisingly, gender-classification accuracy was above 90%. I also tested the model on unseen data to check its generalization performance. The model’s outputs were not that different from my own judgment, so I concluded it was well trained given the number of training examples.

Model Deployment

I was full of happiness, thinking all I needed to do was deploy it to my Raspberry Pi. That happiness turned into despair, because it turned out this was just the beginning.

Since the Raspberry Pi runs Raspbian OS, which is based on Debian Linux, I had to learn how to use Linux. I also had to work in a CLI environment over SSH with VS Code, as the Pi didn’t have a monitor. What was even worse, it had an ARM-based CPU, not an x86 one, which caused compatibility issues with Linux packages and Python libraries.

Up until then, I had only used the Windows GUI, so I had a hard time learning Linux, VS Code, the CLI, and the ARM environment all at the same time. When I added a Conda virtual environment on top of that, tons of error messages blocked my every step, and I almost panicked.

‘Why on earth are you using Linux?’, ‘Why do you need a virtual environment?’, ‘Where are the files I just downloaded?!’, ‘Why does it say I don’t have permission? It’s my computer!’: thoughts like these, understated here, filled my head at the time.

Whenever I was exhausted from looking for an answer, I slept for a while and then tried again with the mindset of ‘There’s always an answer; I just haven’t found it yet.’ By searching websites from all around the world, I managed to find answers one by one, and there was always someone who had asked my question before me. In the process, I learned how to deal with unexpected problems that appear out of nowhere, and most of my questions were answered naturally as time went by.

Before deploying my model on the Raspberry Pi server, I had to install a framework that could run a web server, a web application, and a database. I chose Django, which I had learned on Coursera, and fortunately every package and library for Django was compatible with ARM Linux. After finishing the CSS and HTML design and the database work, I had to install TensorFlow for the deep learning model. However, a horrible error message came up saying it doesn’t support ARM Linux.

After some trial and error, I found I could use TensorFlow Lite on the Raspberry Pi. Since TensorFlow Lite only supports model inference, not training, I converted my trained models to TensorFlow Lite.

import tensorflow as tf

# Load the gender-classifying model
model_gender = tf.keras.models.load_model('./savedmodel/gender_checkpoint')
#Load scoring model for a man
model_man = tf.keras.models.load_model('./savedmodel/man_checkpoint')
#Load scoring model for a woman
model_woman = tf.keras.models.load_model('./savedmodel/woman_checkpoint')

converter = tf.lite.TFLiteConverter.from_saved_model('./savedmodel/gender_checkpoint')
tflite_model = converter.convert()
open("converted_model_gender.tflite", "wb").write(tflite_model)

converter = tf.lite.TFLiteConverter.from_saved_model('./savedmodel/man_checkpoint')
tflite_model = converter.convert()
open("converted_model_man.tflite", "wb").write(tflite_model)

converter = tf.lite.TFLiteConverter.from_saved_model('./savedmodel/woman_checkpoint')
tflite_model = converter.convert()
open("converted_model_woman.tflite", "wb").write(tflite_model)

Converting TensorFlow code to TensorFlow Lite is quite easy: all you have to do is load the model you built and call the converter, which does all the work for you.

Then you can deploy the model by copying the .tflite files to the server and writing a little code to connect them to Django’s views.py (a sketch of such a view follows the helper code below). If you are familiar with JavaScript, you can handle some of the work on the front end, but in my case I made most of the work happen on the back end, so that the model does everything there and just sends the results to the front end.

import numpy as np
import tflite_runtime.interpreter as tflite
import cv2
from PIL import Image
import os

base_dir = os.path.dirname(os.path.abspath(__file__))

# Load TFLite models
interpreter_gender = tflite.Interpreter(os.path.join(base_dir, 'converted_model_gender.tflite'))
interpreter_man = tflite.Interpreter(os.path.join(base_dir, 'converted_model_man.tflite'))
interpreter_woman = tflite.Interpreter(os.path.join(base_dir, 'converted_model_woman.tflite'))

# Feed an interpreter with array data and get the prediction
def get_output_tflite(interpreter, input_data):

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.allocate_tensors()

    interpreter.set_tensor(input_details[0]['index'], input_data)
    interpreter.invoke()
    output_data = interpreter.get_tensor(output_details[0]['index'])
    return output_data

def get_face_location(image_path):

    # Load the cascade
    face_cascade = cv2.CascadeClassifier(os.path.join(base_dir, 'haarcascade_frontalface_default.xml'))

    # Read the input image with PIL Image
    img = Image.open(image_path)
    img = np.array(img)
    # RGB to BGR
    img = img[..., ::-1]

    # Convert into grayscale
    gray_img = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Detect faces
    faces = face_cascade.detectMultiScale(gray_img, 1.03, 6)

    # Check whether there is exactly one face
    detecting_result = 2  # 2 is normal
    if len(faces) == 0:
        detecting_result = 0  # "There's no face detected"
    elif len(faces) > 1:
        detecting_result = 1  # "There's more than one face detected"

    # Suppress other boxes: keep only the largest one
    max_box_coordinates = (0, 0, 0, 0)
    for (x, y, w, h) in faces:
        if w * h >= max_box_coordinates[2] * max_box_coordinates[3]:
            max_box_coordinates = (x, y, w, h)

    return (max_box_coordinates, img, detecting_result)

def cropped_image(image_ndarray, x, y, w, h, detecting_result):

    if detecting_result == 0:
        # No face detected: return the whole image
        return image_ndarray
    else:
        cropped_img = image_ndarray[y:y+h, x:x+w]
        return cropped_img

def get_face_rank(pred):

    if pred == 0:
        string = "ㅆㅎㅌㅊ 입니다..힘내세요.."  # "Way below average.. cheer up.."
    elif pred == 1:
        string = "ㅎㅌㅊ 입니다.."  # "Below average.."
    elif pred == 2:
        string = "ㅍㅌㅊ 입니다.."  # "Average.."
    elif pred == 3:
        string = "ㅅㅌㅊ 입니다.."  # "Above average.."
    else:
        string = "ㅆㅅㅌㅊ 입니다. 축하합니다!"  # "Way above average. Congratulations!"

    return string

def tflite_model_predict(image_path):

    face_coordinates, img, detecting_result = get_face_location(image_path)
    x, y, w, h = face_coordinates
    cropped_img_original = cropped_image(img, x, y, w, h, detecting_result)
    cropped_img = cv2.resize(cropped_img_original, (224, 224))
    cropped_img = cv2.cvtColor(cropped_img, cv2.COLOR_BGR2RGB) / 255.

    pred_sex_num = get_output_tflite(interpreter_gender,
                                     np.array(np.expand_dims(cropped_img, axis=0), dtype=np.float32))
    pred_sex = np.round(pred_sex_num)
    # Classify gender first
    if pred_sex == 0:
        sex = 'man'
    else:
        sex = 'woman'
    # Score the face based on the gender
    if sex == 'man':
        pred = get_output_tflite(interpreter_man,
                                 np.array(np.expand_dims(cropped_img, axis=0), dtype=np.float32))
    else:
        pred = get_output_tflite(interpreter_woman,
                                 np.array(np.expand_dims(cropped_img, axis=0), dtype=np.float32))

    return pred, np.argmax(pred[0]), cropped_img_original, detecting_result, pred_sex_num, face_coordinates
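
On the Django side, wiring this into views.py is mostly plumbing: save the uploaded image where the predictor can read it, call tflite_model_predict, and pass the results to a template. Below is a hypothetical sketch only; the view name, form field name, module name, and template paths are assumptions, not my actual code.

# Hypothetical Django view; names and paths are illustrative assumptions
from django.core.files.storage import FileSystemStorage
from django.shortcuts import render

from .tflite_predict import tflite_model_predict, get_face_rank  # assumed module name

def evaluate_face(request):
    if request.method == 'POST' and request.FILES.get('face_image'):
        # Save the upload so the predictor can read it from disk
        fs = FileSystemStorage()
        filename = fs.save(request.FILES['face_image'].name, request.FILES['face_image'])
        image_path = fs.path(filename)

        pred, label, cropped_img, detecting_result, pred_sex, box = tflite_model_predict(image_path)

        context = {
            'score': int(label) + 1,  # labels 0-4 shown as scores 1-5
            'gender': 'man' if float(pred_sex[0][0]) < 0.5 else 'woman',
            'message': get_face_rank(label),
            'box': box,  # face coordinates for the front end to draw the rectangle
            'detecting_result': detecting_result,
        }
        return render(request, 'faceeva/result.html', context)

    return render(request, 'faceeva/upload.html')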

You can see the operating application in the pictures below.

A score of 5 is the best and 1 is the worst.

Drawing a rectangle on the recognized face is handled on the front end, using the box coordinates passed over from the back end.

When you click the upload button after selecting an image, the model runs on the back end and returns the predicted face score and gender. It takes about one or two seconds to process. Considering it’s running on a Raspberry Pi, whose computational power is relatively low, I think that’s quite fast.

Takeaways from the project

The whole project doesn’t feel that difficult now, but at the time I had a hard time dealing with many kinds of problems and issues that were new to me. In the end, I learned a lot from this project. One of the biggest takeaways was that I have to be familiar not only with deep learning but also with handling the data itself. When I was working on assignments on Coursera, every dataset and label was beautifully prepared for me, so all I had to do was focus on improving the model’s performance by tuning parameters. In reality, however, structuring a pipeline and collecting and labeling datasets was more important than building the model, because a model can’t do anything without relevant data.

The second thing I learned was that you have to be an all-around player if you want to deploy what you build. If you only know machine learning and deep learning, it’s like knowing recipes while having no idea where to buy the ingredients or how to prepare them. So I spent a lot of time learning things like setting up a development environment, Linux, the structure of the web, databases, SQL, and computer networks, and found that they were as important as deep learning itself.

Finally, I found that access to image data for deep learning is quite limited. As opposed to text data, image data is hard to scrape, and even when it’s possible, it’s hard to store at scale since it takes up much more space than text. So I thought it would be more efficient to apply deep learning to natural language processing with text data.
