Masked Face Analysis via Multi-Task Deep Learning

Proposed Framework

The two distinct methods, single and multi-task learning, represent contrasting approaches in machine learning. In the single-task learning paradigm, a model is designed and trained to perform a specific task, optimizing its performance for that particular task alone. This approach is straightforward and often yields good results when dealing with isolated tasks.

On the other hand, multi-task learning involves training a single model to perform multiple tasks simultaneously. This approach capitalizes on potential shared information across tasks to enhance the model's overall performance. By jointly learning from related tasks, the model can potentially develop a more comprehensive understanding of the underlying data distribution, leading to improved generalization and better performance on all tasks involved.
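The contrast between the two paradigms can be written compactly. The following is only a sketch: the symbols (parameters \theta, per-task losses \mathcal{L}_t, task weights \lambda_t, number of tasks T) are notation assumed here for illustration and do not come from the original work, which does not state how the task losses are combined.

```latex
% Single-task learning: each task t has its own model and its own objective.
\min_{\theta_t} \; \mathcal{L}_t(\theta_t), \qquad t = 1, \dots, T

% Multi-task learning: one model is trained on a weighted sum of the task losses.
\min_{\theta} \; \sum_{t=1}^{T} \lambda_t \, \mathcal{L}_t(\theta)
```

Setting every \lambda_t to 1 gives the plain sum of losses, which is a common default.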

01

Single Models
We tested our newly created dataset with the more complex ResNet-152 network, which has more convolutional and max-pooling layers and totals 60,430,849 parameters, to compare its accuracy against a traditional CNN model with only 7,654 parameters. ResNet-152 was initialized with weights pre-trained on ImageNet, excluding the top fully connected layers, and 137 of its 152 layers were fine-tuned.

For the single-model approach, we also explored Local Binary Patterns (LBP) and Eigenfaces with an SVM classifier. LBP, a basic texture operator, assigns a binary value to each neighboring pixel by thresholding it against the center pixel. After preprocessing, the images were converted into decimal LBP codes and classified with a linear-kernel SVM. Separate models were trained for age, gender, and expression. Deep learning outperformed the LBP-SVM approach. We also tried Eigenfaces (PCA) with SVM models, but this performed worse than the LBP method.
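To make the LBP step concrete, here is a minimal pure-Python sketch of the basic 3x3 operator. The neighbor ordering, the `>=` threshold convention, and the 256-bin histogram used as a feature vector are assumptions chosen for illustration; the original does not specify these details, and the function names are hypothetical.

```python
def lbp_code(patch):
    """Return the decimal LBP code for the center pixel of a 3x3 patch."""
    center = patch[1][1]
    # Neighbors visited clockwise from the top-left corner (an assumed ordering).
    offsets = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r][c] >= center else 0 for r, c in offsets]
    # Read the 8 bits as a binary number; the first neighbor is the most significant bit.
    code = 0
    for bit in bits:
        code = (code << 1) | bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels of a grayscale image.

    Using the histogram as the SVM input is an assumption, not the source's stated method.
    """
    hist = [0] * 256
    for r in range(1, len(image) - 1):
        for c in range(1, len(image[0]) - 1):
            patch = [row[c - 1:c + 2] for row in image[r - 1:r + 2]]
            hist[lbp_code(patch)] += 1
    return hist


patch = [
    [6, 5, 2],
    [7, 6, 1],
    [9, 8, 7],
]
print(lbp_code(patch))  # bits 10001111 -> 143
```

In this sketch, each image would be reduced to one histogram, and a separate linear SVM would then be fit on those histograms for each of the age, gender, and expression tasks.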
02

Multi-Task Deep Learning
Multi-task deep learning (MTDL) is an approach in which multiple tasks are learned jointly, allowing knowledge gained from one task to aid the others. This is achieved by sharing parameters, for which there are two main architectures: hard parameter sharing (used in this study) and soft parameter sharing. In hard parameter sharing, the parameter set is divided into shared and task-specific parameters. Typically, MTDL models with hard parameter sharing comprise a common encoder that then splits into task-specific components.
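The shared-encoder layout can be sketched in a few lines of plain Python. This is an illustration of hard parameter sharing in general, not the network used in this work: the single dense encoder layer, the layer sizes, and the number of classes per task are all hypothetical, and no training loop is shown.

```python
import random

random.seed(0)  # reproducible illustration


def linear(n_in, n_out):
    """Create a randomly initialized weight matrix and zero bias for one dense layer."""
    weights = [[random.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [0.0] * n_out
    return weights, bias


def apply_linear(layer, x):
    """Compute W x + b for a layer created by linear()."""
    weights, bias = layer
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x):
    return [max(0.0, v) for v in x]


class HardSharingModel:
    """Shared encoder followed by one task-specific head per task."""

    def __init__(self, n_features, n_hidden, classes_per_task):
        # Shared parameters: updated by every task during training.
        self.encoder = linear(n_features, n_hidden)
        # Task-specific parameters: one output layer per task.
        self.heads = {task: linear(n_hidden, n) for task, n in classes_per_task.items()}

    def forward(self, x):
        shared = relu(apply_linear(self.encoder, x))  # computed once, reused by all heads
        return {task: apply_linear(head, shared) for task, head in self.heads.items()}


# Hypothetical class counts; the source does not list them.
model = HardSharingModel(n_features=16, n_hidden=8,
                         classes_per_task={"age": 4, "gender": 2, "expression": 3})
logits = model.forward([random.random() for _ in range(16)])
print({task: len(scores) for task, scores in logits.items()})
# {'age': 4, 'gender': 2, 'expression': 3}
```

The key design point is that `encoder` is evaluated once per input and feeds all three heads, so gradients from every task would flow into the same shared weights during training.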

Experimental results

We compare our proposed approach for analyzing masked faces using two different implementations: a basic Convolutional Neural Network (CNN) and ResNet-152. We also contrast a single model with a multitask learning model. Various baseline methods, including EigenFace, LBP, TinyImage, and VGG Face, were evaluated on the testing set of the FGNET-MASK dataset. Accuracy is our primary performance metric, computed for each classification task as the number of correct predictions (true positives plus true negatives) divided by the total number of predictions (true positives, true negatives, false positives, and false negatives).

For the single model, we developed three models using different backbones and approaches. Deep learning models outperformed SVM-based methods such as LBP with SVM. The SVM method, using a linear SVM, exhibited mixed results across the different feature extraction techniques. Our simple CNN achieved accuracies of 0.68 for age, 0.77 for gender, and 0.60 for expression. The more complex ResNet-152 yielded higher accuracies of 0.91, 0.95, and 0.82 for age, gender, and expression, respectively.

We then introduced a multitask deep learning (MTDL) model, which proved superior to the single-task CNN models. With the simple CNN, the MTDL approach achieved testing accuracies of 0.74, 0.83, and 0.70 for age, gender, and expression, respectively, while ResNet-152 achieved even better results of 0.95, 0.98, and 0.90. This emphasizes the advantage of deeper backbones in multitask deep learning. Our work has potential applications in contexts such as surveillance systems and targeted advertisement.
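As a small worked example, the snippet below computes accuracy in the conventional way, as the fraction of predictions that match the ground truth, and then tabulates the gain of the multitask model over the single model from the figures quoted above. It assumes the single-model and multitask figures are measured on the same metric; the function name and the toy labels are hypothetical.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Figures quoted in the text, per task, for each backbone.
reported = {
    "simple CNN": {
        "single": {"age": 0.68, "gender": 0.77, "expression": 0.60},
        "multitask": {"age": 0.74, "gender": 0.83, "expression": 0.70},
    },
    "ResNet-152": {
        "single": {"age": 0.91, "gender": 0.95, "expression": 0.82},
        "multitask": {"age": 0.95, "gender": 0.98, "expression": 0.90},
    },
}

# Absolute gain of the multitask model over the single model, rounded to 2 decimals.
gains = {
    backbone: {task: round(r["multitask"][task] - r["single"][task], 2)
               for task in r["single"]}
    for backbone, r in reported.items()
}

print(accuracy(["f", "m", "f", "m"], ["f", "m", "m", "m"]))  # 3 of 4 correct -> 0.75
for backbone, per_task in gains.items():
    print(backbone, per_task)
```

On the quoted figures, the gains range from 0.03 (gender, ResNet-152) to 0.10 (expression, simple CNN), with expression benefiting most for both backbones.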

Conclusion and Future Works

In this work, we addressed the challenge of analyzing faces that are partially covered by masks. To this end, we built FGNET-MASK, a new dataset of masked face images created using various techniques. We then introduced a multi-task deep learning (MTDL) model that jointly estimates a person's age, expression, and gender from a masked face. Our experiments showed that the approach performs well. In future work, we plan to add more variety to our data collection and to evaluate on other datasets such as RMFRD. We also plan to explore other masked face tasks, such as facial landmark detection and mask removal.

Acknowledgement

We gratefully acknowledge the support of NVIDIA Corporation with the donation of the GPU used for this research.

Authors and Affiliations

Vatsa S. Patel, Zhongliang Nie, Trung-Nghia Le, and Tam V. Nguyen.
Department of Computer Science, University of Dayton, Dayton, OH, 45469, USA.

Vatsa Patel, Ph.D. Vision & ML Engineer
