
Facial Emotion Recognition with FER


FER+ Dataset Overview

The FER+ dataset is a notable extension of the original Facial Expression Recognition (FER) dataset, created to overcome its labeling limitations. Where the original FER covers the six basic emotions plus "neutral," FER+ re-labels the same images through crowd-sourcing (roughly ten annotators per image) and adds "contempt," bringing the total to eight classes and better reflecting the intricacies of human emotion.

This model was chosen for the emotion-detection implementation because its architecture, computational efficiency, and alignment with the nuanced requirements of facial emotion recognition make it well suited to the task.

FER+ vs. FER Datasets

The expanded label set acknowledges the prevalence of neutral expressions, which serve as a baseline for comparison in scenarios where no primary emotion is distinctly conveyed. FER+ thus captures a broader spectrum of emotional states, making it a valuable resource for training sophisticated emotion recognition models.
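As a quick illustration of the refined labeling scheme, the sketch below converts a set of annotator votes into both a hard label and the probability-distribution target used in the FER+ paper. The class list reflects FER+'s eight categories; the vote counts are invented for the example (the real fer2013new.csv also carries "unknown" and "not a face" tallies, omitted here).

```python
# Illustrative sketch of FER+'s crowd-sourced labels: each image carries
# vote counts over eight emotions. The counts below are made up.
import numpy as np

FERPLUS_CLASSES = ["neutral", "happiness", "surprise", "sadness",
                   "anger", "disgust", "fear", "contempt"]

votes = np.array([1, 7, 2, 0, 0, 0, 0, 0], dtype=float)  # ten annotators

# Majority-vote target (single hard label)
hard_label = FERPLUS_CLASSES[int(votes.argmax())]  # "happiness"

# Probability-distribution target (soft label), as used in the FER+ paper
soft_label = votes / votes.sum()  # [0.1, 0.7, 0.2, 0, ...]

print(hard_label, soft_label)
```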

Face Detection with RFB-320 SSD Model

Before diving into emotion recognition, the code employs the RFB-320 Single Shot Multibox Detector (SSD) model for efficient face detection. Optimized for edge computing devices, this model integrates a modified Receptive Field Block (RFB) module, capturing multiscale contextual information with minimal extra computational overhead.

RFB-320 SSD Model Architecture

Trained on the WIDER FACE dataset at a 320×240 input resolution, the RFB-320 SSD model strikes an impressive balance between accuracy and efficiency, requiring only 0.2106 GFLOPs per inference with a compact 0.3004 million parameters. The code loads the model in Caffe format, enabling reliable face detection in resource-constrained environments.
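A minimal sketch of this detection step, assuming the model is loaded through OpenCV's DNN module: the file names, the 1/128 scale with mean-127 preprocessing, and a DetectionOutput-style [1, 1, N, 7] output tensor are all assumptions here; some RFB-320 exports instead emit raw box/score tensors that require manual anchor decoding.

```python
# Sketch: face detection with the RFB-320 SSD (Caffe format) via OpenCV DNN.
import cv2
import numpy as np

# Hypothetical file names for the Caffe definition and weights
net = cv2.dnn.readNetFromCaffe("RFB-320.prototxt", "RFB-320.caffemodel")

def detect_faces(frame, conf_threshold=0.7):
    h, w = frame.shape[:2]
    # 320x240 input, mean 127, scale 1/128 -- typical preprocessing for
    # this detector family (assumed here); swapRB converts BGR to RGB
    blob = cv2.dnn.blobFromImage(frame, scalefactor=1.0 / 128,
                                 size=(320, 240), mean=(127, 127, 127),
                                 swapRB=True)
    net.setInput(blob)
    detections = net.forward()  # assumed shape: [1, 1, N, 7]
    boxes = []
    for i in range(detections.shape[2]):
        confidence = detections[0, 0, i, 2]
        if confidence > conf_threshold:
            # Box coordinates are normalized to [0, 1]; scale to frame size
            x1, y1, x2, y2 = (detections[0, 0, i, 3:7] *
                              np.array([w, h, w, h])).astype(int)
            boxes.append((x1, y1, x2, y2))
    return boxes
```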

Custom VGG13 Model for Emotion Recognition

Following face detection, the code leverages a custom VGG13 architecture for facial emotion recognition. Tailored to 64×64 grayscale inputs, the model classifies each detected face into one of eight emotion classes, using stacked convolutional layers with max pooling and dropout to prevent overfitting.

Architecture

Despite the relatively small dataset, strategically placed dropout layers improve the model's ability to generalize. The architecture comprises convolutional blocks followed by dense layers and a softmax output layer that predicts the emotion class.
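A representative Keras sketch of such a VGG13-style network is shown below. The filter counts, dropout rates, and dense-layer widths are assumptions modeled on the FER+ reference implementation, not necessarily the project's exact configuration.

```python
# Sketch of a VGG13-style emotion classifier for 64x64 grayscale input.
# Filter counts and dropout rates are representative assumptions.
from tensorflow.keras import layers, models

def build_vgg13(num_classes=8):
    model = models.Sequential()
    model.add(layers.Input(shape=(64, 64, 1)))
    # Convolutional blocks: two conv layers, then max pooling and dropout
    for filters in (64, 128, 256, 256):
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
        model.add(layers.Dropout(0.25))
    model.add(layers.Flatten())
    # Dense layers with dropout before the softmax output
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(1024, activation="relu"))
    model.add(layers.Dropout(0.5))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model

model = build_vgg13()
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

The dropout after every pooling stage, rather than only before the classifier, is what the section above refers to as "strategic placement": it regularizes the convolutional features themselves, which matters more when training data is limited.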

Implementation Overview

In the code, the two models work in tandem to perform real-time facial emotion recognition on video frames: the RFB-320 SSD model efficiently detects faces, and the custom VGG13 model classifies each detected face's expression. Their integration is what delivers accurate, detailed emotion recognition end to end.
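The sketch below ties the two stages together, reusing the hypothetical detect_faces() helper (and the loaded detector) from the face-detection sketch above. The model file name, the FER+ class ordering, and the [0, 1] input scaling are assumptions.

```python
# End-to-end sketch of the detection + classification pipeline.
import cv2
from tensorflow.keras import models

EMOTIONS = ["neutral", "happiness", "surprise", "sadness",
            "anger", "disgust", "fear", "contempt"]  # assumed FER+ order

emotion_net = models.load_model("vgg13_ferplus.h5")  # hypothetical path

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ok, frame = cap.read()
    if not ok:
        break
    for (x1, y1, x2, y2) in detect_faces(frame):
        face = frame[max(y1, 0):y2, max(x1, 0):x2]
        if face.size == 0:
            continue
        # Match the VGG13 input: 64x64 grayscale, scaled to [0, 1]
        gray = cv2.cvtColor(face, cv2.COLOR_BGR2GRAY)
        inp = cv2.resize(gray, (64, 64)).astype("float32") / 255.0
        probs = emotion_net.predict(inp[None, :, :, None], verbose=0)[0]
        label = EMOTIONS[int(probs.argmax())]
        # Annotate the frame with the detected box and predicted emotion
        cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(frame, label, (x1, y1 - 8),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imshow("Emotion Recognition", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break
cap.release()
cv2.destroyAllWindows()
```

Running detection once per frame and classification only on the cropped faces keeps the per-frame cost low, which is what makes real-time operation feasible on modest hardware.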