ADVANCED FACIAL EMOTION RECOGNITION USING CONVOLUTIONAL NEURAL NETWORKS FOR ENHANCED ACCURACY AND PERFORMANCE

Ayesha Binte Shahid; Tanzeel-Ur-Rehman; Muhammad Fuzail; Ahmad Naeem; Naeem Aslam

doi:10.71146/kjmr439

Authors

Ayesha Binte Shahid, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.
Tanzeel-Ur-Rehman, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.
Muhammad Fuzail, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.
Ahmad Naeem, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.
Naeem Aslam, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan.

DOI:

https://doi.org/10.71146/kjmr439

Keywords:

Facial Emotion Recognition (FER), Convolutional Neural Networks (CNNs), Deep Learning, Residual Networks (ResNet), Dense Networks (DenseNet)

Abstract

Facial Emotion Recognition (FER) is an important area of computer vision and AI that enables machines to infer human emotions from facial expressions. Despite recent advances, current FER systems still struggle with the diversity of expressions, varying lighting conditions, occlusions, and scarce labeled data, and consequently fail to generalize well in real-world situations. This research addresses these hurdles by evaluating several deep learning architectures, in particular Convolutional Neural Networks (CNNs), for accurate emotion classification. A CNN-based model was trained and tested on the FER dataset, achieving 95% accuracy and precision, recall, and F1 scores above 94%. Its performance was compared against state-of-the-art models such as ResNet, DenseNet, and Vision Transformer (ViT). Although ResNet and DenseNet also performed strongly, the CNN model showed better efficiency and generalization. Despite their potential, Vision Transformers showed higher loss because they depend on large datasets, and they captured the local features of low-resolution facial images poorly. The success of the CNN model is largely due to its ability to extract spatial features effectively, its limited overfitting, and its good fit to the characteristics of the FER data. These strengths make the model well suited to detecting emotions across varied facial expressions. However, the analysis is limited to a single static-image dataset and lacks cross-cultural, cross-demographic, and dynamic (temporal) examples. Future research will add temporal analysis through hybrid CNN-LSTM architectures and will use cross-dataset training together with domain adaptation to improve generalization.
In addition, privacy-preserving approaches such as federated learning will support the safe and responsible deployment of FER in different settings. This work provides an advanced FER solution applicable to healthcare, surveillance, and human–computer interaction.
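
As an illustration of the evaluation metrics named in the abstract (accuracy, precision, recall, and F1 score), the following is a minimal sketch of how they can be computed for a multi-class emotion classifier. The abstract does not state which averaging scheme was used, so macro averaging is assumed here; the function name `macro_report` and the emotion labels are hypothetical and are not taken from the study.

```python
from collections import Counter


def macro_report(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1.

    Assumption: macro averaging (unweighted mean over classes); the
    abstract does not specify the averaging scheme.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class receives a false positive
            fn[truth] += 1   # true class receives a false negative

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p_den = tp[label] + fp[label]
        r_den = tp[label] + fn[label]
        precision = tp[label] / p_den if p_den else 0.0
        recall = tp[label] / r_den if r_den else 0.0
        f_den = precision + recall
        f1 = 2 * precision * recall / f_den if f_den else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical labels for illustration only (not data from the study).
y_true = ["happy", "happy", "sad", "sad", "angry", "angry"]
y_pred = ["happy", "sad", "sad", "sad", "angry", "happy"]
report = macro_report(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every emotion class equally, which matters when some expressions are rare in the dataset; a weighted average would instead favor the frequent classes.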