EFFECTIVE SPEECH EMOTION RECOGNITION USING R-CNN & BLSTM

Muhammad Hassan Askari; Adeel Shahzad; Ahmed Faraz; Muhammad Fuzail; Naeem Aslam; Mohsin Ali Tariq

doi:10.71146/kjmr514

Authors

Muhammad Hassan Askari, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan
Adeel Shahzad, Department of Computer Science, Virtual University of Pakistan
Ahmed Faraz, Department of Computer Science, University of Lahore, Pakistan
Muhammad Fuzail, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan
Naeem Aslam, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan
Mohsin Ali Tariq, Department of Computer Science, NFC Institute of Engineering and Technology, Multan, Pakistan

Abstract

Speech Emotion Recognition (SER) has gained significant attention in human-computer interaction (HCI) over the past decade, especially in fields such as health, security, communication, and entertainment. However, because little research has addressed how to improve speech processing efficiency, current emotion recognition systems still need better accuracy. To improve accuracy, we propose an Effective Speech Emotion Recognition System (ESERS), a hybrid approach that combines Autoencoders (AEs) for denoising and robust feature extraction with a Self-Attentional Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN-BLSTM) architecture for effective temporal and contextual modeling. On the CREMA dataset, Weighted Accuracy (WA) improved from 73.9% to 81.6% and Unweighted Accuracy (UA) improved from 68.5% to 82.8%, which corresponds to absolute improvements of 7.7 and 14.3 percentage points and relative improvements of 10.4% and 20.9%, respectively. These results indicate that the hybrid approach outperforms the traditional approaches currently in use.
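
For readers who want to check the reported figures, the following is a minimal sketch, not the authors' evaluation code. It assumes the convention common in SER work that WA is the overall fraction of correctly classified utterances and UA is the mean of per-class recalls; the abstract does not define either metric. The function names and the toy labels are hypothetical. The last two lines recompute the absolute and relative improvements from the WA and UA values quoted above.

```python
from collections import defaultdict


def weighted_accuracy(y_true, y_pred):
    """Overall fraction of correctly classified utterances (assumed WA definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so each emotion counts equally (assumed UA definition)."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


def improvements(baseline, proposed):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = proposed - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


# Hypothetical toy labels, only to show how WA and UA can differ on imbalanced data.
y_true = ["angry", "angry", "angry", "sad"]
y_pred = ["angry", "angry", "angry", "happy"]
print(weighted_accuracy(y_true, y_pred), unweighted_accuracy(y_true, y_pred))

# Figures quoted in the abstract.
print(improvements(73.9, 81.6))  # WA: baseline vs. proposed
print(improvements(68.5, 82.8))  # UA: baseline vs. proposed
```

Under these assumed definitions, UA penalizes a model that ignores rare emotion classes, which is why the two metrics are usually reported together on imbalanced emotion datasets.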