EFFECTIVE SPEECH EMOTION RECOGNITION USING R-CNN & BLSTM
DOI:
https://doi.org/10.71146/kjmr514
Abstract
Speech Emotion Recognition (SER) has gained significant attention in human-computer interaction (HCI) over the past decade, especially in fields such as health, security, communication, and entertainment. However, because little research has addressed how to improve speech processing efficiency, current emotion recognition systems still fall short in accuracy. To improve accuracy, we propose an Effective Speech Emotion Recognition System (ESERS), a hybrid approach that combines Autoencoders (AEs) for denoising and robust feature extraction with a Self-Attentional Convolutional Neural Network–Bidirectional Long Short-Term Memory (CNN-BLSTM) architecture for effective temporal and contextual modeling. On the CREMA dataset, Weighted Accuracy (WA) improved from 73.9% to 81.6% and Unweighted Accuracy (UA) improved from 68.5% to 82.8%. These correspond to absolute improvements of 7.7 and 14.3 percentage points, and relative improvements of 10.4% and 20.9%, respectively. The results indicate that the hybrid approach outperforms the traditional approaches currently in use.
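The reported gains can be checked with a few lines of arithmetic. The sketch below is illustrative only and is not the authors' code: it recomputes the absolute and relative improvements from the four accuracy figures quoted in the abstract, and it shows WA and UA under the convention common in SER work (WA as overall accuracy, UA as the mean of per-class recalls). That convention is an assumption here, since the abstract does not define the two metrics, and the function names and toy labels are hypothetical.

```python
from collections import defaultdict


def improvement(baseline, proposed):
    """Return (absolute gain in percentage points, relative gain in percent)."""
    absolute = proposed - baseline
    relative = 100.0 * absolute / baseline
    return round(absolute, 1), round(relative, 1)


def weighted_accuracy(y_true, y_pred):
    """Overall accuracy: correct predictions divided by all samples (assumed WA definition)."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def unweighted_accuracy(y_true, y_pred):
    """Mean of per-class recalls, so every class counts equally (assumed UA definition)."""
    total = defaultdict(int)
    hit = defaultdict(int)
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        hit[t] += int(t == p)
    return sum(hit[c] / total[c] for c in total) / len(total)


# Figures quoted in the abstract.
wa_abs, wa_rel = improvement(73.9, 81.6)
ua_abs, ua_rel = improvement(68.5, 82.8)
print(f"WA: +{wa_abs} points, +{wa_rel}% relative")
print(f"UA: +{ua_abs} points, +{ua_rel}% relative")

# Hypothetical toy labels: a classifier that always predicts the majority class.
toy_true = ["angry", "angry", "angry", "sad"]
toy_pred = ["angry", "angry", "angry", "angry"]
print(weighted_accuracy(toy_true, toy_pred), unweighted_accuracy(toy_true, toy_pred))
```

The toy example shows why both metrics are reported: on imbalanced classes, a model that ignores the minority class can still score well on WA while UA drops sharply.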
License
Copyright (c) 2025 Muhammad Hassan Askari, Adeel Shahzad, Ahmed Faraz, Muhammad Fuzail, Naeem Aslam, Mohsin Ali Tariq (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.