Enhanced Speech Emotion Recognition with Gender Information Using CNN and Random Forest Classification
Keywords:
Affective Computing, Speech Emotion Recognition, Gender Classifier, Deep Learning, Interpretability, Random Forest, Residual CNN
Abstract
Recent advances in speech emotion recognition (SER) have centered on selecting effective features from acoustic data. This study introduces an SER algorithm that combines raw speech data with gender information to improve recognition accuracy, removing the need for manually selected acoustic features. The approach pairs a Residual Convolutional Neural Network (R-CNN), which detects emotions directly from raw speech signals, with a Random Forest classifier that determines speaker gender. Because the R-CNN operates on raw audio rather than pre-selected acoustic features, it can capture subtle emotion-driven nuances that traditional methods may overlook. In parallel, the Random Forest classifier identifies the speaker’s gender from the speech data, supplying contextual information that supports emotion recognition. Evaluated on three public datasets in multiple languages, the proposed model shows improved accuracy and interpretability from using both emotion and gender information. The results point to the benefits of a dual-model framework that combines deep learning with ensemble methods.
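The abstract does not give the network layout or say how the gender prediction is fed into emotion recognition, so the following is only a minimal sketch of the two ideas it names: a residual block applied to a raw waveform, and a gender estimate appended to the resulting features. The function names, kernel sizes, pooling step, and the concatenation-based fusion are all assumptions made for illustration, and the gender probability is passed in as a plain number standing in for the output of a separately trained Random Forest.

```python
import numpy as np

def conv1d_same(x, kernel):
    """1-D convolution whose output has the same length as the input."""
    return np.convolve(x, kernel, mode="same")

def residual_block(x, k1, k2):
    """Two convolutions with an identity skip connection: relu(x + F(x))."""
    h = np.maximum(conv1d_same(x, k1), 0.0)  # first convolution + ReLU
    h = conv1d_same(h, k2)                   # second convolution
    return np.maximum(x + h, 0.0)            # add the input back, then ReLU

def fuse_with_gender(embedding, p_female):
    """Hypothetical fusion: append gender probabilities to the embedding."""
    return np.concatenate([embedding, [p_female, 1.0 - p_female]])

rng = np.random.default_rng(0)
waveform = rng.standard_normal(16000)  # stand-in for 1 s of raw audio at 16 kHz
k1 = rng.standard_normal(9) * 0.1      # untrained, randomly initialised kernels
k2 = rng.standard_normal(9) * 0.1

features = residual_block(waveform, k1, k2)
embedding = features.reshape(10, -1).mean(axis=1)  # crude pooling to 10 values
fused = fuse_with_gender(embedding, p_female=0.8)  # 0.8 is a made-up value
```

In a full system the fused vector would go to an emotion classification layer, and the kernels would be learned rather than random. The skip connection is what makes the block residual: if both kernels are zero, the block simply passes the rectified input through.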
License
Copyright (c) 2025 International Journal of Scientific Research in Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.