Enhanced Speech Emotion Recognition with Gender Information Using CNN and Random Forest Classification

Authors

  • A. Hemabhushana, MCA Student, KMM Institute of Post Graduate Studies, Tirupati, Tirupati (Dist.), Andhra Pradesh, India
  • S. Noortaj, Assistant Professor, KMM Institute of Post Graduate Studies, Tirupati, Tirupati (Dist.), Andhra Pradesh, India

Keywords

Affective Computing, Speech Emotion Recognition, Gender Classifier, Deep Learning, Interpretability, Random Forest, Residual CNN

Abstract

Recent advances in speech emotion recognition (SER) have centered largely on selecting effective features from acoustic data. This study introduces a novel SER algorithm that combines raw speech data with gender information to improve recognition accuracy, eliminating the need for manually selected acoustic features. The approach integrates a Residual Convolutional Neural Network (R-CNN) that detects emotions directly from raw speech signals with a Random Forest classifier that determines speaker gender. The R-CNN extracts emotional cues from the raw audio without relying on pre-selected acoustic features, capturing subtle emotion-driven nuances that traditional methods may overlook. In parallel, the Random Forest classifier identifies the speaker's gender, providing contextual information that strengthens the emotion recognition process. Evaluated on three public datasets spanning multiple languages, the proposed model demonstrates a notable improvement in accuracy and interpretability by leveraging both emotion and gender information. These results highlight the benefits of a dual-model framework that combines deep learning with ensemble methods, advancing affective computing through a more holistic understanding of speech data.
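To make the dual-model design concrete, the sketch below pairs a small residual CNN over raw waveforms with a Random Forest gender classifier and feeds the predicted gender into the emotion head via late fusion. This is a minimal illustration only: the layer sizes, the seven-class emotion count, the per-utterance summary statistics, and the fusion rule are all assumptions, not the configuration published in this paper.

```python
# Hypothetical sketch of the dual-model framework from the abstract:
# a residual CNN over raw 1-D speech for emotion, a Random Forest for
# gender, and late fusion of the gender prediction into the emotion head.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # identity shortcut

class EmotionRCNN(nn.Module):
    """Residual CNN operating directly on raw speech samples."""
    def __init__(self, n_emotions=7):  # emotion count is an assumption
        super().__init__()
        self.stem = nn.Conv1d(1, 32, kernel_size=80, stride=4)
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.pool = nn.AdaptiveAvgPool1d(1)
        # +2 inputs for a one-hot gender vector fused before the classifier
        self.head = nn.Linear(32 + 2, n_emotions)

    def forward(self, waveform, gender_onehot):
        x = torch.relu(self.stem(waveform))        # (B, 32, T')
        x = self.pool(self.blocks(x)).squeeze(-1)  # (B, 32)
        x = torch.cat([x, gender_onehot], dim=1)   # late fusion with gender
        return self.head(x)

# Gender branch: a Random Forest over simple per-utterance statistics.
# A real system would use pitch or spectral features; raw-waveform
# statistics stand in here purely to keep the example self-contained.
def utterance_stats(waveforms):
    return np.stack([[w.mean(), w.std(), np.abs(w).max()] for w in waveforms])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_wavs = rng.standard_normal((16, 16000)).astype(np.float32)
    train_gender = rng.integers(0, 2, size=16)  # 0 = female, 1 = male

    gender_clf = RandomForestClassifier(n_estimators=50, random_state=0)
    gender_clf.fit(utterance_stats(train_wavs), train_gender)

    model = EmotionRCNN()
    wav = torch.from_numpy(train_wavs[:1]).unsqueeze(1)  # (1, 1, 16000)
    g = gender_clf.predict(utterance_stats(train_wavs[:1]))
    g_onehot = torch.eye(2)[torch.from_numpy(g)].float()
    logits = model(wav, g_onehot)
    print("emotion logits:", logits.detach().numpy())
```

Late fusion at the classifier head is only one way to condition emotion recognition on gender; the gender signal could equally be injected earlier as a channel-wise bias, or used to select gender-specific emotion models.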

Published

26-05-2025

Issue

Section

Research Articles