Enhanced Speech Emotion Recognition with Gender Information Using CNN and Random Forest Classification

Authors

  • A. Hemabhushana, MCA Student, KMM Institute of Post Graduate Studies, Tirupati, Tirupati (Dist.), Andhra Pradesh, India
  • S. Noortaj, Assistant Professor, KMM Institute of Post Graduate Studies, Tirupati, Tirupati (Dist.), Andhra Pradesh, India

Keywords

Affective Computing, Speech Emotion Recognition, Gender Classifier, Deep Learning, Interpretability, Random Forest, Residual CNN

Abstract

Recent advances in speech emotion recognition (SER) have centered largely on selecting effective features from acoustic data. This study introduces a novel SER algorithm that combines raw speech data with gender information to improve recognition accuracy, eliminating the need for manually selected acoustic features. The approach integrates a Residual Convolutional Neural Network (R-CNN) that detects emotions directly from raw speech signals with a Random Forest classifier that determines speaker gender. The R-CNN extracts emotional cues from the raw audio without relying on pre-selected acoustic features, capturing subtle emotion-driven nuances that traditional methods may overlook. In parallel, the Random Forest classifier identifies the speaker's gender, providing contextual information that strengthens the emotion recognition process. Evaluated on three public datasets spanning multiple languages, the proposed model demonstrates a notable improvement in accuracy and interpretability by leveraging both emotion and gender information. These results highlight the benefits of a dual-model framework that combines deep learning with ensemble methods, advancing affective computing through a more holistic understanding of speech data.
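To make the dual-model design concrete, the sketch below pairs a small residual CNN over raw waveforms with a Random Forest gender classifier and feeds the predicted gender into the emotion head via late fusion. This is a minimal illustration only: the layer sizes, the seven-class emotion count, the per-utterance summary statistics, and the fusion rule are all assumptions, not the configuration published in this paper.

```python
# Hypothetical sketch of the dual-model framework from the abstract:
# a residual CNN over raw 1-D speech for emotion, a Random Forest for
# gender, and late fusion of the gender prediction into the emotion head.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestClassifier

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.conv2 = nn.Conv1d(channels, channels, kernel_size=9, padding=4)
        self.relu = nn.ReLU()

    def forward(self, x):
        out = self.relu(self.conv1(x))
        out = self.conv2(out)
        return self.relu(out + x)  # identity shortcut

class EmotionRCNN(nn.Module):
    """Residual CNN operating directly on raw speech samples."""
    def __init__(self, n_emotions=7):  # emotion count is an assumption
        super().__init__()
        self.stem = nn.Conv1d(1, 32, kernel_size=80, stride=4)
        self.blocks = nn.Sequential(ResidualBlock(32), ResidualBlock(32))
        self.pool = nn.AdaptiveAvgPool1d(1)
        # +2 inputs for a one-hot gender vector fused before the classifier
        self.head = nn.Linear(32 + 2, n_emotions)

    def forward(self, waveform, gender_onehot):
        x = torch.relu(self.stem(waveform))        # (B, 32, T')
        x = self.pool(self.blocks(x)).squeeze(-1)  # (B, 32)
        x = torch.cat([x, gender_onehot], dim=1)   # late fusion with gender
        return self.head(x)

# Gender branch: a Random Forest over simple per-utterance statistics.
# A real system would use pitch or spectral features; raw-waveform
# statistics stand in here purely to keep the example self-contained.
def utterance_stats(waveforms):
    return np.stack([[w.mean(), w.std(), np.abs(w).max()] for w in waveforms])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    train_wavs = rng.standard_normal((16, 16000)).astype(np.float32)
    train_gender = rng.integers(0, 2, size=16)  # 0 = female, 1 = male

    gender_clf = RandomForestClassifier(n_estimators=50, random_state=0)
    gender_clf.fit(utterance_stats(train_wavs), train_gender)

    model = EmotionRCNN()
    wav = torch.from_numpy(train_wavs[:1]).unsqueeze(1)  # (1, 1, 16000)
    g = gender_clf.predict(utterance_stats(train_wavs[:1]))
    g_onehot = torch.eye(2)[torch.from_numpy(g)].float()
    logits = model(wav, g_onehot)
    print("emotion logits:", logits.detach().numpy())
```

Late fusion at the classifier head is only one way to condition emotion recognition on gender; the gender signal could equally be injected earlier as a channel-wise bias, or used to select gender-specific emotion models.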

Published

26-05-2025

Issue

Section

Research Articles