Speech Based Emotion Recognition Using Machine Learning
DOI: https://doi.org/10.32628/IJSRST229168

Keywords: Speech Emotion Recognition, Human Computer Interaction, Mel Frequency Cepstrum Coefficients, Support Vector Machine

Abstract
Emotion is a natural feeling, distinct from reasoning or knowledge, that arises strongly from one's circumstances or surroundings. With the increase in human-to-machine interaction, speech analysis has become integral to reducing the gap between the physical and digital worlds. An important sub-field within this domain is the recognition of emotion in speech signals, which was traditionally studied in linguistics and psychology. When implemented, a Speech Emotion Recognition (SER) system can identify human emotions such as anger, fear, happiness, and sadness. Speech is a medium for expressing one's perspective or feelings to others, and recognizing emotion from an audio signal requires feature extraction followed by classifier training. The feature vector consists of elements of the audio signal that characterize speaker-specific properties such as tone, pitch, and energy, which are crucial for training the classifier model to recognize a particular emotion accurately. SER can thus make conversations between humans and computers more realistic and natural, and automatic SER is an active research topic in Human-Computer Interaction (HCI) with a wide range of applications. In this work, the speech features Mel Frequency Cepstrum Coefficients (MFCC) and Mel Energy Spectrum Dynamic Coefficients (MEDC) are extracted from speech utterances, and a Support Vector Machine (SVM) is used as the classifier to distinguish emotional states such as anger, happiness, sadness, neutral, and fear on the Berlin emotional database.
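The MFCC feature-extraction step described in the abstract can be sketched end-to-end in plain NumPy. This is an illustrative sketch, not the paper's implementation: the frame length, hop size, filter count, and number of cepstral coefficients are common defaults chosen for this example, and a production system would typically use a tuned MFCC library before feeding the resulting vectors to an SVM classifier.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mfcc(signal, sr=16000, n_fft=512, frame_len=400, hop=160,
         n_mels=26, n_ceps=13):
    """Return an (n_frames, n_ceps) MFCC matrix for a mono signal.

    All parameter values here are illustrative defaults (25 ms frames,
    10 ms hop at 16 kHz), not the paper's exact configuration.
    """
    # Pre-emphasis boosts high frequencies attenuated in speech production.
    sig = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])

    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(sig) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = sig[idx] * np.hamming(frame_len)

    # Power spectrum of each frame.
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft

    # Triangular mel filterbank spanning 0 .. sr/2.
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(1, n_mels + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fbank[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)
        fbank[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)

    # Log of filterbank energies, then DCT-II to decorrelate; the first
    # n_ceps coefficients are the MFCC feature vector per frame.
    log_energy = np.log(power @ fbank.T + 1e-10)
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_mels))
    return log_energy @ dct.T

# Example: one second of noise in place of a real utterance.
feats = mfcc(np.random.randn(16000))
print(feats.shape)  # one 13-dimensional vector per 10 ms frame
```

In a full SER pipeline, per-frame vectors like these are typically pooled (e.g. averaged) into one utterance-level vector and passed to a trained SVM to predict the emotion label.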
References
- M. S. Likitha, S. R. R. Gupta, K. Hasitha and A. U. Raju, "Speech based human emotion recognition using MFCC," 2017 International Conference on Wireless Communications, Signal Processing and Networking (WiSPNET), Chennai, 2017, pp. 2257-2260.
- Christopher J. C. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, 2(2):955-974, Kluwer Academic Publishers, Boston, 1998.
- Cowie, R., Douglas-Cowie, E., Tsapatsoulis, N., Votsis, G., Kollias, S., Fellenz, W., and Taylor, J. G., Emotion recognition in human-computer interaction, IEEE Signal Processing magazine, Vol. 18, No. 1, 32-80, Jan. 2001.
- Chih-Chung Chang and Chih-Jen Lin, LIBSVM: a library for support vector machines, 2001.
- M. D. Skowronski and J. G. Harris, Increased MFCC Filter Bandwidth for Noise-Robust Phoneme Recognition, Proc. ICASSP-02, Florida, May 2002.
- Fuhai Li, Jinwen Ma, and Dezhi Huang, MFCC and SVM based recognition of Chinese vowels, Lecture Notes in Artificial Intelligence, vol.3802, 812-819, 2005
- Chul Min Lee and Shrikanth S. Narayanan, "Toward detecting emotions in spoken dialogs," IEEE Transactions on Speech and Audio Processing, vol. 13, no. 2, pp. 293-303, Mar. 2005.
- Burkhardt, Felix; Paeschke, Astrid; Rolfes, Miriam; Sendlmeier, Walter F.; Weiss, Benjamin, "A Database of German Emotional Speech," Proceedings of Interspeech, Lisbon, Portugal, 2005.
- Kamran Soltani and Raja Noor Ainon, "Speech emotion detection based on neural networks," IEEE International Symposium on Signal Processing and Its Applications, ISSPA 2007.
License
Copyright (c) IJSRST

This work is licensed under a Creative Commons Attribution 4.0 International License.