An Enhanced Human Speech Based Emotion Recognition
DOI:
https://doi.org/10.32628/IJSRST24113128
Keywords:
Speech Emotion Recognition, Machine Learning, Emotional Computing, MFCC, Actors audio dataset
Abstract
Speech Emotion Recognition (SER) is a Machine Learning (ML) topic that has attracted substantial attention from researchers, particularly in the field of affective computing, because of its growing potential, improvements in algorithms, and real-world applications. Pitch, intensity, and Mel-Frequency Cepstral Coefficients (MFCC) are examples of quantitative variables that can be used to represent the paralinguistic information found in human speech. SER is typically achieved through three main processes: data processing, feature selection/extraction, and classification based on the underlying emotional traits. The nature of these processes, together with the unique characteristics of human speech, supports the use of ML techniques for SER implementation. Several ML techniques have been used for SER tasks in recent affective computing research; however, only a few of these works adequately convey the fundamental strategies and tactics that can be applied to support the three essential phases of SER implementation. Additionally, these works either overlook or only briefly explain the difficulties involved in completing these tasks and the cutting-edge methods employed to overcome them. In this study, we give a comprehensive assessment of research conducted over the past ten years that tackled SER challenges from a machine learning perspective, with a focus on the three SER implementation processes. A number of difficulties are covered in detail, including the low classification accuracy of speaker-independent experiments and related solutions. The review also offers principles for SER evaluation, emphasizing indicators that can be experimented with and common baselines. The purpose of this paper is to serve as a thorough manual that SER researchers may use to build SER solutions using ML techniques, inspire potential upgrades to current SER models, or spark the development of new methods to improve SER performance.
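The three processes named in the abstract, and the speaker-independent evaluation it highlights, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the method of any reviewed work: pure tones stand in for preprocessed recordings, only pitch (via autocorrelation) and RMS intensity are extracted (MFCC is omitted for brevity), a nearest-centroid rule stands in for the classifier, and all function names, speakers, and parameter values are hypothetical.

```python
import math
from collections import defaultdict

SAMPLE_RATE = 8000  # Hz; assumed for the synthetic signals below


def synth_utterance(pitch_hz, amplitude, duration_s=0.2):
    """Stage 1 stand-in: a pure tone replaces a recorded, preprocessed utterance."""
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)
            for i in range(n)]


def rms_intensity(signal):
    """Root-mean-square amplitude, used here as a simple intensity measure."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))


def autocorr_pitch(signal, fmin=50.0, fmax=400.0):
    """Estimate pitch in Hz as the lag with the largest autocorrelation."""
    lo, hi = int(SAMPLE_RATE / fmax), int(SAMPLE_RATE / fmin)
    best_lag, best_val = lo, float("-inf")
    for lag in range(lo, hi + 1):
        val = sum(signal[i] * signal[i + lag] for i in range(len(signal) - lag))
        if val > best_val:
            best_lag, best_val = lag, val
    return SAMPLE_RATE / best_lag


def extract_features(signal):
    """Stage 2: map one utterance to a (pitch, intensity) feature vector."""
    return (autocorr_pitch(signal), rms_intensity(signal))


def fit_centroids(samples):
    """Stage 3 (training): average the feature vectors of each emotion label."""
    grouped = defaultdict(list)
    for feats, label in samples:
        grouped[label].append(feats)
    return {label: tuple(sum(col) / len(col) for col in zip(*rows))
            for label, rows in grouped.items()}


def predict(centroids, feats):
    """Stage 3 (inference): pick the label whose centroid is closest."""
    return min(centroids, key=lambda label: math.dist(feats, centroids[label]))


def leave_one_speaker_out_accuracy(dataset):
    """Speaker-independent evaluation: the test speaker is never seen in training."""
    speakers = sorted({spk for spk, _, _ in dataset})
    correct = 0
    for held_out in speakers:
        train = [(f, y) for spk, f, y in dataset if spk != held_out]
        test = [(f, y) for spk, f, y in dataset if spk == held_out]
        centroids = fit_centroids(train)
        correct += sum(predict(centroids, f) == y for f, y in test)
    return correct / len(dataset)


def build_toy_dataset():
    """Hypothetical data: three speakers, each with one calm and one angry tone."""
    dataset = []
    for speaker, base_pitch in [("s1", 100.0), ("s2", 110.0), ("s3", 120.0)]:
        for label, pitch_scale, amplitude in [("calm", 1.0, 0.3), ("angry", 1.8, 0.8)]:
            signal = synth_utterance(base_pitch * pitch_scale, amplitude)
            dataset.append((speaker, extract_features(signal), label))
    return dataset


if __name__ == "__main__":
    print(leave_one_speaker_out_accuracy(build_toy_dataset()))
```

Holding out a whole speaker per fold, rather than splitting utterances at random, is what makes the evaluation speaker-independent. On real recordings the features are far less cleanly separated than in this toy data, which is one reason accuracy in that setting tends to drop.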
License
Copyright (c) 2024 International Journal of Scientific Research in Science and Technology
This work is licensed under a Creative Commons Attribution 4.0 International License.