Palanisamy, K., Singhania, D., & Yao, A. (2020). Rethinking CNN models for audio classification. arXiv preprint arXiv:2007.11154.
Devillers, L., Lamel, L., & Vasilescu, I. (2003, July). Emotion detection in task-oriented spoken dialogues. In 2003 International Conference on Multimedia and Expo. ICME'03. Proceedings (Cat. No. 03TH8698) (Vol. 3, pp. III-549). IEEE.
Tripathi, A., Singh, U., Bansal, G., Gupta, R., & Singh, A. K. (2020, May). A review on emotion detection and classification using speech. In Proceedings of the International Conference on Innovative Computing & Communications (ICICC).
Raman, R., Shamim, R., Akram, S. V., Thakur, L., Pillai, B. G., & Ponnusamy, R. (2023, January). Classification and contrast of supervised machine learning algorithms. In 2023 International Conference on Artificial Intelligence and Smart Communication (AISC) (pp. 629–633). IEEE.
Sezgin, M. C., Gunsel, B., & Kurt, G. K. (2012). Perceptual audio features for emotion detection. EURASIP Journal on Audio, Speech, and Music Processing, 2012, 1–21.
Amer, M. R., Siddiquie, B., Richey, C., & Divakaran, A. (2014, May). Emotion detection in speech using deep networks. In 2014 IEEE international conference on acoustics, speech and signal processing (ICASSP) (pp. 3724–3728). IEEE.
Chen, Y., & He, J. (2022). Deep learning-based emotion detection. Journal of Computer and Communications, 10(2), 57–71.
Al Machot, F., Mosa, A. H., Dabbour, K., Fasih, A., Schwarzlmüller, C., Ali, M., & Kyamakya, K. (2011, July). A novel real-time emotion detection system from audio streams based on Bayesian quadratic discriminate classifier for ADAS. In Proceedings of the Joint INDS'11 & ISTET'11 (pp. 1–5). IEEE.
Zamil, A. A. A., Hasan, S., Baki, S. M. J., Adam, J. M., & Zaman, I. (2019, January). Emotion detection from speech signals using voting mechanism on classified frames. In 2019 international conference on robotics, electrical and signal processing techniques (ICREST) (pp. 281–285). IEEE.
Bhaskar, J., Sruthi, K., & Nedungadi, P. (2015). Hybrid approach for emotion classification of audio conversation based on text and speech mining. Procedia Computer Science, 46, 635–643.
Wu, Y. T., Li, J. L., & Lee, C. C. (2022, May). An audio-saliency masking transformer for audio emotion classification in movies. In ICASSP 2022–2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 4813–4817). IEEE.
Haq, S., Jackson, P. J., & Edge, J. (2008). Audio-visual feature selection and reduction for emotion classification. In Proc. Int. Conf. on Auditory-Visual Speech Processing (AVSP’08), Tangalooma, Australia.
Xu, G., Li, W., & Liu, J. (2020). A social emotion classification approach using multi-model fusion. Future Generation Computer Systems, 102, 347–356.
Torres, J. M. M., & Stepanov, E. A. (2017, August). Enhanced face/audio emotion recognition: video and instance level classification using ConvNets and restricted Boltzmann Machines. In Proceedings of the International Conference on Web Intelligence (pp. 939–946).
Xu, M., Chia, L. T., & Jin, J. (2005, July). Affective content analysis in comedy and horror videos by audio emotional event detection. In 2005 IEEE International Conference on Multimedia and Expo (4 pp.). IEEE.
Hershey, S., Chaudhuri, S., Ellis, D. P., Gemmeke, J. F., Jansen, A., Moore, R. C., ... & Wilson, K. (2017, March). CNN architectures for large-scale audio classification. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) (pp. 131–135). IEEE.
Andén, J., & Mallat, S. (2011, October). Multiscale scattering for audio classification. In ISMIR (pp. 657–662).
Zhang, T., Li, S., Chen, B., Yuan, H., & Chen, C. P. (2022). AIA-net: adaptive interactive attention network for text–audio emotion recognition. IEEE Transactions on Cybernetics.
Malik, I., Latif, S., Manzoor, S., Usama, M., Qadir, J., & Jurdak, R. (2023). Emotions beyond words: non-speech audio emotion recognition with edge computing. arXiv preprint arXiv:2305.00725.
Cheng, M., & Tsoi, A. C. (2017). Fractal dimension pattern-based multiresolution analysis for rough estimator of speaker-dependent audio emotion recognition. International Journal of Wavelets, Multiresolution and Information Processing, 15(5), 1750042.
Knyazev, B., Shvetsov, R., Efremova, N., & Kuharenko, A. (2018, May). Leveraging large face recognition data for emotion classification. In 2018 13th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2018) (pp. 692–696). IEEE.
Liu, G., & Tan, Z. (2020, June). Research on multi-modal music emotion classification based on audio and lyirc. In 2020 IEEE 4th Information Technology, Networking, Electronic and Automation Control Conference (ITNEC) (Vol. 1, pp. 2331–2335). IEEE.
Sezgin, C., Gunsel, B., & Kurt, G. (2010). A new perceptual feature set for audio emotion recognition. Technical Report, Department of Electronic and Communication Engineering, Istanbul Technical University, Turkey.
Venkataramanan, K., & Rajamohan, H. R. (2019). Emotion recognition from speech. arXiv preprint arXiv:1912.10458.
Vielzeuf, V., Pateux, S., & Jurie, F. (2017, November). Temporal multimodal fusion for video emotion classification in the wild. In Proceedings of the 19th ACM International Conference on Multimodal Interaction (pp. 569–576).
Knyazev, B., Shvetsov, R., Efremova, N., & Kuharenko, A. (2017). Convolutional neural networks pretrained on large face recognition datasets for emotion classification from video. arXiv preprint arXiv:1711.04598.
Kong, Q., Cao, Y., Iqbal, T., Wang, Y., Wang, W., & Plumbley, M. D. (2020). PANNs: Large-scale pretrained audio neural networks for audio pattern recognition. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 28, 2880–2894.
Kim, D., & Kang, P. (2022). Cross-modal distillation with audio–text fusion for fine-grained emotion classification using BERT and Wav2vec 2.0. Neurocomputing, 506, 168–183.
Wu, Y., Mao, H., & Yi, Z. (2018). Audio classification using attention-augmented convolutional neural network. Knowledge-Based Systems, 161, 90–100.
Zhang, S., Zhang, S., Huang, T., & Gao, W. (2016, June). Multimodal deep convolutional neural network for audio-visual emotion recognition. In Proceedings of the 2016 ACM on International Conference on Multimedia Retrieval (pp. 281–284).
Xie, Y., Liang, R., Liang, Z., Huang, C., Zou, C., & Schuller, B. (2019). Speech emotion classification using attention-based LSTM. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(11), 1675–1685.
Pokhrel, S., KC, B., & Shah, P. B. (2025). A practical application of retrieval-augmented generation for website-based chatbots: Combining web scraping, vectorization, and semantic search. Journal of Trends in Computer Science and Smart Technology, 6(4), 424–442.
Jyothis, S., Nair, A. S., Kirandas, V. R., & Visakh, D. (2024). Developing a SmartRail security system with YOLO and OpenCV. Journal of Ubiquitous Computing and Communication Technologies, 6(1), 28–38.
Rouabhi, S., Tlemsani, R., & Neggaz, N. (2024). Real-time mobile application for Arabic sign alphabet recognition using pre-trained CNN. Soft Computing, 28, 12991–13008. https://doi.org/10.1007/s00500-024-10305-0
Rouabhi, S., Azerine, A., Tlemsani, R., et al. (2024). Conv-ViT fusion for improved handwritten Arabic character classification. Signal, Image and Video Processing, 18(Suppl 1), 355–372. https://doi.org/10.1007/s11760-024-03158-5
Patwardhan, A. S. (2017, October). Multimodal mixed emotion detection. In 2017 2nd international conference on communication and electronics systems (ICCES) (pp. 139–143). IEEE.
Puri, T., Soni, M., Dhiman, G., Ibrahim Khalaf, O., & Raza Khan, I. (2022). Detection of emotion of speech for RAVDESS audio using hybrid convolution neural network. Journal of Healthcare Engineering, 2022.
Alonso-Martin, F., Malfaz, M., Sequeira, J., Gorostiza, J. F., & Salichs, M. A. (2013). A multimodal emotion detection system during human–robot interaction. Sensors, 13(11), 15549–15581.
Lu, L., Jiang, H., & Zhang, H. (2001, October). A robust audio classification and segmentation method. In Proceedings of the ninth ACM international conference on Multimedia (pp. 203–211).