Classifying Toxic Comments with Machine Learning and Deep Learning Approaches
DOI:
https://doi.org/10.32628/IJSRST251222664
Keywords:
Toxic Comment Detection, Natural Language Processing, ML and DL models, Text Classification, Real-Time Content Moderation, Safe Online Communication
Abstract
The rapid growth of online communication has made it essential to identify and remove toxic content in order to keep online environments safe, which in turn requires detecting and evaluating harmful content in textual data. This research explores and compares several machine learning (ML) and deep learning (DL) models for toxic comment classification, targeting harmful content such as threats, hate speech, cyberbullying, and offensive language. The study compares Natural Language Processing (NLP) techniques, starting with tokenization and word embeddings, and evaluates classical ML classifiers such as Naive Bayes (NB) and Support Vector Machines (SVM) alongside deep learning architectures such as LSTMs and CNNs. A rule-based keyword detection approach is also included for comparative evaluation. By combining NLP techniques with machine learning models, the resulting system is intended to identify harmful content in real time across social media, chat applications, and meetings. Experiments apply feature extraction methods to a preprocessed dataset of labeled harmful comments, and the resulting model comparisons are intended to support safer digital communication and the filtering of toxic comments.
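To make the comparison described in the abstract concrete, the following is a minimal sketch of two of the named approaches: a rule-based keyword detector and a Naive Bayes classifier over tokenized text. It is not the authors' implementation. The keyword list, the toy training comments, and the names `TOXIC_KEYWORDS`, `tokenize`, `keyword_rule`, and `NaiveBayes` are all assumptions made for illustration, and the classifier is a from-scratch multinomial Naive Bayes with add-one smoothing rather than any particular library's version.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical keyword list, chosen only for this illustration.
TOXIC_KEYWORDS = {"idiot", "stupid", "hate", "kill"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def keyword_rule(text):
    """Rule-based detector: flag a comment if any token is a listed keyword."""
    return any(tok in TOXIC_KEYWORDS for tok in tokenize(text))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        scores = {}
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for tok in tokenize(text):
                if tok in self.vocab:  # words never seen in training are skipped
                    score += math.log((counts[tok] + 1) / denom)
            scores[label] = score
        return max(scores, key=scores.get)


# Toy training data (assumed, not from the paper): 1 = toxic, 0 = non-toxic.
train_texts = [
    "you are an idiot and i hate you",
    "what a stupid worthless post",
    "i will hurt you",
    "thanks for the helpful answer",
    "great article i learned a lot",
    "have a nice day everyone",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)

if __name__ == "__main__":
    for comment in ["you are worthless", "thanks for the great answer"]:
        print(comment, "| rule:", keyword_rule(comment),
              "| nb:", model.predict(comment))
```

On this toy data the keyword rule does not flag "you are worthless" because none of its words is in the list, while the learned model labels it toxic from word statistics. That gap is one reason a study might evaluate a keyword rule alongside trained classifiers, although a six-comment example says nothing about real-world accuracy.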
License
Copyright (c) 2025 International Journal of Scientific Research in Science and Technology

This work is licensed under a Creative Commons Attribution 4.0 International License.