Classification of Gujarati Articles using Bernoulli Naïve Bayes Classifier and Extra-trees Classifier

Authors

  • Ravirajsinh Chauhan  P P Savani University, Surat, Gujarat, India
  • Janvi R Savani  P P Savani University, Surat, Gujarat, India
  • Janvi M Sheta  P P Savani University, Surat, Gujarat, India

DOI:

https://doi.org/10.32628/IJSRST523103105

Keywords:

Classification, Gujarati Articles, Natural Language Processing, Classifiers

Abstract

Information technology has generated massive amounts of data on the internet. Because this data was initially mostly in English, the majority of data mining research was conducted on English text documents. As internet usage grew, so did data in other languages such as Gujarati, Marathi, Tamil, Telugu, and Punjabi. In this paper, we present a text categorization method for Gujarati articles based on artificial text summarization. Various learning techniques are available for classifying text documents, including Naïve Bayes, Support Vector Machines, and Decision Trees. We gathered articles from various e-newspaper editorials. The paper focuses on a brief review of the techniques and methods for Gujarati article classification, so that research in text classification can be explored further using various classifier algorithms. The system uses a dataset of 1604 documents from 8 categories. The results show that a Stacking Classifier combining the Bernoulli Naïve Bayes Classifier and the Extra-trees Classifier is efficient for Gujarati articles.
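The stacking setup named in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the use of scikit-learn, the binary bag-of-words features, the logistic-regression meta-learner, the whitespace tokenizer, and the two-category toy corpus are all assumptions made here for the example, and the helper name `build_model` is hypothetical.

```python
from sklearn.ensemble import ExtraTreesClassifier, StackingClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import BernoulliNB
from sklearn.pipeline import make_pipeline

# Toy corpus made up for illustration only; it is NOT the paper's
# 1604-document, 8-category dataset.
docs = [
    "ક્રિકેટ મેચ ટીમ જીત",
    "ટીમ ખેલાડી મેચ રન",
    "ખેલાડી ક્રિકેટ રન જીત",
    "મેચ ખેલાડી ટીમ ક્રિકેટ",
    "બજાર શેર ભાવ રોકાણ",
    "શેર બજાર રોકાણ નફો",
    "ભાવ નફો બજાર શેર",
    "રોકાણ નફો ભાવ બજાર",
]
labels = ["sports"] * 4 + ["business"] * 4


def build_model():
    """Hypothetical helper: binary term features feeding a stacked ensemble."""
    # binary=True yields presence/absence features, which is the input
    # model Bernoulli Naive Bayes assumes. Tokens are split on whitespace
    # (an assumption of this sketch, not a detail from the paper).
    vectorizer = CountVectorizer(token_pattern=r"\S+", binary=True)
    stack = StackingClassifier(
        estimators=[
            ("bnb", BernoulliNB()),
            ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
        ],
        # Meta-learner choice is an assumption; the abstract does not name one.
        final_estimator=LogisticRegression(),
        cv=2,  # small fold count only because the toy corpus is tiny
    )
    return make_pipeline(vectorizer, stack)


model = build_model()
model.fit(docs, labels)
predictions = model.predict(["ક્રિકેટ ટીમ રન", "શેર ભાવ નફો"])
print(list(predictions))
```

Whitespace tokenization is chosen here because the default `CountVectorizer` token pattern is built on `\w`, which can split Gujarati words at vowel signs; a real system would likely need proper Gujarati tokenization and stop-word handling.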

Published

2023-06-30

Section

Research Articles

How to Cite

[1]
Ravirajsinh Chauhan, Janvi R Savani, Janvi M Sheta, "Classification of Gujarati Articles using Bernoulli Naïve Bayes Classifier and Extra-trees Classifier," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 10, Issue 3, pp. 531-540, May-June 2023. Available at doi: https://doi.org/10.32628/IJSRST523103105