Rule Based Part-of-Speech Tagger for Marathi Language

Authors

  • Gaikwad Deepali K., Department of C.S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India
  • Naik Ramesh R., Department of C.S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India
  • C. Namrata Mahender, Department of C.S. and I.T., Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, India

Keywords:

Part of Speech (POS), Tokenization, Stemmer, Morphological Analyzer, Tag Generation.

Abstract

Part-of-speech (POS) tagging is one of the best-studied problems in Natural Language Processing (NLP). It is the process of assigning a part of speech, such as noun, verb, adjective, or adverb, to each word in a sentence. Marathi is a morphologically rich language spoken by the native people of Maharashtra, and POS tagging for it is difficult due to the unavailability of a corpus for computational processing. In this paper, we present a POS tagger for Marathi that uses a rule-based technique. The proposed system tokenizes the input string, finds the root of each token using a morphological analyzer, and compares the root with WordNet to assign the appropriate tag. If a word is assigned more than one tag, the ambiguity is resolved using Marathi grammar rules. Meaningful rules are provided to improve the performance of the system.
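The pipeline described in the abstract (tokenize, find the root, look the root up, then resolve ambiguity by rule) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `LEXICON` dictionary is a toy stand-in for a WordNet lookup, the `SUFFIXES` list is a toy stand-in for a morphological analyzer, and the single disambiguation rule (a sentence-final ambiguous word is taken as a verb) is a hypothetical example of a grammar rule. All function names are invented for this sketch.

```python
import re

# Toy lexicon standing in for a WordNet lookup (hypothetical entries).
LEXICON = {
    "राम": {"NOUN"},
    "घर": {"NOUN"},
    "काम": {"NOUN"},
    "तू": {"PRON"},
    "जा": {"VERB"},
    "कर": {"NOUN", "VERB"},  # ambiguous: "tax" (noun) or "do" (imperative verb)
}

# Toy suffix list standing in for a morphological analyzer (hypothetical).
SUFFIXES = ["ात", "तो"]


def tokenize(text):
    """Split on whitespace and drop the danda and common punctuation."""
    return re.findall(r"[^\s।.,!?]+", text)


def find_root(token):
    """Return the token itself if known, else try stripping one suffix."""
    if token in LEXICON:
        return token
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if token.endswith(suffix) and token[: -len(suffix)] in LEXICON:
            return token[: -len(suffix)]
    return token


def disambiguate(tags, index, length):
    """Hypothetical rule: an ambiguous sentence-final word is a verb."""
    if "VERB" in tags and index == length - 1:
        return "VERB"
    others = sorted(tags - {"VERB"})
    return others[0] if others else "VERB"


def tag_sentence(text):
    """Return (token, root, tag) triples for each token in the text."""
    tokens = tokenize(text)
    tagged = []
    for i, token in enumerate(tokens):
        root = find_root(token)
        tags = LEXICON.get(root, set())
        if not tags:
            tag = "UNK"  # root not found in the lexicon
        elif len(tags) == 1:
            tag = next(iter(tags))
        else:
            tag = disambiguate(tags, i, len(tokens))
        tagged.append((token, root, tag))
    return tagged


if __name__ == "__main__":
    for sentence in ["राम घरात जातो।", "तू काम कर"]:
        print(tag_sentence(sentence))
```

In this sketch the ambiguous entry "कर" is tagged as a verb only when it ends the sentence and as a noun elsewhere. A real system would need a much larger rule set and a genuine lexical resource.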

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Gaikwad Deepali K., Naik Ramesh R., C. Namrata Mahender, "Rule Based Part-of-Speech Tagger for Marathi Language", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 4, Issue 5, pp. 1607-1612, March-April 2018.