Fusion of Fast-text and Indo-Wordnet for Disambiguation of Word Sense in the Marathi Language

Authors

  • Mr. Aparitosh Gahankari Research Scholar, P.G. Department of CSE, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India Author
  • Dr. Avinash S. Kapse Professor & HoD, Department of CSE, Mauli College of Engineering & Technology, Shegaon, Maharashtra, India Author
  • Dr. Mohammad Atique HoD, P.G. Department of CSE, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India Author
  • Dr. V.M. Thakare Ex-Professor, P.G. Department of CSE, Sant Gadge Baba Amravati University, Amravati, Maharashtra, India Author
  • Dr. Arvind S. Kapse Professor, ISE, New Horizon College of Engineering, Bengaluru, Karnataka, India Author

DOI:

https://doi.org/10.32628/IJSRST251222653

Keywords:

Word Sense Disambiguation (WSD), Fast-text, Indo-Wordnet

Abstract

This research combines the fastText model with Indo-WordNet to address word sense disambiguation (WSD) in Marathi-language text. The initial version of the algorithm used exact word-pair matching to measure the overlap between the "context bag" and the "sense bag" derived from the WordNet lexical resource. The current method instead computes the overlap with a semantic similarity metric based on fastText subword embeddings. This approach handles previously unseen word forms effectively while still capturing the underlying semantics of the terms. WSD has advanced significantly for English and many European languages, but it remains a substantial challenge for Marathi and other languages spoken in India. The Marathi text corpus, sourced from the Government of India, comprises a large collection of Marathi sentences. The dataset used in this study consisted of the Indo-WordNet for the Marathi language and the Marathi Online Dictionary. The experimental results are promising. Target words whose WordNet synsets are semantically distinct achieve a high F1 score. The achieved F1 score of 89% exceeds the baseline and represents a substantial advance over previous knowledge-based methods for low-resource Indian languages.
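The shift from exact overlap to embedding-based overlap described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the hashed character n-gram vectors are an untrained stand-in for fastText subword embeddings, the scoring rule (average of each context word's best cosine match in the sense bag) is an assumption, and the English tokens and sense labels are hypothetical placeholders for Marathi context and Indo-WordNet gloss words.

```python
import hashlib
import math

DIM = 64  # toy embedding size (assumption, not from the paper)


def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word wrapped in boundary markers, fastText-style."""
    wrapped = f"<{word}>"
    grams = [wrapped[i:i + n]
             for n in range(n_min, n_max + 1)
             for i in range(len(wrapped) - n + 1)]
    return grams or [wrapped]


def ngram_vector(gram):
    """Deterministic pseudo-random vector per n-gram (stand-in for trained weights)."""
    vec = []
    for k in range(DIM):
        digest = hashlib.md5(f"{gram}|{k}".encode("utf-8")).digest()
        vec.append(int.from_bytes(digest[:4], "big") / 2**32 * 2 - 1)  # in [-1, 1)
    return vec


def word_vector(word):
    """Average the n-gram vectors, so unseen inflected forms still get a vector."""
    vecs = [ngram_vector(g) for g in char_ngrams(word)]
    return [sum(col) / len(vecs) for col in zip(*vecs)]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def sense_score(context_bag, sense_bag):
    """Soft overlap: mean over context words of the best cosine match in the sense bag."""
    ctx = [word_vector(w) for w in context_bag]
    sns = [word_vector(w) for w in sense_bag]
    return sum(max(cosine(c, s) for s in sns) for c in ctx) / len(ctx)


def disambiguate(context_bag, sense_bags):
    """Return the sense label whose bag is most similar to the context bag."""
    return max(sense_bags, key=lambda s: sense_score(context_bag, sense_bags[s]))


# Hypothetical sense bags for an ambiguous target word (toy data).
senses = {
    "bank_river": ["river", "water", "shore"],
    "bank_money": ["money", "deposit", "loan"],
}
# "rivers" never appears in a sense bag, yet it shares n-grams with "river".
print(disambiguate(["rivers", "water"], senses))
```

With exact word-pair matching, an inflected context word such as "rivers" would contribute nothing to the overlap. Here it still receives partial credit through its shared character n-grams, which is the property the abstract attributes to subword embeddings. In practice the toy `word_vector` would be replaced by vectors from a trained fastText model.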

Published

23-04-2025

Section

Research Articles