Analysis and Implementation of Text Mining for Different Documents

Authors

  • K. Maheswari, Department of Computer Applications, Kalasalingam University, Krishnankoil, Tamil Nadu, India
  • P. Packia Amutha Priya, Department of Computer Applications, Kalasalingam University, Krishnankoil, Tamil Nadu, India

Keywords:

Text Mining, Data Mining, Frequency of Words, Text File

Abstract

Text mining is the process of deriving structured data from unstructured and semi-structured text. In this work, text is modeled as a bag of words. The environment is set up with various documents in a database. Preprocessing removes unwanted numeric values and punctuation, normalizes uppercase and lowercase letters, and removes frequent words. Words that occur at least fifty times in a document are then identified. The experimental results for words occurring twenty, twenty-five, fifty, and one hundred times in a document are analyzed and represented visually.
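
The preprocessing and frequency-threshold steps described in the abstract can be sketched roughly as follows. This is a minimal illustration in Python, not the authors' implementation: the function names `preprocess` and `frequent_terms`, the tiny stop-word list, and the sample sentence are all assumptions made for the example, and "frequent words" is read here as stop words.

```python
import re
from collections import Counter

# Illustrative stop-word list (an assumption; the abstract does not give one).
STOP_WORDS = frozenset({"the", "a", "an", "and", "of", "to", "in", "is", "it"})


def preprocess(text):
    """Lowercase the text, strip digits and punctuation, drop stop words."""
    text = text.lower()
    # Anything that is not an ASCII letter or whitespace becomes a space,
    # so numbers and punctuation are removed (non-ASCII letters are too).
    text = re.sub(r"[^a-z\s]", " ", text)
    return [word for word in text.split() if word not in STOP_WORDS]


def frequent_terms(text, min_count):
    """Return {word: count} for words occurring at least min_count times."""
    counts = Counter(preprocess(text))
    return {word: n for word, n in counts.items() if n >= min_count}


# Hypothetical sample text; a real document would use thresholds such as 20 or 50.
sample = "Text mining turns text into data. Mining text, in 2017, is the task!"
for threshold in (1, 2, 3):
    print(threshold, frequent_terms(sample, threshold))
```

Raising `min_count` shrinks the result to the most common terms, which is the effect the abstract describes when comparing thresholds of twenty, twenty-five, fifty, and one hundred occurrences.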

Published

2017-04-30

Section

Research Articles

How to Cite

[1]
K. Maheswari, P. Packia Amutha Priya, "Analysis and Implementation of Text Mining for Different Documents", International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 3, Issue 5, pp. 109-113, May-June 2017.