Analysis and Implementation of Text Mining for Different Documents

Authors (2): K. Maheswari, P. Packia Amutha Priya

Text mining is the process of deriving structured data from unstructured and semi-structured text. In this work, text is modeled as a bag of words. The experimental environment is set up with a collection of documents stored in a database. Preprocessing normalizes letter case and removes unwanted numeric values, punctuation and frequently occurring words (stop words). Words that occur at least fifty times in a document are then identified. Word frequencies were analyzed and presented visually for thresholds of twenty, twenty-five, fifty and one hundred occurrences in a document.
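The abstract does not include the implementation, so the following is only a minimal Python sketch of the pipeline it describes: bag-of-words preprocessing (lower-casing, stripping numbers and punctuation, dropping stop words), then counting terms and keeping those that reach a frequency threshold. The stop-word list `STOP_WORDS`, the helper names `preprocess` and `frequent_terms`, and the sample text are hypothetical illustrations, not taken from the paper. Each threshold is assumed to be a minimum count ("at least N times").

```python
import re
import string
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch);
# a real system would use a fuller list.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "that"}

# Translation table mapping every ASCII punctuation character to a space.
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))


def preprocess(text: str) -> list[str]:
    """Turn raw text into a bag-of-words token list."""
    text = text.lower()                    # normalise case
    text = re.sub(r"\d+", " ", text)       # drop numeric values
    text = text.translate(_PUNCT_TO_SPACE)  # drop punctuation
    # Split on whitespace and discard stop words.
    return [tok for tok in text.split() if tok not in STOP_WORDS]


def frequent_terms(tokens: list[str], min_count: int) -> dict[str, int]:
    """Return terms occurring at least min_count times, most frequent first."""
    counts = Counter(tokens)
    return {term: n for term, n in counts.most_common() if n >= min_count}


if __name__ == "__main__":
    # Synthetic document for illustration only (not data from the paper).
    doc = ("Mining text, 2017! " * 120) + ("data 42 " * 60) + ("words; " * 30) + ("file " * 22)
    tokens = preprocess(doc)
    for threshold in (20, 25, 50, 100):
        print(threshold, frequent_terms(tokens, threshold))
```

Running the script prints, for each of the four thresholds named in the abstract, the terms that meet it; visualising the counts (for example as a bar chart) is left out. Lower-casing happens before stop-word removal so that "The" and "the" are treated as the same token.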

Authors and Affiliations

K. Maheswari
Department of Computer Applications, Kalasalingam University, Krishnankoil, Tamil Nadu, India
P. Packia Amutha Priya
Department of Computer Applications, Kalasalingam University, Krishnankoil, Tamil Nadu, India

Keywords: Text Mining, Data Mining, Frequency of Words, Text File

Publication Details

Published in : Volume 3 | Issue 5 | May-June 2017
Date of Publication : 2017-04-30
License : This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 109-113
Manuscript Number : ICASCT2518
Publisher : Technoscience Academy

Print ISSN : 2395-6011, Online ISSN : 2395-602X

Cite This Article :

K. Maheswari, P. Packia Amutha Priya, "Analysis and Implementation of Text Mining for Different Documents", International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN : 2395-6011, Online ISSN : 2395-602X, Volume 3, Issue 5, pp. 109-113, May-June 2017.