Review on Big Data (Hadoop) processing model by implementing Data mining technique

Authors: Madhavi V. Shirbhate, Abhijit R. Itkikar

Big data is a term that describes large volumes of data, such as sensor data, tweets, photographs, raw data, and unstructured data. It is not the amount of data that is important, however, but what organizations do with it: big data can be analyzed for insights that lead to better decisions and strategic business moves. Data sizes have exceeded petabytes (10^15 bytes); the size is not the issue, but the processing is. Hadoop is an open-source distributed computing framework for storing and processing huge unstructured datasets distributed across clusters. Business intelligence in Hadoop retrieves data from HDFS (Hadoop Distributed File System) and places it in a database, where it is held in a structured format. As a result, retrieving the data in cache consumes time and increases complexity. This paper presents data mining algorithms that reduce the time and complexity of classification and clustering. The data present in a data set is identified using correlation and patterns. A data mining task is modelled as either predictive or descriptive. A predictive model makes a prediction about values of data using known results found from different data, while a descriptive model identifies patterns or relationships in data. Unlike the predictive model, a descriptive model serves as a way to explore the properties of the data examined, not to predict new properties. Predictive data mining tasks include classification, prediction, regression, and time series analysis. Descriptive tasks encompass methods such as clustering, summarization, association rules, and sequence analysis. In this paper we therefore classify and cluster data sets stored in HDFS using data mining algorithms such as SOM (Self-Organizing Maps), K-Means, and Apriori.
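As an illustration of the descriptive (clustering) task named above, the following is a minimal sketch of K-Means using plain Lloyd's iterations in Python. It is not the authors' implementation: the function names `kmeans` and `squared_distance`, the toy points, and the choice to seed the centroids with the first k points are all assumptions made for illustration, and the example runs in memory rather than reading from HDFS.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two 2-D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def kmeans(points, k, iterations=20):
    """Cluster 2-D points into k groups with plain Lloyd's iterations.

    Assumption for illustration: the first k points seed the centroids,
    which keeps the sketch deterministic. Real systems usually use a
    randomized or k-means++ style initialization instead.
    """
    centroids = list(points[:k])
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return centroids, assignment


# Hypothetical toy data: two well-separated groups of 2-D points.
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1), (8.0, 8.0), (8.2, 7.9), (7.9, 8.3)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

On a Hadoop cluster the assignment step would typically run in the map phase and the centroid update in the reduce phase; that mapping is a common design choice rather than something the abstract specifies.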

Authors and Affiliations

Madhavi V. Shirbhate
ME Scholar, Department of Computer Science and Engineering, Sipna College of Engineering and Technology, Amravati, India
Abhijit R. Itkikar
Assistant Professor, Department of Computer Science and Engineering, Sipna College of Engineering and Technology, Amravati, India

Keywords: Big Data, Data Mining, Clustering, Classification, SOM (Self-Organizing Maps), K-Means, Apriori


Publication Details

Published in : Volume 3 | Issue 4 | May-June 2017
Date of Publication : 2017-06-30
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 257-260
Manuscript Number : IJSRST173465
Publisher : Technoscience Academy

Print ISSN : 2395-6011, Online ISSN : 2395-602X

Cite This Article :

Madhavi V. Shirbhate, Abhijit R. Itkikar, "Review on Big Data (Hadoop) processing model by implementing Data mining technique", International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN: 2395-6011, Online ISSN: 2395-602X, Volume 3, Issue 4, pp. 257-260, May-June 2017.