Review on Big Data (Hadoop) processing model by implementing Data mining technique

Madhavi V. Shirbhate; Abhijit R. Itkikar

doi:10.32628/IJSRST173465

Authors

Madhavi V. Shirbhate, ME Scholar, Department of Computer Science and Engg, Sipna College of Engineering and Technology, Amravati, India
Abhijit R. Itkikar, Assistant Professor, Department of Computer Science and Engg, Sipna College of Engineering and Technology, Amravati, India

Keywords:

Big Data, Data Mining, Clustering, Classification, SOM (Self Organizing Maps), K-Means, Apriori.

Abstract

Big data is a term that describes large volumes of data: sensor data, tweets, photographs, raw data, and unstructured data. But it's not the amount of data that's important. It's what organizations do with the data that matters. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Data sizes now exceed petabytes (10^15 bytes). The size is not the issue, but the processing is. Hadoop is an open source distributed computing framework for storing and processing huge unstructured datasets distributed across different clusters. Business intelligence tools on Hadoop retrieve data from HDFS (Hadoop Distributed File System) and place it in a database, where it is held in a structured format. Retrieving the data into cache in this way consumes time and increases complexity. This paper presents data mining algorithms that reduce the time and complexity of classification and clustering. The data present in a data set is identified using correlation and patterns. A data mining task is modelled as either predictive or descriptive. A predictive model makes a prediction about values of data using known results found from different data, while a descriptive model identifies patterns or relationships in data. Unlike the predictive model, a descriptive model serves as a way to explore the properties of the data examined, not to predict new properties. Predictive data mining tasks include classification, prediction, regression, and time series analysis. Descriptive tasks encompass methods such as clustering, summarization, association rules, and sequence analysis. In this paper we classify and cluster data sets stored in HDFS using data mining algorithms such as SOM (Self Organizing Maps), K-Means, and Apriori.
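
As an illustration of the descriptive (clustering) side of the abstract, the following is a minimal sketch of plain K-Means in Python. It is not the authors' implementation: the function name `kmeans`, its parameters, and the in-memory sample points are assumptions made for illustration, and in the setting the abstract describes the records would first be read from HDFS rather than written inline.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Cluster equal-length numeric tuples with plain K-Means (illustrative sketch).

    Returns (centroids, clusters), where clusters[i] lists the points
    assigned to centroids[i] in the last assignment step.
    """
    rng = random.Random(seed)
    # Assumption: initial centroids are k distinct points drawn at random.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        # (squared Euclidean distance).
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
            )
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = []
        for i, members in enumerate(clusters):
            if members:
                new_centroids.append(
                    tuple(sum(axis) / len(members) for axis in zip(*members))
                )
            else:
                new_centroids.append(centroids[i])
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two well-separated groups of 2-D points.
sample_points = [
    (1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
    (8.0, 8.0), (9.0, 9.5), (8.5, 9.0),
]
centroids, clusters = kmeans(sample_points, k=2)
```

Each pass alternates an assignment step and an update step until the centroids stop moving. On a Hadoop cluster the same two steps are commonly expressed as a map phase (assign points) and a reduce phase (recompute centroids), though the abstract does not say how the authors distribute the work.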

