Manuscript Number : IJSRST173465
Review on Big Data (Hadoop) processing model by implementing Data mining technique
Authors (2): Madhavi V. Shirbhate, Abhijit R. Itkikar

Big data is a term that describes large volumes of data, such as sensor data, tweets, photographs, raw data, and unstructured data. But it is not the amount of data that is important; what matters is what organizations do with the data. Big data can be analyzed for insights that lead to better decisions and strategic business moves. Data volumes have exceeded petabytes (10^15 bytes), yet the size is not the issue; the processing is. Hadoop is an open source distributed computing framework for storing and processing huge unstructured data sets distributed across different clusters. Business intelligence tools on Hadoop retrieve data from HDFS (Hadoop Distributed File System) and place it in a database, where it is held in a structured format. As a result, retrieving data through the cache consumes time and increases complexity. This paper presents data mining algorithms that reduce the time and complexity of classification and clustering. Data in the data set is identified using correlations and patterns. A data mining task is modelled as either predictive or descriptive. A predictive model makes a prediction about values of data using known results found from different data, while a descriptive model identifies patterns or relationships in data. Unlike the predictive model, a descriptive model serves as a way to explore the properties of the data examined, not to predict new properties. Predictive data mining tasks include classification, prediction, regression, and time series analysis. Descriptive tasks encompass methods such as clustering, summarization, association rules, and sequence analysis. In this paper we therefore classify and cluster data sets stored in HDFS using data mining algorithms such as SOM (Self Organizing Maps), K-Means, and Apriori.
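To make the clustering step named in the abstract concrete, the following is a minimal in-memory sketch of K-Means (Lloyd's algorithm) in Python. It is not the authors' Hadoop implementation, which this page does not show. The function name `k_means`, its parameters, and the tuple-based point representation are assumptions chosen for illustration, and reading the points from HDFS is left out.

```python
import random
from math import dist  # Euclidean distance, available in Python 3.8+


def k_means(points, k, iterations=100, seed=0):
    """Cluster a list of equal-length numeric tuples into k groups.

    Hypothetical helper for illustration only: returns (centroids, labels),
    where labels[i] is the index of the centroid assigned to points[i].
    """
    rng = random.Random(seed)
    # Initialise centroids by picking k distinct input points at random.
    centroids = rng.sample(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                mean = tuple(sum(axis) / len(members) for axis in zip(*members))
                new_centroids.append(mean)
            else:
                # Keep an empty cluster's centroid where it was.
                new_centroids.append(centroids[c])
        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return centroids, labels
```

Calling `k_means(points, 2)` on a list of 2-D tuples returns the two centroids and one cluster label per point. On a cluster, the assignment step would typically run in parallel over data partitions and the update step would aggregate the partial sums, but that mapping is an assumption about a common design rather than something stated in the abstract.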
Keywords: Big Data, Data Mining, Clustering, Classification, SOM (Self Organizing Maps), K-Means, Apriori.
Publication Details
Published in : Volume 3 | Issue 4 | May-June 2017
Article Preview
Madhavi V. Shirbhate
ME Scholar, Department of Computer Science and Engg, Sipna College of Engineering and Technology, Amravati, India
Abhijit R. Itkikar
Assistant Professor, Department of Computer Science and Engg, Sipna College of Engineering and Technology, Amravati, India
Date of Publication : 2017-06-30
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 257-260
Publisher : Technoscience Academy
Journal URL : https://ijsrst.com/IJSRST173465