Big Data Clustering Using Heuristic Data Intensive Computing and Self Organizing Maps

Authors

  • K Nagamani  Master of Science (CS), Department of Computer Science, RIIMS College, SV University, Tirupati, Andhra Pradesh, India
  • K Sunitha  Associate Professor, Department of Computer Science, RIIMS College, SV University, Tirupati, Andhra Pradesh, India

Keywords

HDIC, self-organizing maps, big data, clustering, IBM database generator

Abstract

Traditional data clustering algorithms have pitfalls when it comes to discovering efficient clusters. As database sizes grow dynamically and the ways data is used change dramatically, these algorithms struggle to deliver adequate clustering performance. Transforming massive amounts of data into knowledge can greatly improve organizational performance, and both scientific and business organizations stand to benefit from big data. However, dealing with big data poses many challenges, including its storage, transfer, management, and manipulation. Many of the techniques needed to explore the hidden and transitive patterns in big data are limited by their hardware and software implementations. To address this, a unified framework is presented for big data clustering using heuristic data-intensive computing (HDIC) and self-organizing maps (SOM). It is implemented on an N-node HDIC cluster and driven by a wide range of data sets: synthetic data sets created with the IBM synthetic data generator and real-time data sets taken from UCI. The framework is intended to significantly improve big data clustering performance over existing approaches.

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
K Nagamani, K Sunitha, "Big Data Clustering Using Heuristic Data Intensive Computing and Self Organizing Maps," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 4, Issue 5, pp. 1551-1557, March-April 2018.