Data Partitioning Method for Mining Frequent Itemset Using MapReduce

Authors

  • R. Divya Bharathi  Department of IT, M.Tech, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
  • A. S. Karthik Kannan  Department of IT, Assistant Professor, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
  • E. Jai Vinitha  Department of IT, M.Tech, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India

Keywords:

Frequent Itemset Mining, MapReduce Model, Parallel Mining, Data Partitioning.

Abstract

Existing parallel mining algorithms suffer from high communication and mining overhead. To overcome this problem, a data partitioning method based on the MapReduce model is proposed. In this model, three MapReduce jobs are implemented to improve the performance of parallel frequent itemset mining. In the second MapReduce job, the mappers apply a locality-sensitive hashing (LSH) based approach that integrates the item grouping and partitioning processes. The reducers run FP-Growth on the partitioned data to generate all frequent patterns in the data. The main idea of data partitioning is to group relevant transactions together and reduce the number of relevant transactions. Extensive experiments on IBM Quest Market Basket synthetic datasets show that the data partitioning method is efficient, robust, and scalable on Hadoop.
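
As an illustration of the map-side idea only, the following is a minimal Python sketch of how an LSH-style step could route similar transactions to the same partition. It is not the authors' implementation: the use of MinHash signatures, the function names (minhash_signature, lsh_partition), and the parameters (num_hashes, num_partitions) are all assumptions made for this example, and it runs in a single process rather than on Hadoop. The FP-Growth reduce step is not shown.

```python
import hashlib
from collections import defaultdict


def _hash(seed, text):
    """Deterministic 64-bit hash of a string under a given seed."""
    digest = hashlib.md5(f"{seed}:{text}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def minhash_signature(transaction, num_hashes=4):
    """MinHash signature (assumed LSH family): for each seed, keep the
    smallest hash value over the items. Assumes a non-empty transaction."""
    return tuple(
        min(_hash(seed, item) for item in transaction)
        for seed in range(num_hashes)
    )


def lsh_partition(transactions, num_partitions=2, num_hashes=4):
    """Hypothetical map step: key each transaction by its signature so
    that transactions with equal signatures share a partition."""
    partitions = defaultdict(list)
    for transaction in transactions:
        signature = minhash_signature(transaction, num_hashes)
        key = _hash(0, repr(signature)) % num_partitions
        partitions[key].append(transaction)
    return partitions


if __name__ == "__main__":
    transactions = [{"a", "b", "c"}, {"c", "b", "a"}, {"x", "y"}, {"a", "b"}]
    for key, group in sorted(lsh_partition(transactions, 3).items()):
        print(key, [sorted(t) for t in group])
```

In this sketch, transactions containing the same items always receive the same signature and therefore the same partition, while transactions that share many items agree on each signature component with probability equal to their Jaccard similarity. A reducer would then mine each partition, for example with FP-Growth as the abstract describes.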

Published

2017-04-30

Section

Research Articles

How to Cite

[1]
R. Divya Bharathi, A. S. Karthik Kannan, E. Jai Vinitha, "Data Partitioning Method for Mining Frequent Itemset Using MapReduce," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 3, Issue 5, pp. 146-153, May-June 2017.