Data Partitioning Method for Mining Frequent Itemset Using MapReduce
Keywords:
Frequent Itemset Mining, MapReduce Model, Parallel Mining, Data Partitioning
Abstract
Existing parallel frequent itemset mining algorithms suffer from high communication and mining overhead. To overcome this problem, a data partitioning method based on the MapReduce model is proposed. In this method, three MapReduce jobs are implemented to improve the performance of parallel frequent itemset mining. In the second MapReduce job, the mapper applies a Locality-Sensitive Hashing (LSH) based approach that integrates the item grouping and partitioning processes, and the reducer runs FP-Growth on the partitioned data to generate all frequent patterns. The main idea of data partitioning is to group relevant transactions together and reduce the number of redundant transactions. Extensive experiments using IBM Quest Market Basket Synthetic Datasets show that the data partitioning method is efficient, robust, and scalable on Hadoop.
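The LSH-based grouping step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it runs as plain Python rather than as a Hadoop mapper, it assumes MinHash as the LSH family, and it uses only the first band of the signature to choose a partition, where a full LSH scheme would use several bands. The function names (`item_hash`, `minhash_signature`, `lsh_partition`) and all parameter values are hypothetical. The FP-Growth step performed by the reducer is not shown.

```python
import hashlib
from collections import defaultdict


def item_hash(item: str, seed: int) -> int:
    """Deterministic 64-bit hash of an item under a given seed."""
    digest = hashlib.blake2b(f"{seed}:{item}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(transaction, num_hashes=8):
    """MinHash signature: for each seed, the smallest hash over the items.

    Assumes the transaction is non-empty. Two transactions with high
    Jaccard similarity are likely to agree in many signature positions.
    """
    return tuple(
        min(item_hash(item, seed) for item in transaction)
        for seed in range(num_hashes)
    )


def lsh_partition(transactions, num_partitions=4, num_hashes=8, rows_per_band=4):
    """Assign each transaction to a partition keyed by its first LSH band.

    Simplifying assumption: only one band is used, so each transaction
    goes to exactly one partition.
    """
    partitions = defaultdict(list)
    for transaction in transactions:
        signature = minhash_signature(transaction, num_hashes)
        band = signature[:rows_per_band]
        key = item_hash(repr(band), seed=-1) % num_partitions
        partitions[key].append(transaction)
    return dict(partitions)


if __name__ == "__main__":
    baskets = [
        {"bread", "milk"},
        {"milk", "bread"},
        {"beer", "diapers", "chips"},
        {"bread", "milk", "eggs"},
    ]
    for key, group in sorted(lsh_partition(baskets, num_partitions=3).items()):
        print(key, group)
```

In a MapReduce setting, the partition key would play the role of the mapper's output key, so that transactions sharing a key reach the same reducer. Identical transactions always share a key. Similar transactions share one with a probability that grows with their Jaccard similarity and falls as `rows_per_band` increases.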
License
Copyright (c) IJSRST

This work is licensed under a Creative Commons Attribution 4.0 International License.