Data Partitioning Method for Mining Frequent Itemset Using MapReduce

R. Divya Bharathi; A. S. Karthik Kannan; E. Jai Vinitha

doi:10.32628/ICASCT2525

Authors

R. Divya Bharathi, Department of IT, M.Tech, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
A. S. Karthik Kannan, Department of IT, Assistant Professor, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India
E. Jai Vinitha, Department of IT, M.Tech, Mepco Schlenk Engineering College, Sivakasi, Tamilnadu, India

Keywords:

Frequent Itemset Mining, MapReduce Model, Parallel Mining, Data Partitioning.

Abstract

Existing parallel mining algorithms suffer from high communication and mining overhead. To overcome this problem, a data partitioning method using the MapReduce model is proposed. In this model, three MapReduce jobs are implemented to improve the performance of parallel frequent itemset mining. In the second MapReduce job, the mapper applies a locality-sensitive hashing (LSH) based approach that integrates the item grouping and partitioning processes. The reducer then runs FP-Growth on the partitioned data to generate all frequent patterns. The main idea of data partitioning is to group relevant transactions together and reduce the number of relevant transactions. Extensive experiments using the IBM Quest Market Basket synthetic datasets show that the data partitioning method is efficient, robust, and scalable on Hadoop.
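
The LSH-based grouping performed by the mapper can be illustrated with a minimal, single-machine Python sketch. It is not the authors' implementation: the function names, the use of MinHash signatures, the number of hash functions, the band size, and the partition count are all assumptions chosen for illustration, and the sketch assumes every transaction is non-empty. It shows only the idea that transactions with similar item sets tend to receive the same partition key, which a reducer could then mine with FP-Growth.

```python
import hashlib
from collections import defaultdict


def _hash(value: str, seed: int) -> int:
    """Deterministic 64-bit hash of a string under a given seed (illustrative helper)."""
    digest = hashlib.blake2b(f"{seed}:{value}".encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def minhash_signature(transaction, num_hashes=8):
    """MinHash signature: for each seed, the smallest hash value among the items.

    Assumes the transaction is a non-empty collection of item names.
    """
    return tuple(
        min(_hash(item, seed) for item in transaction)
        for seed in range(num_hashes)
    )


def lsh_partition(transactions, num_partitions=4, num_hashes=8, band_size=2):
    """Assign each transaction to a partition keyed by the first band of its signature.

    Transactions whose first `band_size` MinHash values agree get the same key,
    so similar item sets are likely to be grouped together.
    """
    partitions = defaultdict(list)
    for tx in transactions:
        band = minhash_signature(tx, num_hashes)[:band_size]
        key = _hash(repr(band), seed=-1) % num_partitions
        partitions[key].append(tx)
    return dict(partitions)


if __name__ == "__main__":
    # Hypothetical market-basket transactions.
    txs = [
        ["bread", "milk"],
        ["milk", "bread"],
        ["beer", "diapers"],
        ["bread", "milk", "eggs"],
    ]
    for key, group in sorted(lsh_partition(txs).items()):
        print(key, group)
```

In a MapReduce setting, the key computed in `lsh_partition` would play the role of the mapper's output key, so that each reducer receives one group of similar transactions.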

