
On Traffic-Aware Partition and Aggregation in Mapreduce for Big Data Applications

Authors (3): Shaik Inthiyaz, S. G. Nawaz, Dr. R. Ramachandra

The MapReduce programming model simplifies large-scale data processing on commodity clusters by exploiting parallel map tasks and reduce tasks. Although many efforts have been made to improve the performance of MapReduce jobs, they ignore the network traffic generated in the shuffle phase, which plays a critical role in performance. Traditionally, a hash function is used to partition intermediate data among reduce tasks. This approach is not traffic-efficient, because it takes into account neither the network topology nor the data size associated with each key. In this paper, we study how to reduce the network traffic cost of a MapReduce job by designing a novel intermediate data partition scheme. We also jointly consider the aggregator placement problem, in which each aggregator can reduce the merged traffic from multiple map tasks. A decomposition-based distributed algorithm is proposed to handle the large-scale optimization problem for big data applications, and an online algorithm is designed to adjust data partition and aggregation dynamically. Finally, extensive simulation results demonstrate that our proposals can significantly reduce network traffic cost in both offline and online cases.
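The contrast between hash partitioning and a traffic-aware partition can be illustrated with a small sketch. This is not the paper's decomposition-based or online algorithm; it is a simple greedy heuristic, and it assumes a hypothetical cost model in which shuffle cost is data size multiplied by the network distance between a mapper and the chosen reducer. All names (`hash_partition`, `traffic_cost`, `greedy_traffic_aware_partition`) and the sample data are invented for illustration. Aggregator placement and reducer load balancing are left out.

```python
from zlib import crc32


def hash_partition(key, num_reducers):
    """Topology-blind partitioning: reducer chosen by a deterministic hash of the key."""
    return crc32(key.encode("utf-8")) % num_reducers


def traffic_cost(assignment, volume, distance):
    """Assumed cost model: sum of (data size * mapper-to-reducer distance)."""
    return sum(
        size * distance[mapper][assignment[key]]
        for (mapper, key), size in volume.items()
    )


def greedy_traffic_aware_partition(volume, distance, num_reducers):
    """Send each key to the reducer that minimizes that key's own shuffle cost.

    Illustrative heuristic only: it ignores reducer load and aggregation.
    """
    assignment = {}
    for key_name in sorted({k for (_, k) in volume}):
        def cost(reducer):
            return sum(
                size * distance[m][reducer]
                for (m, k), size in volume.items()
                if k == key_name
            )
        assignment[key_name] = min(range(num_reducers), key=cost)
    return assignment


# Hypothetical example: 2 mappers, 2 reducers.
# volume[(mapper, key)] = size of intermediate data for that key on that mapper.
volume = {(0, "a"): 10, (1, "a"): 1, (0, "b"): 1, (1, "b"): 10}
# distance[mapper][reducer] = assumed hop count between the two machines.
distance = [[1, 3], [3, 1]]

hash_assignment = {k: hash_partition(k, 2) for k in ("a", "b")}
aware_assignment = greedy_traffic_aware_partition(volume, distance, 2)

hash_cost = traffic_cost(hash_assignment, volume, distance)
aware_cost = traffic_cost(aware_assignment, volume, distance)
print("hash partition cost:", hash_cost)
print("traffic-aware cost:", aware_cost)
```

Under this cost model the total cost separates per key, so the greedy choice can never cost more than the hash assignment. A realistic scheme would also need to bound the load on each reducer, which is part of what makes the full problem a large-scale optimization.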
MapReduce, Hadoop, Bioinformatics, Cyber Security, Machine Learning, Big Data, Traffic Cost
Publication Details
  Published in : Volume 4 | Issue 2 | January-February 2018
  Date of Publication : 2018-02-28
License:  This work is licensed under a Creative Commons Attribution 4.0 International License.
Page(s) : 134-138
Manuscript Number : IJSRST184148
Publisher : Technoscience Academy
PRINT ISSN : 2395-6011
ONLINE ISSN : 2395-602X
Cite This Article :
Shaik Inthiyaz, S. G. Nawaz, Dr. R. Ramachandra, "On Traffic-Aware Partition and Aggregation in Mapreduce for Big Data Applications", International Journal of Scientific Research in Science and Technology (IJSRST), Print ISSN : 2395-6011, Online ISSN : 2395-602X, Volume 4, Issue 2, pp. 134-138, January-February 2018
URL : http://ijsrst.com/IJSRST184148