MapReduce and Apache Tez Data Compression Techniques

Authors

  • Chandrabhan S. Jadhao, Department of Computer Engineering, SES’s Yadavrao Tasgaonkar Institute Of Engineering And Technology, Karjat, Maharashtra, India
  • Prof. Harish K. Barapatre, Department of Computer Engineering, SES’s Yadavrao Tasgaonkar Institute Of Engineering And Technology, Karjat, Maharashtra, India

Keywords:

Compression, Hadoop, HDFS, MapReduce, Tez.

Abstract

In a Hadoop system, dealing with enormous data sets poses many challenges. Regardless of where you store your data, the fundamental challenge is that large volumes can cause network and input/output (I/O) bottlenecks. Data compression is one of the most effective ways to relieve these bottlenecks, and it can be applied within MapReduce, where many instances of “map” steps process individual blocks of input to produce one or more outputs; these outputs are then passed to “reduce” steps, which combine them to produce a single result. The MapReduce framework is the main, widely used processing engine of a Hadoop cluster, and it relies on batch-oriented processing. Apache also developed an alternative engine called Tez, which supports interactive queries and does not write temporary (intermediate) data to HDFS. This paper examines data compression techniques for MapReduce and Apache Tez that efficiently compress and decompress large amounts of data.
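To make the map/reduce flow and the role of compression concrete, here is a minimal single-process sketch in Python. It is not Hadoop code: the function names `map_task`, `reduce_task`, and `run_job` are hypothetical, JSON stands in for Hadoop's own serialization, and Python's `zlib` (DEFLATE) stands in for whichever codec a cluster is configured to use. Each map step emits a `(word, 1)` pair per word in its block and compresses that intermediate output; the reduce step decompresses every map output and combines the counts into a single result.

```python
from __future__ import annotations

import json
import zlib
from collections import defaultdict


def map_task(block: str) -> tuple[bytes, int]:
    """Map step (hypothetical): emit a (word, 1) pair for every word in one
    input block, serialize the pairs, and compress them before the shuffle.
    Returns (compressed_bytes, uncompressed_size)."""
    pairs = [[word, 1] for word in block.split()]
    raw = json.dumps(pairs).encode("utf-8")
    # zlib/DEFLATE is an assumed stand-in for a Hadoop compression codec.
    return zlib.compress(raw, 6), len(raw)


def reduce_task(compressed_outputs: list[bytes]) -> dict[str, int]:
    """Reduce step (hypothetical): decompress every map output and sum the
    counts per word into one combined result."""
    totals: dict[str, int] = defaultdict(int)
    for blob in compressed_outputs:
        for word, count in json.loads(zlib.decompress(blob)):
            totals[word] += count
    return dict(totals)


def run_job(blocks: list[str]) -> tuple[dict[str, int], int, int]:
    """Run one map task per block, then a single reduce task, and report the
    intermediate data size with and without compression."""
    outputs = [map_task(block) for block in blocks]
    compressed = [blob for blob, _ in outputs]
    raw_bytes = sum(size for _, size in outputs)
    compressed_bytes = sum(len(blob) for blob in compressed)
    return reduce_task(compressed), raw_bytes, compressed_bytes


if __name__ == "__main__":
    # Hypothetical input: each string plays the role of one HDFS block.
    blocks = ["hadoop hdfs mapreduce tez " * 200, "tez hadoop " * 300]
    result, raw_bytes, compressed_bytes = run_job(blocks)
    print(result)
    print(f"intermediate data: {raw_bytes} bytes raw, {compressed_bytes} bytes compressed")
```

Because map outputs like these are highly repetitive, compressing them before they cross the network typically shrinks the shuffled bytes considerably, at the cost of CPU time spent compressing and decompressing; that trade-off is the one a real cluster tunes when choosing a codec.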

Published

2018-04-30

Section

Research Articles

How to Cite

[1]
Chandrabhan S. Jadhao, Prof. Harish K. Barapatre, "MapReduce and Apache Tez Data Compression Techniques," International Journal of Scientific Research in Science and Technology (IJSRST), Online ISSN: 2395-602X, Print ISSN: 2395-6011, Volume 4, Issue 5, pp., March-April 2018.