MapReduce and Apache Tez Data Compression Techniques

Chandrabhan S. Jadhao; Prof. Harish K. Barapatre

doi:10.32628/IJSRST1845224

Authors

Chandrabhan S. Jadhao, Department of Computer Engineering, SES's Yadavrao Tasgaonkar Institute of Engineering and Technology, Karjat, Maharashtra, India
Prof. Harish K. Barapatre, Department of Computer Engineering, SES's Yadavrao Tasgaonkar Institute of Engineering and Technology, Karjat, Maharashtra, India

Keywords:

Compression, Hadoop, HDFS, MapReduce, Tez.

Abstract

In a Hadoop system, dealing with enormous data sets poses many challenges. Regardless of how the data is stored, the fundamental challenge is that large volumes can cause network and input/output (I/O) bottlenecks. Data compression is one of the most effective ways to reduce these bottlenecks, and it can be applied within MapReduce, where many instances of “map” steps process individual blocks of an input to produce one or more outputs; these outputs are passed to “reduce” steps, where they are combined to produce a single result. The MapReduce framework is the main, widely used engine of a Hadoop cluster and relies on batch-oriented processing. Apache also developed an alternative engine called “Tez”, which supports interactive queries and does not write temporary data to HDFS. This paper examines MapReduce and Apache Tez data compression techniques that efficiently compress and decompress large amounts of data.
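
The map and reduce flow described in the abstract, with compression applied to the intermediate map outputs, can be illustrated with a toy sketch. This is not Hadoop's or Tez's API and is not taken from the paper: the function names `map_block` and `reduce_outputs` are hypothetical, the word-count task is chosen only for illustration, and Python's `zlib` stands in for whatever codec a real cluster would use.

```python
import json
import zlib
from collections import Counter


def map_block(block: str) -> bytes:
    """Map step (hypothetical helper): count words in one input block."""
    counts = Counter(block.split())
    # Compress the intermediate output before it is handed to the reducer,
    # which is where the network and I/O savings would come from.
    return zlib.compress(json.dumps(counts).encode("utf-8"))


def reduce_outputs(compressed_outputs) -> dict:
    """Reduce step (hypothetical helper): merge all map outputs into one result."""
    total = Counter()
    for payload in compressed_outputs:
        # Decompress each map output, then add its counts to the running total.
        total.update(json.loads(zlib.decompress(payload).decode("utf-8")))
    return dict(total)


# Two input "blocks", each processed by its own map step.
blocks = ["hadoop tez hadoop", "tez mapreduce hadoop"]
result = reduce_outputs(map_block(b) for b in blocks)
print(result)
```

On inputs this small the compressed payload can be larger than the raw data; compression only pays off at the block sizes typical of a real cluster.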
