MapReduce and Apache Tez Data Compression Techniques
Keywords:
Compression, Hadoop, HDFS, MapReduce, Tez.
Abstract
Hadoop systems face many challenges when dealing with enormous data sets. Regardless of how the data is stored, the fundamental challenge is that large volumes can cause network and input/output (I/O) bottlenecks. Compression is one way to reduce these bottlenecks, and it can be applied within MapReduce, where many instances of “map” steps process individual blocks of an input to produce one or more outputs; these outputs are passed to “reduce” steps, where they are combined to produce a single result. The MapReduce framework is the main, widely used engine of a Hadoop cluster and relies on batch-oriented processing. Apache also developed an alternative engine called “Tez”, which supports interactive queries and does not write temporary data to HDFS. This paper examines MapReduce and Apache Tez data compression techniques that efficiently compress and decompress large amounts of data.
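The map and reduce flow described in the abstract can be illustrated with a minimal single-process sketch. It is not Hadoop or Tez code: the function names (`split_into_blocks`, `map_block`, `reduce_outputs`, `word_count`) are hypothetical, word counting is an assumed example task, and Python's standard `zlib` module stands in for the compression codecs a real cluster would use.

```python
import json
import zlib
from collections import Counter


def split_into_blocks(text, lines_per_block):
    """Split the input into fixed-size blocks of lines (stand-in for HDFS blocks)."""
    lines = text.splitlines()
    return [lines[i:i + lines_per_block]
            for i in range(0, len(lines), lines_per_block)]


def map_block(block):
    """Map step: count words in one block and return a compressed intermediate output."""
    counts = Counter()
    for line in block:
        counts.update(line.lower().split())
    # Compress the intermediate result before it is handed to the reduce step.
    return zlib.compress(json.dumps(counts).encode("utf-8"))


def reduce_outputs(compressed_outputs):
    """Reduce step: decompress every map output and combine them into one result."""
    total = Counter()
    for blob in compressed_outputs:
        total.update(json.loads(zlib.decompress(blob).decode("utf-8")))
    return dict(total)


def word_count(text, lines_per_block=2):
    """Run the map steps over each block, then a single reduce step."""
    intermediate = [map_block(b) for b in split_into_blocks(text, lines_per_block)]
    return reduce_outputs(intermediate)
```

Calling `word_count` on a short multi-line string runs one map step per block and a single reduce step. In this sketch only the data passed between the two steps is compressed, which is the stage where a cluster would otherwise move the intermediate data over the network or write it to disk.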
License
Copyright (c) IJSRST

This work is licensed under a Creative Commons Attribution 4.0 International License.