Definition of the Week: MapReduce

Redpoint Global | January 19, 2016

MapReduce programs are designed to compute large volumes of data in a parallel fashion. They can divide the workload across hundreds or thousands of servers in a Hadoop cluster to allow massive scalability.

The term MapReduce is a combination of words to describe two separate and distinct tasks: The first is Map, which filters and sorts data into key/value pairs. Reduce performs a summary operation on this output. For example, Map sorts a list of names into queues and Reduce provides total counts.

MapReduce has been eclipsed by more complete programming models. With Hadoop 2.0, its functions have been taken over by YARN (Yet Another Resource Negotiator).

Redpoint Data Management for Hadoop was one of the first applications certified by Hortonworks as being able to process Big Data natively within a Hadoop 2.0 environment using the YARN architecture, making MapReduce unnecessary.

Share This
Redpoint Global
Redpoint Global