Page Comparison

Use Cases:

Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
Writing the same output data to two separate outputs, with different formats.

...

mos.write(String datasetName, KEY key, VALUE value);

Example Usage:

public void beforeSubmit(MapReduceContext context) throws Exception {

  context.addOutput("cleanCounts");

  context.addOutput("invalidCounts");

  // ...

}

public static class Counter extends Reducer<Text, IntWritable, byte[], Long> {
  private MultipleOutputs<byte[], Long> mos;

  @Override
  protected void setup(Context context) {
    mos = new MultipleOutputs<>(context);
  }

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context) {

    // do computation and output to the desired dataset

  if ( ... ) {

   mos.write("cleanCounts", key.getBytes(), val);

     } else {

   mos.write("invalidCounts", key.getBytes(), val);

  }

  @Override
  protected void cleanup(Context context) {
    mos.close();
  }
}

Approach:

Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.

...

Versions Compared

Old Version 3

New Version 4

Key

Use Cases:

Example Usage:

Approach: