Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Use Cases:

  • Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
  • Writing the same output data to two separate outputs, with different formats.

...

mos.write(String datasetName, KEY key, VALUE value);

 

Example Usage:

public void beforeSubmit(MapReduceContext context) throws Exception {
  context.addOutput("cleanCounts");
  context.addOutput("invalidCounts");
  // ...

}

public static class Counter extends Reducer<Text, IntWritable, byte[], Long> {
private MultipleOutputs<byte[], Long> mos;

 @Override
 protected void setup(Context context) {
mos = new MultipleOutputs<>(context);
 }

@Override
 public void reduce(Text key, Iterable<IntWritable> values, Context context) {
 
    // do computation and output to the desired dataset
  if ( ... ) { 
   mos.write("cleanCounts", key.getBytes(), val);
     } else { 
   mos.write("invalidCounts", key.getBytes(), val);
   }
  }

@Override
 protected void cleanup(Context context) {
mos.close();
 }
}


Approach:

Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.

...