Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

 

Use Cases:

  • Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
  • Writing the same output data to two separate outputs, with different formats.

...

mos.write(String datasetName, KEY key, VALUE value);

 

Example Usage:

Code Block
languagejava
public void beforeSubmit(MapReduceContext context) throws Exception {

...


  context.addOutput("cleanCounts");

...


  context.addOutput("invalidCounts");

...


  // ...

...


}

...


public static class Counter extends Reducer<Text, IntWritable, byte[], Long> {

...


  private 

...

MultipleOutputs<byte[], Long> mos;

...



  

...

@Override

...


  

...

protected void setup(Context context) {

...


    mos = new MultipleOutputs<>(context);

...


  

...

}

...



  @Override

...


  

...

public void reduce(Text key, Iterable<IntWritable> values, Context context) {

...


    // do computation and output to the desired dataset

...


    

...

if ( ... ) {

...


     

...

 mos.write("cleanCounts", key.getBytes(), val);

...


    } else {

...


     

...

 mos.write("invalidCounts", key.getBytes(), val);

...


   

...

 }

...


  }

...



  @Override

...


  

...

protected void cleanup(Context context) {

...


    mos.close();

...


  

...

}

...


}


Approach:

Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.

...