Page Comparison

...

Use Cases:

Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
Writing the same output data to two separate outputs, with different formats.

...

mos.write(String datasetName, KEY key, VALUE value);

Example Usage:

Code Block

language	java

public void beforeSubmit(MapReduceContext context) throws Exception {

...


  context.addOutput("cleanCounts");

...


  context.addOutput("invalidCounts");

...


  // ...

...


public static class Counter extends Reducer<Text, IntWritable, byte[], Long> {

...


  private

...

MultipleOutputs<byte[], Long> mos;

...

@Override

...

protected void setup(Context context) {

...


    mos = new MultipleOutputs<>(context);

...



  @Override

...

public void reduce(Text key, Iterable<IntWritable> values, Context context) {

...


    // do computation and output to the desired dataset

...

if ( ... ) {

...

 mos.write("cleanCounts", key.getBytes(), val);

...


    } else {

...

 mos.write("invalidCounts", key.getBytes(), val);

...



  @Override

...

protected void cleanup(Context context) {

...


    mos.close();

...

Approach:

Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.

...

Versions Compared

Old Version 4

New Version 5

Key

Use Cases:

Example Usage:

Approach: