Use Cases:
- Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
- Writing the same output data to two separate outputs, with different formats.
...
mos.write(String datasetName, KEY key, VALUE value);
Example Usage:
public void beforeSubmit(MapReduceContext context) throws Exception {
context.addOutput("cleanCounts");
context.addOutput("invalidCounts");
// ...
}
public static class Counter extends Reducer<Text, IntWritable, byte[], Long> {
private MultipleOutputs<byte[], Long> mos;
@Override
protected void setup(Context context) {
mos = new MultipleOutputs<>(context);
}
@Override
public void reduce(Text key, Iterable<IntWritable> values, Context context) {
// do computation and output to the desired dataset
if ( ... ) {
mos.write("cleanCounts", key.getBytes(), val);
} else {
mos.write("invalidCounts", key.getBytes(), val);
}
}
@Override
protected void cleanup(Context context) {
mos.close();
}
}
Approach:
Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.
...