...
Use Cases:
- Validator Filter: All records of a transform that are invalid go into one dataset; the remainder go into another.
- Writing the same output data to two separate outputs, with different formats.
...
mos.write(String datasetName, KEY key, VALUE value);
Example Usage:
Code Block | ||
---|---|---|
| ||
public void beforeSubmit(MapReduceContext context) throws Exception { |
...
context.addOutput("cleanCounts"); |
...
context.addOutput("invalidCounts"); |
...
// ... |
...
} |
...
public static class Counter extends Reducer<Text, IntWritable, byte[], Long> { |
...
private |
...
MultipleOutputs<byte[], Long> mos; |
...
...
@Override |
...
...
protected void setup(Context context) { |
...
mos = new MultipleOutputs<>(context); |
...
...
} |
...
@Override |
...
...
public void reduce(Text key, Iterable<IntWritable> values, Context context) { |
...
// do computation and output to the desired dataset |
...
...
if ( ... ) { |
...
...
mos.write("cleanCounts", key.getBytes(), val); |
...
} else { |
...
...
mos.write("invalidCounts", key.getBytes(), val); |
...
...
} |
...
} |
...
@Override |
...
...
protected void cleanup(Context context) { |
...
mos.close(); |
...
...
} |
...
} |
Approach:
Take an approach similar to org.apache.hadoop.mapreduce.lib.output.MultipleOutputs.
The Datasets to be written to must be defined in advance, in the beforeSubmit of the MapReduce job.
In the mapper/reducer, the user specifies the name of the output Dataset, and our helper class (MultipleOutputs) determines the appropriate OutputFormat and configuration for writing.
...