...

  1. A developer should be able to set multiple datasets as input to one MapReduce job.
    1. The datasets have the same type.
    2. The datasets have different types (this will require different Mapper classes). Note that the restriction here is that all of the Mappers must have the same output key/value types, since a single Reducer class consumes their combined output.
  2. A developer should be able to read from different partitions of a PartitionedFileSet (for example, multiple time ranges of a TimePartitionedFileSet).
  3. A developer should be able to determine, within their Mapper/Reducer, which input the data currently being processed came from.
  4. A developer should be able to use Cask Hydrator to set up multiple sources in their pipeline.
  5. A developer should be able to use Cask Hydrator to perform a join across two source branches in their pipeline (See User Story #7 on Cask Hydrator++).
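As a sketch of what user stories 1 and 3 might look like, plain Hadoop MapReduce already supports multiple inputs with per-input Mapper classes via its `MultipleInputs` API; this is not the CDAP API, and the dataset paths and Mapper/Reducer names below are illustrative assumptions, but it shows the key restriction that all Mappers must share one map-output type:

```java
// Sketch only: standard Hadoop MultipleInputs, not the CDAP API.
// Two Mapper classes over two (hypothetical) input paths; both emit
// (Text, IntWritable) so one Reducer class can consume their output.
import java.io.IOException;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MultiInputDriver {

  // Illustrative Mapper for the first dataset, e.g. lines like "alice,3".
  public static class PurchasesMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      String[] parts = value.toString().split(",");
      ctx.write(new Text(parts[0]), new IntWritable(Integer.parseInt(parts[1])));
    }
  }

  // Illustrative Mapper for the second dataset; a different input format,
  // but it MUST emit the same (Text, IntWritable) map-output type.
  public static class CustomersMapper
      extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context ctx)
        throws IOException, InterruptedException {
      ctx.write(new Text(value.toString().trim()), new IntWritable(1));
    }
  }

  // Single Reducer class that sees values from both inputs.
  public static class SumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context ctx)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable v : values) {
        sum += v.get();
      }
      ctx.write(key, new IntWritable(sum));
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance();
    job.setJarByClass(MultiInputDriver.class);

    // Each input path is bound to its own Mapper class ...
    MultipleInputs.addInputPath(job, new Path(args[0]),
        TextInputFormat.class, PurchasesMapper.class);
    MultipleInputs.addInputPath(job, new Path(args[1]),
        TextInputFormat.class, CustomersMapper.class);

    // ... but the map output types are set once, job-wide.
    job.setMapOutputKeyClass(Text.class);
    job.setMapOutputValueClass(IntWritable.class);
    job.setReducerClass(SumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileOutputFormat.setOutputPath(job, new Path(args[2]));
    job.waitForCompletion(true);
  }
}
```

Note that in this Hadoop model, user story 3 (knowing which input is being processed) is implicit: each input is routed to its own Mapper class, so the Mapper itself identifies the source. A CDAP-level API would need an analogous mechanism for the single-Mapper case.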

...