Goals
Allow users to build pipelines that run on Spark Streaming with the Hydrator drag-and-drop UI, leveraging built-in capabilities such as windowing and machine learning.
Checklist
- User stories documented (Albert)
- User stories reviewed (Nitin)
- Design documented (Albert)
- Design reviewed (Terence/Andreas)
- Feature merged ()
- Examples and guides ()
- Integration tests ()
- Documentation for feature ()
- Blog post
Use Cases
User Stories
- As a pipeline developer, I want to be able to join (inner, left outer, right outer, full outer) two or more stage outputs on some common fields, or do a cross join.
- As a pipeline developer, I want to be able to get metrics on the number of records going into and coming out of the join.
- [UI] As a pipeline developer, I want to be able to see the schema of every input to the join, and the schema output by the join.
- As a pipeline developer, I want to be able to choose whether the pipeline with the join runs with MapReduce or Spark.
- As a plugin developer, I want to be able to write a plugin that gets data from multiple stages and joins them.
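To make the join types named in the first user story concrete, here is a minimal sketch in plain Python. It is not the Hydrator plugin API: `join_records`, `cross_join`, and the sample records are hypothetical names used only for illustration. It also assumes a single join field and only two inputs, whereas the stories call for joins on one or more common fields across two or more stage outputs.

```python
def join_records(left, right, key, how="inner"):
    """Join two lists of dict records on one shared field (illustrative only).

    how: "inner", "left_outer", "right_outer", or "full_outer".
    A record with no match on the other side keeps only its own fields.
    """
    if how not in ("inner", "left_outer", "right_outer", "full_outer"):
        raise ValueError(f"unsupported join type: {how}")

    # Index the right input by key so each left record can look up its matches.
    right_by_key = {}
    for rec in right:
        right_by_key.setdefault(rec[key], []).append(rec)

    out = []
    matched_keys = set()
    for l_rec in left:
        matches = right_by_key.get(l_rec[key], [])
        if matches:
            matched_keys.add(l_rec[key])
            out.extend({**l_rec, **r_rec} for r_rec in matches)
        elif how in ("left_outer", "full_outer"):
            # Unmatched left records survive only in left and full outer joins.
            out.append(dict(l_rec))

    if how in ("right_outer", "full_outer"):
        # Unmatched right records survive only in right and full outer joins.
        for k, recs in right_by_key.items():
            if k not in matched_keys:
                out.extend(dict(r_rec) for r_rec in recs)
    return out


def cross_join(left, right):
    """Pair every left record with every right record; no join field is used."""
    return [{"left": l_rec, "right": r_rec} for l_rec in left for r_rec in right]


# Hypothetical outputs of two upstream stages.
users = [{"id": 1, "name": "ann"}, {"id": 2, "name": "bob"}]
purchases = [{"id": 2, "item": "pen"}, {"id": 3, "item": "ink"}]

inner = join_records(users, purchases, "id", "inner")
left_outer = join_records(users, purchases, "id", "left_outer")
right_outer = join_records(users, purchases, "id", "right_outer")
full_outer = join_records(users, purchases, "id", "full_outer")
crossed = cross_join(users, purchases)
```

With these inputs, the inner join emits one record, the left and right outer joins two each, the full outer join three, and the cross join four. Record-in and record-out counts like these are what the metrics story asks the join to report.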
Design