...
As a pipeline developer, I want to create realtime ETL pipelines that run using Spark Streaming.
As a pipeline developer, I want to be able to group events into time windows in my streaming pipeline.
As a pipeline developer, I want to perform aggregations over windows of events in my pipeline.
As a pipeline developer, I want to enrich streaming events by joining to other datasets.
As a pipeline developer, I want to be able to group events into time windows join data streams in my streaming pipeline.
- As a pipeline developer, I want to train machine learning models in my streaming pipeline.
As a plugin developer, I want my transform, aggregate, and join plugins to work in both Spark Streaming and Data Pipelines.
As a plugin developer, I want to be able to use features available in Spark Streaming like MLLib to write plugins.
...
A Join stage with n inputs will implemented as n mapToPair() calls on all inputs into the stage, then x - 1 joins (assuming x inputs).
An aggregator plugin will operate over windows in the stream, and can be implemented by a call to flatMapToPair(), then reduceByKey().
Windowing and Reduce can be implemented through a general StreamingTransform API:
...