Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. As a pipeline developer, I want to create realtime ETL pipelines that run using Spark Streaming.

  2. As a pipeline developer, I want to be able to group events into time windows in my streaming pipeline.

  3. As a pipeline developer, I want to perform aggregations over windows of events in my pipeline.

  4. As a pipeline developer, I want to enrich streaming events by joining to other datasets.

  5. As a pipeline developer, I want to be able to group events into time windows join data streams in my streaming pipeline.

  6. As a pipeline developer, I want to train machine learning models in my streaming pipeline.
  7. As a plugin developer, I want my transform, aggregate, and join plugins to work in both Spark Streaming and Data Pipelines.

  8. As a plugin developer, I want to be able to use features available in Spark Streaming like MLLib to write plugins.

...

A Join stage with n inputs will implemented as n mapToPair() calls on all inputs into the stage, then x - 1 joins (assuming x inputs).

 

An aggregator plugin will operate over windows in the stream, and can be implemented by a call to flatMapToPair(), then reduceByKey().

 

Windowing and Reduce can be implemented through a general StreamingTransform API:

...