  1. As a pipeline developer, I want to create realtime ETL pipelines that run using Spark Streaming.

  2. As a pipeline developer, I want to enrich streaming events by joining to other datasets.

  3. As a pipeline developer, I want to be able to group events into time windows in my streaming pipeline.

  4. As a pipeline developer, I want to train machine learning models in my streaming pipeline.

  5. As a plugin developer, I want my transform, aggregate, and join plugins to work in both Spark Streaming and Data Pipelines.

  6. As a plugin developer, I want to be able to use features available in Spark Streaming, such as MLlib, to write plugins.

Design

We will introduce a new artifact, called the DataStreaming artifact, that is similar to the DataPipeline artifact. This artifact will support the realtimesource, realtimesink, transform, sparkcompute, and joiner plugin types. In addition, we will add streamingsource and streamingsink plugin types.
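
As a rough illustration of the plugin-type list above, the sketch below checks whether the stages of a pipeline use only plugin types the DataStreaming artifact is meant to support. The class name `DataStreamingPluginTypes`, its methods, and the `batchsink` stage in the example are hypothetical and are not part of any existing API. Only the plugin type names come from this design.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, for illustration only; not an existing class.
class DataStreamingPluginTypes {
  // Plugin types the design says the DataStreaming artifact will support.
  static final Set<String> EXISTING_TYPES = new LinkedHashSet<>(Arrays.asList(
      "realtimesource", "realtimesink", "transform", "sparkcompute", "joiner"));

  // Plugin types the design says will be newly added.
  static final Set<String> NEW_TYPES = new LinkedHashSet<>(Arrays.asList(
      "streamingsource", "streamingsink"));

  // Returns true if the given plugin type is in either list.
  static boolean isSupported(String pluginType) {
    return EXISTING_TYPES.contains(pluginType) || NEW_TYPES.contains(pluginType);
  }

  // Returns the plugin types from the input that are not supported, in input order.
  static List<String> unsupported(List<String> stageTypes) {
    List<String> rejected = new ArrayList<>();
    for (String type : stageTypes) {
      if (!isSupported(type)) {
        rejected.add(type);
      }
    }
    return rejected;
  }

  public static void main(String[] args) {
    // "batchsink" is an illustrative type that is not in the lists above.
    List<String> stages = Arrays.asList("streamingsource", "transform", "batchsink");
    System.out.println(unsupported(stages)); // prints [batchsink]
  }
}
```

A check like this would presumably run when a pipeline configuration is validated at deployment time, though the design does not specify where validation happens.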