Note: Moved to Views.

Requirements

      • CDAP exposes an API that lets developers build their own plugins for parsing data in a Stream.
      • A developer should be able to build a custom parser, using the CDAP-provided API, for parsing events in the Stream.
      • Developers/operations should then be able to deploy the implemented parser into a directory, together with a configuration.
      • The user should specify, at minimum, a name and description for the plugin in the configuration.
      • The user should be able to list the available plugins using the REST API / CLI.
      • The user should be able to view, using the REST API / CLI, the pre-defined schema of a plugin if the plugin defines one.
      • The user should be able to list the views associated with a Stream using the REST API / CLI / UI.
      • The user should be able to apply a plugin to a Stream and create a view.
      • The user-specified view name should be registered in a catalog, allowing one to query (SQL) using the view name.
      • The user should be able to apply different plugins to the same Stream, creating different views.
      • The user should be able to change the plugin associated with a view.
      • CDAP should provide a text wrangler plugin that allows one to create rules for parsing mostly text files.

...

      1. Pluggable stream record formats (the format in which data is read from a stream, which is different from the format in which files are written to a stream)
        1. Expose a cdap-spi module that contains the StreamEventRecordFormat abstract class
        2. Each StreamEventRecordFormat will be associated with a simple name (e.g. grok, clf, avro)
        3. "system" record formats will come from within the CDAP codebase (grok, clf, avro)
        4. "user" record formats will be loaded from jars in a certain directory containing SPI jars
          1. In a later revision, this may be namespaced and/or managed via an HTTP API
      2. Stream views
        1. A stream view is an explorable view (Hive table) of a stream, with a particular record format
        2. A stream may have multiple views
        3. Upon creating a stream, the stream will have a default view
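
The design above can be illustrated with a minimal Java sketch. Everything in it is an assumption made for illustration: the method names on StreamEventRecordFormat, the TextFormat and CsvFormat classes, the Stream class, and the choice of "text" as the default format are hypothetical and are not taken from the actual cdap-spi module. It shows only the intended relationships: each format has a simple name, a stream starts with a default view, and several views with different formats can read the same event.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: none of these types are the real cdap-spi classes.
class RecordFormatSketch {

  /** Stand-in for the StreamEventRecordFormat abstract class (assumed shape). */
  abstract static class StreamEventRecordFormat {
    /** Simple name the format is known by, e.g. "grok", "clf", "avro". */
    abstract String getName();

    /** Parses the raw body of one stream event into named fields. */
    abstract Map<String, String> read(byte[] eventBody);
  }

  /** Assumed default format: exposes the whole body as a single "body" field. */
  static class TextFormat extends StreamEventRecordFormat {
    @Override String getName() { return "text"; }

    @Override Map<String, String> read(byte[] eventBody) {
      Map<String, String> record = new LinkedHashMap<>();
      record.put("body", new String(eventBody, StandardCharsets.UTF_8));
      return record;
    }
  }

  /** Illustrative "user" format: splits a comma-separated body into columns. */
  static class CsvFormat extends StreamEventRecordFormat {
    private final String[] columns;

    CsvFormat(String... columns) { this.columns = columns; }

    @Override String getName() { return "csv"; }

    @Override Map<String, String> read(byte[] eventBody) {
      String[] values = new String(eventBody, StandardCharsets.UTF_8).split(",", -1);
      Map<String, String> record = new LinkedHashMap<>();
      for (int i = 0; i < columns.length && i < values.length; i++) {
        record.put(columns[i], values[i].trim());
      }
      return record;
    }
  }

  /** A stream with named views, each bound to exactly one record format. */
  static class Stream {
    static final String DEFAULT_VIEW = "default";
    private final Map<String, StreamEventRecordFormat> views = new LinkedHashMap<>();

    /** Creating a stream registers its default view. */
    Stream(StreamEventRecordFormat defaultFormat) {
      views.put(DEFAULT_VIEW, defaultFormat);
    }

    /** Creates a view, or changes the format of an existing one. */
    void putView(String viewName, StreamEventRecordFormat format) {
      views.put(viewName, format);
    }

    /** Reads one event through the format bound to the given view. */
    Map<String, String> read(String viewName, byte[] eventBody) {
      StreamEventRecordFormat format = views.get(viewName);
      if (format == null) {
        throw new IllegalArgumentException("Unknown view: " + viewName);
      }
      return format.read(eventBody);
    }
  }

  public static void main(String[] args) {
    Stream stream = new Stream(new TextFormat());
    stream.putView("people", new CsvFormat("name", "age"));
    byte[] event = "alice, 30".getBytes(StandardCharsets.UTF_8);
    System.out.println(stream.read("default", event)); // {body=alice, 30}
    System.out.println(stream.read("people", event));  // {name=alice, age=30}
  }
}
```

Keeping the format behind a view name means the same stored event can be interpreted in more than one way without rewriting the stream, and calling putView again with an existing name swaps the format for that view.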

Stream View HTTP API

...

Changes to existing APIs

Path | Request | Response | Notes
PUT /v3/namespaces/<namespace>/streams/<stream> | | | Instead of creating a Hive table with a default record format, this will create a "default" view with a default record format.
DELETE /v3/namespaces/<namespace>/streams/<stream> | | | This will delete all associated views for the stream.
POST /v3/namespaces/<namespace>/streams/properties | "format" field will be considered "deprecated" -> if format is given, this modifies the default view for backwards compat | | Notify user that "format" field is deprecated somehow?
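
The behavior changes listed above can be modeled with a small in-memory Java sketch. This is not CDAP's handler code: the StreamAdminSketch class, its method names, the warning text, and the use of "text" as the default record format are all assumptions made for illustration. Returning a warning list is just one possible answer to the open question of how to notify the user about the deprecated field.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of the API changes; not CDAP's actual implementation.
class StreamAdminSketch {
  static final String DEFAULT_VIEW = "default";
  static final String DEFAULT_FORMAT = "text"; // assumed default record format name

  // stream name -> (view name -> record format name)
  private final Map<String, Map<String, String>> viewsByStream = new HashMap<>();

  /** Models the stream PUT: a new stream gets a "default" view. */
  void createStream(String stream) {
    viewsByStream.computeIfAbsent(stream, name -> {
      Map<String, String> views = new LinkedHashMap<>();
      views.put(DEFAULT_VIEW, DEFAULT_FORMAT);
      return views;
    });
  }

  /** Models the stream DELETE: the stream and all of its views are removed. */
  void deleteStream(String stream) {
    viewsByStream.remove(stream);
  }

  /**
   * Models the properties update: a deprecated "format" field rewrites the
   * default view for backward compatibility and produces a warning.
   */
  List<String> setProperties(String stream, Map<String, String> properties) {
    Map<String, String> views = viewsByStream.get(stream);
    if (views == null) {
      throw new IllegalArgumentException("Unknown stream: " + stream);
    }
    List<String> warnings = new ArrayList<>();
    String format = properties.get("format");
    if (format != null) {
      views.put(DEFAULT_VIEW, format);
      warnings.add("\"format\" is deprecated; change the \"default\" view instead");
    }
    return warnings;
  }

  /** Read-only snapshot of a stream's views; empty if the stream does not exist. */
  Map<String, String> views(String stream) {
    Map<String, String> views = viewsByStream.get(stream);
    if (views == null) {
      return Collections.emptyMap();
    }
    return Collections.unmodifiableMap(new LinkedHashMap<>(views));
  }
}
```

In this model, a caller that still sends "format" keeps working, because the default view picks up the new format, while the returned warnings give the HTTP layer something to surface in a response header or body.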

...