Requirements
- CDAP exposes the API for developers to build their own plugin for parsing data in a Stream.
- Developer should have the ability to build his own parser using the CDAP provided API for parsing events in the stream.
- Developer/Operations should then have the ability to deploy the parser implemented into a directory with a configuration
- User should specify at minimum a name and description for the plugin in a configuration
- User should have the ability to list the available plugins using REST API / CLI
- User should have the ability to view using REST API / CLI the pre-defined schema of the plugin in case the plugin defines one.
- User should have the ability to list the views associated with a Stream using REST API / CLI / UI
- User should have the ability to apply the plugin to a Stream and create a view
- User specified view name should be registered in a catalog allowing one to query (SQL) using the view name.
- User should have the ability to apply different plugins on the same Stream creating different view
- User should have the ability to change the plugin associated with a view
- CDAP should provide a text wrangler plugin that allows one to create rules for parsing mostly text files.
Overview
- Pluggable stream record formats (the format in which data is read from a stream, which is different from the format in which files are written to a stream)
- Expose cdap-spi module that contains StreamEventRecordFormat abstract class
- Each StreamEventRecordFormat will be associated with a simple name (e.g. grok, clf, avro)
- "system" record formats will come from within the CDAP codebase (grok, clf, avro)
- "user" record formats will be loaded from jars in a certain directory containing SPI jars
- In a later revision, this will may be namespaced and/or managed via an HTTP API
- Stream views
- A stream view is an explorable view (Hive table) of a stream, with a particular record format
- A stream may have multiple views
- Upon creating a stream, the stream will have a default view
Stream View HTTP API
Changes to existing APIs
Path | Request | Response | Notes |
---|---|---|---|
PUT /v3/namespaces/<namespace>/streams/<stream> | Instead of creating a Hive table with a default record format, this will create a "default" view with a default record format. | ||
DELETE /v3/namespaces/<namespace>/streams/<stream> | This will delete all associated views for the stream. | ||
POST /v3/namespaces/<namespace>/streams/properties | "format" field will be considered "deprecated" -> if format is given, this modifies the default view for backwards compat | Notify user that "format" field is deprecated somehow? |
New APIs
Path | Request | Response | Notes |
---|---|---|---|
PUT /v3/namespaces/<namespace>/views/stream/<view> | { "stream": "stream1", "format": <same as before> } | Creates or modifies a view. | |
GET /v3/namespaces/<namespace>/views/stream/<view> | {"id":"someView", "stream": "stream1", "format": ..} | Get details of an individual view. | |
GET /v3/namespaces/<namespace>/views/stream | Lists all views. | ||
DELETE /v3/namespace/<namespace>/views/stream/<view> | Deletes a view. | ||
GET /v3/namespaces/<namespace>/stream/<stream>/views | [ {"id":"someView", "stream": "stream1", "format": ..}, {"id":"otherView", "stream": "stream2", "format": ..} ] | Lists all views for a stream. |
Notes
- If Explore is disabled, then all stream view APIs will be disabled
- Existing Hive queries must not be affected by the deletion or modification of any stream views it may be using
Sample CLI Flow
- User wants to create a stream "stream1" that contains CSV data and read via Explore through two views "view1" and "view2".
- create stream stream1
- send stream stream1 "a,b,c"
send stream stream1 "d,e,f" execute "select * from stream_stream1"
body a,b,c d,e,f - create stream-view stream1 view1 format csv "ticker string, num_traded int, price double"
execute "select * from stream_stream1_view_view1"
ticker num_traded price a b c d e f create stream-view stream1 view2 format csv "drop=num_traded"
execute "select * from stream_stream1_view_view2"
ticker price a c d f