Note: Make this page's parent CDAP once facets work for both streams and datasets.
Requirements
CDAP exposes the API for developers to build their own plugin for parsing data in a Stream.
- Developer should have the ability to build their own parser using the CDAP-provided API for parsing events in the stream.
- Developer/Operations should then have the ability to deploy the implemented parser into a directory, together with a configuration
- User should specify at minimum a name and description for the plugin in the configuration
- User should have the ability to list the available plugins using REST API / CLI
- User should have the ability to view the pre-defined schema of the plugin, if the plugin defines one, using REST API / CLI
- User should have the ability to list the views associated with a Stream using REST API / CLI / UI
- User should have the ability to apply the plugin to a Stream and create a view
- User specified view name should be registered in a catalog allowing one to query (SQL) using the view name.
- User should have the ability to apply different plugins on the same Stream, creating different views
- User should have the ability to change the plugin associated with a view
- CDAP should provide a text wrangler plugin that allows one to create rules for parsing mostly text files.
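The requirement that a plugin configuration carries at least a name and a description can be sketched as a small validation step. The JSON layout, the field names `name` and `description` as JSON keys, and the function `load_plugin_config` are assumptions made for illustration; this page does not define the configuration format.

```python
import json

# Minimum fields named in the requirements; the JSON key names are assumed.
REQUIRED_FIELDS = ("name", "description")


def load_plugin_config(text):
    """Parse a plugin configuration and check that the required fields are set."""
    config = json.loads(text)
    missing = []
    for field in REQUIRED_FIELDS:
        value = config.get(field)
        # A field counts as missing if it is absent, not a string, or blank.
        if not isinstance(value, str) or not value.strip():
            missing.append(field)
    if missing:
        raise ValueError("plugin configuration is missing: " + ", ".join(missing))
    return config


# Hypothetical configuration for a CSV parser plugin.
example = '{"name": "csv-parser", "description": "Parses comma-separated event bodies"}'
print(load_plugin_config(example)["name"])
```

A configuration without a description, such as `{"name": "x"}`, would be rejected with a `ValueError` under this sketch.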
Overview
- A facet view is another place where data can be read, like streams and datasets.
- Therefore, facet views are readable anywhere a stream or dataset is readable (MapReduce/Spark programs, flows, ETL)
- A facet view is a read-only view of a stream or dataset, with a specific read format (a schema plus a format such as csv or avro)
- If Explore is enabled, then a Hive table will be created for each facet view
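To make the idea of a read format concrete, the following sketch applies a CSV format and a schema to raw event bodies at read time, leaving the underlying events untouched. It is only an illustration, not CDAP code: the function names, the type mapping in `CONVERTERS`, and the sample events are all assumptions.

```python
import csv
import io

# Hypothetical mapping from schema type names to Python converters.
CONVERTERS = {"string": str, "int": int, "double": float}


def parse_schema(schema):
    """Turn "ticker string, price double" into [("ticker", str), ("price", float)]."""
    fields = []
    for part in schema.split(","):
        name, type_name = part.split()
        fields.append((name, CONVERTERS[type_name]))
    return fields


def read_view(event_bodies, schema):
    """Yield one typed record per event body; the events themselves are not modified."""
    fields = parse_schema(schema)
    for body in event_bodies:
        values = next(csv.reader(io.StringIO(body)))
        if len(values) != len(fields):
            raise ValueError("expected %d fields, got %d" % (len(fields), len(values)))
        yield {name: convert(value) for (name, convert), value in zip(fields, values)}


events = ["AAPL,100,12.5", "GOOG,7,530.0"]  # made-up sample data
records = list(read_view(events, "ticker string, num_traded int, price double"))
print(records[0])
```

Because the schema is applied only when reading, several views with different schemas can be defined over the same events without copying or rewriting them.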
3.2 Plan
- Facet view HTTP API, client, CLI
- Facet views can be views of a stream (not of a dataset yet)
- Hive tables will be created for facet views when Explore is enabled
...
Facet view HTTP API
Path | Request | Response | Notes
---|---|---|---
PUT /v3/namespaces/<namespace>/facetsviews/<facet><view> | | | Creates or modifies a facet view.
GET /v3/namespaces/<namespace>/facetsviews/<facet><view> | | | Gets details of an individual facet view.
GET /v3/namespaces/<namespace>/facetsviews | | | Lists all facet views.
DELETE /v3/namespaces/<namespace>/facetsviews/<facet><view> | | | Deletes a facet view.
GET /v3/namespaces/<namespace>/stream/<stream>/facetsviews | | | Lists all facet views associated with a stream.
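As a rough sketch of how a client might assemble the request paths in this table, the helpers below build each path from its parts. They assume that `<facet><view>` stands for a single path segment holding the view name, and that DELETE addresses the same path as PUT and GET; the function names are hypothetical and no request is actually sent.

```python
from urllib.parse import quote

BASE = "/v3/namespaces"


def views_path(namespace):
    """Path that lists all facet views in a namespace."""
    return "%s/%s/facetsviews" % (BASE, quote(namespace, safe=""))


def view_path(namespace, view):
    """Path of one facet view (assumed to be shared by PUT, GET and DELETE)."""
    return "%s/%s" % (views_path(namespace), quote(view, safe=""))


def stream_views_path(namespace, stream):
    """Path that lists the facet views associated with one stream."""
    return "%s/%s/stream/%s/facetsviews" % (
        BASE, quote(namespace, safe=""), quote(stream, safe=""))


# Hypothetical namespace, view and stream names.
for method, path in [
    ("PUT", view_path("default", "facet1view1")),
    ("GET", view_path("default", "facet1view1")),
    ("GET", views_path("default")),
    ("DELETE", view_path("default", "facet1view1")),
    ("GET", stream_views_path("default", "stream1")),
]:
    print(method, path)
```

Percent-encoding each segment with `quote(..., safe="")` keeps a name containing `/` or spaces from changing the shape of the path.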
...
- If Explore is disabled, then Hive tables will not be created for facet views
Sample CLI Flow
- User wants to create a stream "stream1" that contains CSV data and read it using two facet views, "facet1view1" and "facet2view2".
- create stream stream1
- send stream stream1 "a,b,c"
- send stream stream1 "d,e,f"
- execute "select * from stream_stream1" // may be removed later, as facet views already cover this

body |
---|
a,b,c |
d,e,f |

- create facet facet1view view1 stream1 format csv "ticker string, num_traded int, price double"
- execute "select * from facetview_facet1view1"

ticker | num_traded | price
---|---|---
a | b | c
d | e | f

- create facet facet2 view view2 stream1 format csv "ticker string, price double" "drop=$2" // "drop=$2" means "drop the 2nd field"
- execute "select * from facetview_facet2view2"

ticker | price
---|---
a | c
d | f
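The effect of "drop=$2" in the flow above can be read as a projection applied to each event body before the column names are attached. The sketch below reproduces the two query results under that reading. The parsing of the drop argument and the function names are assumptions, and values are kept as strings rather than converted to the declared types.

```python
def apply_drop(values, drop_spec):
    """Apply a spec such as "drop=$2" by removing the 2nd (1-based) field."""
    key, _, field = drop_spec.partition("=")
    if key != "drop" or not field.startswith("$"):
        raise ValueError("unsupported spec: " + drop_spec)
    index = int(field[1:]) - 1
    if not 0 <= index < len(values):
        raise ValueError("no field %s in %r" % (field, values))
    return values[:index] + values[index + 1:]


def view_rows(bodies, column_names, drop_spec=None):
    """Split each CSV body, optionally drop a field, then name the columns."""
    rows = []
    for body in bodies:
        values = body.split(",")
        if drop_spec:
            values = apply_drop(values, drop_spec)
        rows.append(dict(zip(column_names, values)))
    return rows


bodies = ["a,b,c", "d,e,f"]  # the two events sent to stream1

# First view: all three fields.
print(view_rows(bodies, ["ticker", "num_traded", "price"]))
# Second view: the 2nd field is dropped, leaving ticker and price.
print(view_rows(bodies, ["ticker", "price"], "drop=$2"))
```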