Note: Make this page's parent CDAP once facets work for both streams and datasets.

Requirements

  • CDAP exposes an API that lets developers build their own plugins for parsing data in a Stream.

  • A developer should be able to build their own parser for events in a Stream using the CDAP-provided API.
  • A developer or operator should then be able to deploy the implemented parser into a directory, together with a configuration.
  • The user should specify, at minimum, a name and a description for the plugin in that configuration.
  • The user should be able to list the available plugins using the REST API / CLI.
  • The user should be able to view, using the REST API / CLI, the pre-defined schema of a plugin if the plugin defines one.
  • The user should be able to list the views associated with a Stream using the REST API / CLI / UI.
  • The user should be able to apply a plugin to a Stream and create a view.
  • The user-specified view name should be registered in a catalog so that the view can be queried (SQL) by name.
  • The user should be able to apply different plugins to the same Stream, creating different views.
  • The user should be able to change the plugin associated with a view.
  • CDAP should provide a text wrangler plugin that allows one to create rules for parsing files that are mostly text.

Overview

  • A view is another place where data can be read, like streams and datasets.
    •  Therefore, views are readable anywhere a stream or dataset is readable (MapReduce/Spark programs, flows, ETL).
  • A view is a read-only view of a stream or dataset, with a specific read format (schema + format (csv, avro)).
  • If Explore is enabled, then a Hive table will be created for each view.

3.2 Plan

  • View HTTP API, client, CLI
  • Views can be a view of a stream (not dataset yet)
  • Hive tables will be created for views when Explore is enabled

...

View HTTP API

Path: PUT /v3/namespaces/<namespace>/views/<view>
Request:
{
  "stream": "stream1",
  "format": <same as before>
}
Notes: Creates or modifies a view.

Path: GET /v3/namespaces/<namespace>/views/<view>
Response:
{"id": "someview", "stream": "stream1", "format": ..}
Notes: Gets details of an individual view.

Path: GET /v3/namespaces/<namespace>/views
Notes: Lists all views.

Path: DELETE /v3/namespaces/<namespace>/views/<view>
Notes: Deletes a view.

Path: GET /v3/namespaces/<namespace>/stream/<stream>/views
Response:
[
  {"id": "someview", "stream": "stream1", "format": ..},
  {"id": "otherview", "stream": "stream2", "format": ..}
]
Notes: Lists all views associated with a stream.
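As a rough illustration of how a client might call the create/modify endpoint, the following Python sketch builds the PUT request without sending it. It assumes the path /v3/namespaces/<namespace>/views/<view> and a request body with "stream" and "format" keys. The base URL, the function names, and the {"name": "csv"} placeholder are assumptions made for this example, not part of the design; the real format specification is whatever the existing stream format API accepts.

```python
import json
from urllib.parse import quote
from urllib.request import Request

# Assumption: address of the CDAP router; adjust for your deployment.
BASE_URL = "http://localhost:10000"


def view_url(namespace, view, base_url=BASE_URL):
    """Return the URL of a single view (hypothetical helper)."""
    return "{}/v3/namespaces/{}/views/{}".format(
        base_url, quote(namespace, safe=""), quote(view, safe="")
    )


def build_put_view(namespace, view, stream, fmt, base_url=BASE_URL):
    """Build, but do not send, the request that creates or modifies a view.

    `fmt` is passed through unchanged as the "format" value.
    """
    body = json.dumps({"stream": stream, "format": fmt}).encode("utf-8")
    return Request(
        view_url(namespace, view, base_url),
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


if __name__ == "__main__":
    # {"name": "csv"} is only a placeholder for the format specification.
    req = build_put_view("default", "view1", "stream1", {"name": "csv"})
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
```

Passing the returned object to urllib.request.urlopen would actually send the request; keeping construction separate from sending makes the URL and body easy to inspect first.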

...

  • If Explore is disabled, then Hive tables will not be created for views

Sample CLI Flow

  1. User wants to create a stream "stream1" that contains CSV data and read it using two views, "view1" and "view2".
    1. create stream stream1
    2. send stream stream1 "a,b,c"
      send stream stream1 "d,e,f"
    3. execute "select * from stream_stream1" // may be removed later, as views already cover this

      body
      a,b,c
      d,e,f
    4. create view view1 stream1 format csv "ticker string, num_traded int, price double"
    5. execute "select * from view_view1"

      ticker  num_traded  price
      a       b           c
      d       e           f
    6. create view view2 stream1 format csv "ticker string, price double" "drop=$2" <-- "drop=$2" means "drop the 2nd field"

    7. execute "select * from view_view2"

      ticker  price
      a       c
      d       f
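The read-time behaviour in the flow above can be approximated with a small Python sketch: the stream keeps the raw event bodies, and each view only changes how those bodies are split and named when read. This is an illustration of the idea under simplifying assumptions, not CDAP's implementation. The function names are hypothetical, fields are split on plain commas with no quoting support, and the declared column types are ignored.

```python
def parse_columns(schema):
    """Turn 'ticker string, price double' into ['ticker', 'price']."""
    return [field.split()[0] for field in schema.split(",")]


def read_view(events, schema, drop=()):
    """Map each CSV event body onto the schema's column names.

    `drop` holds 1-based field positions to discard before naming,
    mirroring the "drop=$2" setting in the CLI flow.
    """
    columns = parse_columns(schema)
    rows = []
    for body in events:
        fields = body.split(",")
        kept = [f for pos, f in enumerate(fields, start=1) if pos not in drop]
        rows.append(dict(zip(columns, kept)))
    return rows


if __name__ == "__main__":
    stream1 = ["a,b,c", "d,e,f"]  # raw event bodies, stored once
    view1 = read_view(stream1, "ticker string, num_traded int, price double")
    view2 = read_view(stream1, "ticker string, price double", drop={2})
    print(view1)
    print(view2)
```

Both views read the same two events; only the schema and the dropped field differ, which is why several views can coexist on one stream.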