...

  • Type of program

Datasets

  • Type of dataset
  • Schema
  • RecordScannable/BatchWritable/RecordWritable/BatchReadable
  • Other properties

Streams

  • Format

Views

  • ViewFormat

Design Considerations

Storage

...

  • The REST APIs for adding/updating/deleting system metadata will be neither documented nor exposed via the Router.
  • The SystemMetadataUpdater will use service discovery to discover the Metadata Service and make REST calls.

Schema as Metadata

Schema as metadata lets CDAP users retrieve datasets/streams that have a field X, optionally of type Y.

To store schema as system metadata, we will use the existing metadata properties mechanism. One option is to store every field in the schema as a separate metadata property:

Key: 

field^A<fieldName>

Value:

<fieldType>

Note: We may have to reverse this, depending on the indexing mechanisms available in the System Metadata Dataset. If it supports key:value searches and value-only searches, then we may have to swap the key and value above, so that two types of searches can be supported:

  1. All Datasets with the field field1
  2. All Datasets with the field field1 of type int
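
The key/value layout and the two searches above can be illustrated with a minimal sketch. It is not CDAP code: the function names, the entity names, and the in-memory dictionary standing in for the System Metadata Dataset are all hypothetical, and it assumes that "^A" in the key denotes the Ctrl-A control character used as a separator. A real implementation would go through the dataset's index rather than scanning every entity.

```python
# Hypothetical sketch: none of these names are CDAP APIs.
SEPARATOR = "\x01"  # assumption: "^A" in the key means the Ctrl-A character


def schema_to_properties(fields):
    """Flatten {fieldName: fieldType} into one metadata property per field."""
    return {f"field{SEPARATOR}{name}": ftype for name, ftype in fields.items()}


def search(index, field_name, field_type=None):
    """Return entities that have field_name, optionally restricted to field_type."""
    key = f"field{SEPARATOR}{field_name}"
    return sorted(
        entity
        for entity, props in index.items()
        if key in props and (field_type is None or props[key] == field_type)
    )


# In-memory stand-in for the System Metadata Dataset (made-up entities).
index = {
    "dataset:purchases": schema_to_properties({"field1": "int", "customer": "string"}),
    "stream:events": schema_to_properties({"field1": "string"}),
}

print(search(index, "field1"))         # search type 1: any field1
print(search(index, "field1", "int"))  # search type 2: field1 of type int
```

With the key holding the field name, the first search is a key lookup and the second is a key:value lookup. Whether the real index can serve both is the open question raised in the note above.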

Views

Up until 3.2, users could not associate metadata with stream views. We will need to add this capability in 3.2. However, there will not be any parent-child relationship between a view and its stream as far as metadata is concerned. A view will be a separate entity from its stream, and will show up separately in search results. Metadata of a stream will not be automatically available as metadata of a view.
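
The separation between a view and its stream can be sketched as follows. This is an illustration only: the entity identifiers, the property names, and the in-memory store are hypothetical and do not reflect how CDAP represents entities.

```python
# Hypothetical in-memory metadata store; entity IDs and properties are made up.
metadata = {
    ("stream", "events"): {"owner": "ingest-team"},
    ("view", "events", "clicks"): {"purpose": "reporting"},
}


def get_properties(entity_id):
    """Look up metadata for exactly this entity, with no fallback to a parent stream."""
    return metadata.get(entity_id, {})


def search_by_property(key):
    """Return every entity carrying the property; streams and views match independently."""
    return sorted(entity for entity, props in metadata.items() if key in props)
```

Here a search on "owner" would return only the stream, and the view's properties would not include "owner", because nothing is inherited from the stream.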

Implementation

REST APIs

Questions