Streams

  • Format

Schema as Metadata

Schema as metadata lets CDAP users retrieve datasets and streams whose schema contains a field X, optionally restricted to fields of type Y.
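The lookup described above can be sketched as follows. This is a minimal in-memory illustration, not CDAP code: the class SchemaMetadataSearch, its methods, and the entity names are hypothetical, and a real implementation would query indexed metadata rather than scan a map.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "find entities with field X, optionally of type Y". */
final class SchemaMetadataSearch {

    // Entity name -> (field name -> field type). Stands in for schemas stored as metadata.
    private final Map<String, Map<String, String>> schemas = new LinkedHashMap<>();

    /** Registers (or replaces) the schema of a dataset or stream. */
    void addSchema(String entity, Map<String, String> fields) {
        schemas.put(entity, new LinkedHashMap<>(fields));
    }

    /**
     * Returns the entities whose schema has the given field.
     * A null type matches a field of any type; otherwise the type must match (case-insensitive).
     */
    List<String> findByField(String field, String type) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, Map<String, String>> entry : schemas.entrySet()) {
            String actualType = entry.getValue().get(field);
            if (actualType != null && (type == null || actualType.equalsIgnoreCase(type))) {
                matches.add(entry.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        SchemaMetadataSearch search = new SchemaMetadataSearch();

        Map<String, String> purchases = new LinkedHashMap<>();
        purchases.put("userId", "string");
        purchases.put("amount", "double");
        search.addSchema("dataset:purchases", purchases);

        Map<String, String> events = new LinkedHashMap<>();
        events.put("userId", "long");
        events.put("body", "string");
        search.addSchema("stream:events", events);

        System.out.println(search.findByField("userId", null));   // field only
        System.out.println(search.findByField("userId", "long")); // field and type
    }
}
```

Passing null for the type models the "optionally of type Y" part of the requirement: the type filter is applied only when the caller supplies one.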

Design Considerations

Storage

There is a case for storing System Metadata in a separate dataset for the following reasons:

  1. Only the CDAP system can update System Metadata.
  2. System Metadata may have different authorization and retention policies from Business Metadata.
  3. System Metadata can be updated only at specific times, whereas users can update Business Metadata at any time.

However, if System Metadata is stored in a separate dataset, the metadata system will have to manage two datasets. For example, the APIs may need filters to select between them. TODO: Details
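One possible shape for such a filter is sketched below, assuming a scope parameter on the read path. The class ScopedMetadataStore, the Scope enum, and the property keys are hypothetical; two in-memory maps stand in for the two datasets.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: one API in front of two separately stored metadata sets. */
final class ScopedMetadataStore {

    enum Scope { SYSTEM, BUSINESS }

    // One map per scope, standing in for two separate datasets.
    private final Map<Scope, Map<String, Map<String, String>>> stores = new EnumMap<>(Scope.class);

    ScopedMetadataStore() {
        for (Scope scope : Scope.values()) {
            stores.put(scope, new LinkedHashMap<>());
        }
    }

    /** Writes a property into the store that backs the given scope. */
    void setProperty(Scope scope, String entity, String key, String value) {
        stores.get(scope).computeIfAbsent(entity, k -> new LinkedHashMap<>()).put(key, value);
    }

    /**
     * Reads properties for an entity. A null scope is the unfiltered view and merges
     * both stores; on a key collision the BUSINESS value overwrites the SYSTEM value
     * because scopes are merged in enum order (an arbitrary choice for this sketch).
     */
    Map<String, String> getProperties(String entity, Scope scope) {
        Map<String, String> result = new LinkedHashMap<>();
        for (Scope candidate : Scope.values()) {
            if (scope == null || scope == candidate) {
                result.putAll(stores.get(candidate)
                        .getOrDefault(entity, Collections.<String, String>emptyMap()));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        ScopedMetadataStore store = new ScopedMetadataStore();
        store.setProperty(Scope.SYSTEM, "stream:events", "format", "csv");
        store.setProperty(Scope.BUSINESS, "stream:events", "owner", "analytics");

        System.out.println(store.getProperties("stream:events", Scope.SYSTEM));
        System.out.println(store.getProperties("stream:events", null));
    }
}
```

The sketch shows the cost noted above: every read has to decide which store, or both, to consult, and the merged view needs a rule for key collisions.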

Runtime

System Metadata will be added/updated when:

  1. An app is deployed
  2. A new dataset instance is created
  3. A new stream is created

TODO: Details
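Pending those details, the three triggers listed above could be wired up roughly as follows. This is only a sketch: the class SystemMetadataHooks, the EventType enum, and the "entity-type" property are hypothetical names, not CDAP APIs, and the design does not yet say which properties each event records.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: system metadata written in response to lifecycle events. */
final class SystemMetadataHooks {

    // The three triggers named in this section.
    enum EventType { APP_DEPLOYED, DATASET_INSTANCE_CREATED, STREAM_CREATED }

    // Entity name -> system properties. Stands in for the system metadata store.
    private final Map<String, Map<String, String>> systemMetadata = new LinkedHashMap<>();

    /** Adds or updates system metadata for the entity that the event refers to. */
    void onEvent(EventType type, String entity) {
        Map<String, String> props =
                systemMetadata.computeIfAbsent(entity, k -> new LinkedHashMap<>());
        switch (type) {
            case APP_DEPLOYED:
                props.put("entity-type", "application"); // assumed property name
                break;
            case DATASET_INSTANCE_CREATED:
                props.put("entity-type", "dataset");
                break;
            case STREAM_CREATED:
                props.put("entity-type", "stream");
                break;
        }
    }

    /** Read-only view of the system properties recorded for an entity. */
    Map<String, String> getProperties(String entity) {
        Map<String, String> props = systemMetadata.get(entity);
        return props == null ? Collections.<String, String>emptyMap()
                             : Collections.unmodifiableMap(props);
    }

    public static void main(String[] args) {
        SystemMetadataHooks hooks = new SystemMetadataHooks();
        hooks.onEvent(EventType.APP_DEPLOYED, "PurchaseApp");
        hooks.onEvent(EventType.STREAM_CREATED, "events");
        System.out.println(hooks.getProperties("PurchaseApp"));
        System.out.println(hooks.getProperties("events"));
    }
}
```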

System Metadata Updates

Only the CDAP system can update system metadata for entities; this capability will not be exposed to users. Given this design choice, users need a way to discover all the system tags and properties. Initially, this can be exposed via a simple API that lists all tags and properties. It can later be extended with full-text search once CDAP has a more comprehensive search capability that goes beyond IndexedTables and prefix lookups.
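A minimal sketch of such a listing API, with writes restricted to the system, might look like this. The class SystemTagCatalog and its methods are hypothetical, the boolean callerIsSystem flag stands in for real authorization, and a sorted in-memory set stands in for an index that supports prefix lookups.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sketch: system-only writes, user-facing listing with prefix lookup. */
final class SystemTagCatalog {

    // Sorted set of system tags; sorted order is what makes the prefix scan cheap.
    private final NavigableSet<String> tags = new TreeSet<>();

    /** Write path. Rejects any caller that is not the system. */
    void addTag(String tag, boolean callerIsSystem) {
        if (!callerIsSystem) {
            throw new SecurityException("Only the system may update system metadata");
        }
        tags.add(tag);
    }

    /** Read path open to users: every known system tag, in sorted order. */
    List<String> listTags() {
        return new ArrayList<>(tags);
    }

    /** Read path open to users: tags that start with the given prefix. */
    List<String> listTagsWithPrefix(String prefix) {
        List<String> matches = new ArrayList<>();
        // Tags sharing a prefix are contiguous in sorted order, so start at the
        // first tag >= prefix and stop at the first one that no longer matches.
        for (String tag : tags.tailSet(prefix, true)) {
            if (!tag.startsWith(prefix)) {
                break;
            }
            matches.add(tag);
        }
        return matches;
    }

    public static void main(String[] args) {
        SystemTagCatalog catalog = new SystemTagCatalog();
        catalog.addTag("schema:userId", true);
        catalog.addTag("schema:amount", true);
        catalog.addTag("format:csv", true);

        System.out.println(catalog.listTags());
        System.out.println(catalog.listTagsWithPrefix("schema:"));
    }
}
```

Keeping the write and read paths as separate methods mirrors the design choice above: users can discover tags but never modify them.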

Questions