...
Streams
- Format
Schema as Metadata
Schema as metadata lets CDAP users retrieve datasets and streams that contain a field X, optionally restricted to type Y.
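One way to support this kind of lookup is to flatten each schema into index keys when it is recorded as metadata. The sketch below is illustrative only: `index_schema`, `search`, the `field:type` key format, and the sample entities are assumptions made for this example, not CDAP APIs.

```python
from collections import defaultdict

# Hypothetical in-memory index: index key -> set of entity names.
# Each field contributes two keys: "name" and "name:type".
index = defaultdict(set)


def index_schema(entity, schema):
    """Record every field of `schema` (field name -> type) for `entity`."""
    for field, field_type in schema.items():
        index[field.lower()].add(entity)
        index[f"{field.lower()}:{field_type.lower()}"].add(entity)


def search(field, field_type=None):
    """Return entities that have `field`, optionally restricted to `field_type`."""
    key = field.lower()
    if field_type is not None:
        key = f"{key}:{field_type.lower()}"
    # Use .get so a miss does not create an empty entry in the defaultdict.
    return sorted(index.get(key, set()))


# Sample entities and schemas, invented for illustration.
index_schema("dataset:purchases", {"price": "double", "customer": "string"})
index_schema("stream:events", {"price": "string"})
```

With this layout, `search("price")` matches both entities, while `search("price", "double")` narrows the result to the dataset. Both are exact-key lookups, which fits a store that only supports indexed and prefix lookups.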
Design Considerations
Storage
There is a case for storing System Metadata in a separate dataset for the following reasons:
- Only the CDAP system can update System Metadata.
- System Metadata may have different authorization and retention policies from Business Metadata.
- System Metadata can be updated only at specific times, whereas users can update Business Metadata at any time.
However, storing it in a separate dataset means the metadata system has to manage two datasets, and APIs may need filters, etc. TODO: Details
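To make the trade-off concrete, here is a minimal sketch of a store backed by two separate tables, with an optional scope filter on reads. The names `Scope`, `MetadataStore`, `add_tags`, and `get_tags`, as well as the sample tags, are hypothetical and only stand in for whatever the final design chooses.

```python
from enum import Enum


class Scope(Enum):
    SYSTEM = "system"
    BUSINESS = "business"


class MetadataStore:
    def __init__(self):
        # One backing table per scope, standing in for two separate datasets.
        self._tables = {scope: {} for scope in Scope}

    def add_tags(self, scope, entity, *tags):
        """Add tags for `entity` to the table that belongs to `scope`."""
        self._tables[scope].setdefault(entity, set()).update(tags)

    def get_tags(self, entity, scope=None):
        """Return tags for `entity`; with no scope, merge both tables."""
        scopes = [scope] if scope is not None else list(Scope)
        tags = set()
        for s in scopes:
            tags |= self._tables[s].get(entity, set())
        return sorted(tags)


# Sample data, invented for illustration.
store = MetadataStore()
store.add_tags(Scope.SYSTEM, "dataset:purchases", "batch", "explore")
store.add_tags(Scope.BUSINESS, "dataset:purchases", "finance")
```

Every read path has to either pick a scope or merge results from both tables, which is the extra management cost noted above.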
Runtime
System Metadata will be added/updated when:
- An app is deployed
- A new dataset instance is created
- A new stream is created
TODO: Details
System Metadata Updates
Only the CDAP system can update system metadata for entities; this capability will not be exposed to users. Given this design choice, users will need a way to discover all the system tags and properties. Initially, this can be exposed via a simple API that lists all tags and properties. It can later be extended with full-text search once CDAP has a more comprehensive search capability that goes beyond IndexedTables and prefix lookups.
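As a rough illustration of the listing API, the sketch below collects the distinct system tags and property keys across all entities. The record layout, the function names, and the sample values are assumptions made for this example, not an existing CDAP interface.

```python
# Hypothetical system metadata: entity -> {"tags": set, "properties": dict}.
# The entities and values are invented for illustration.
system_metadata = {
    "dataset:purchases": {"tags": {"batch"}, "properties": {"type": "table"}},
    "stream:events": {"tags": {"realtime"}, "properties": {"ttl": "3600"}},
}


def list_system_tags(metadata):
    """Return every distinct system tag across all entities, sorted."""
    return sorted({tag for record in metadata.values() for tag in record["tags"]})


def list_system_property_keys(metadata):
    """Return every distinct system property key across all entities, sorted."""
    return sorted(
        {key for record in metadata.values() for key in record["properties"]}
    )
```

Both functions are read-only, which matches the constraint that users can discover system metadata but not change it.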
Questions