...

This page covers the requirements, design, and implementation of the metadata and data discovery features in CDAP 3.3.

High Level Requirements

  1. Metadata search
  2. Schema as metadata
  3. System metadata
  4. CLI and Test Framework support for metadata
  5. UI for Metadata Search
  6. UI for Lineage
  7. UI for Adding/Updating metadata properties/tags
  8. Lineage based on Type of Dataset Access
  9. Monitoring/Logs for Metadata Service

...

  1. Schema as metadata
  2. System metadata
  3. Metadata CLI
  4. Test Framework support for Metadata
  5. UI ... (needs to be finalized)

User Stories

Id | Description | Comments
U1 | As a user, I should be able to search datasets containing the specified fields. | List the kinds of queries that will be supported.
U2 | As a CDAP system, I should be able to annotate CDAP entities with system metadata automatically. | List all the system tags that should be annotated: kind of entity (dataset, app, program, program type, stream)?; artifact name. System metadata for each entity is listed below.
U3 | As a user, I should be able to access and update CDAP metadata using the CDAP CLI. |
U4 | As a developer, I should be able to access and update CDAP metadata using the CDAP Test Framework. |
U5 | As a user, I should be able to search CDAP entities based on metadata using the CDAP UI. |
U6 | As a user, I should be able to view the lineage of a CDAP dataset or stream in a specified time window using the CDAP UI. |

System Metadata

Kinds of system metadata:

Artifacts

TBD

Applications

  • Artifact name

...

As a result, the metadata system will have to manage two different datasets. Both datasets will use an identical storage format (for both keys and values); they will differ only in the table they write to.

A higher-level construct, the MetadataStore, will have to be extended with the ability to interact with two separate datasets. It will use a MetadataScope object (possible values USER and SYSTEM) to distinguish operations that should go to the business metadata dataset from those that should go to the system metadata dataset.

The MetadataStore class was chosen to interact with the two different metadata datasets because it is the API used across CDAP (LineageDatasetFramework, Lineage classes, StreamAdmin, AppLifecycleService, and DeletedProgramHandlerStage, to name a few) to interact with metadata. An alternative was to give this ability to the MetadataAdmin object instead and keep each MetadataStore local to a single dataset, which might have made the MetadataStore class itself cleaner. However, that approach would have required the downstream classes (users of MetadataStore) to handle multiple MetadataStores, which is not clean. Moreover, the MetadataAdmin is currently used only by the MetadataHttpHandler, so not all clients of the metadata system have access to it, and this logic cannot be moved to the MetadataAdmin class.
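A minimal sketch of the scope-based routing described above, assuming two in-memory maps stand in for the business and system metadata datasets. The USER and SYSTEM scope values come from this design; the class name ScopedMetadataStoreSketch, its methods, and the string entity ids such as "dataset:default.purchases" are hypothetical and do not reflect the actual MetadataStore API.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashMap;
import java.util.Map;

// Scope values as described in this design.
enum MetadataScope { USER, SYSTEM }

// Hypothetical stand-in for MetadataStore: one backing "dataset" per scope.
// Both backing stores use the same key/value layout; only the target differs.
class ScopedMetadataStoreSketch {
  // Each inner map stands in for a separate metadata dataset (table):
  // entity id -> (property key -> property value).
  private final Map<MetadataScope, Map<String, Map<String, String>>> datasets =
      new EnumMap<>(MetadataScope.class);

  ScopedMetadataStoreSketch() {
    for (MetadataScope scope : MetadataScope.values()) {
      datasets.put(scope, new HashMap<>());
    }
  }

  // Adds new keys and updates existing keys in the dataset selected by scope.
  void setProperties(MetadataScope scope, String entityId, Map<String, String> properties) {
    datasets.get(scope).computeIfAbsent(entityId, k -> new HashMap<>()).putAll(properties);
  }

  // Reads an entity's properties from the dataset selected by scope only.
  Map<String, String> getProperties(MetadataScope scope, String entityId) {
    Map<String, String> props = datasets.get(scope).get(entityId);
    if (props == null) {
      return Collections.emptyMap();
    }
    return Collections.unmodifiableMap(props);
  }

  // Removes an entity's properties from the dataset selected by scope only.
  void removeProperties(MetadataScope scope, String entityId) {
    datasets.get(scope).remove(entityId);
  }
}
```

Because callers only pass a scope, classes such as StreamAdmin or AppLifecycleService would keep talking to a single store object, which is the trade-off argued for above.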

The MetadataAdmin class is currently in app-fabric because it needs access to the AppMetadataStore to check whether entities exist. This is not ideal, but fixing it would require splitting cdap-app-fabric, which is well beyond the scope of the Metadata work.

History

We will reuse the same pattern that the Business Metadata Dataset uses to store history. There will, however, be one change: the MetadataScope will not be serialized in the history, as described in CDAP-4295.

Runtime

For interacting with the System Metadata Dataset, we will introduce a SystemMetadataUpdater interface, which will be injected at the various stages outlined below to add, update, or delete system metadata.
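A minimal sketch of how an injected SystemMetadataUpdater could be used, assuming plain string entity ids and an in-memory implementation. Only the interface name comes from this design; its methods, the InMemorySystemMetadataUpdater and AppDeployStageSketch classes, and the "artifact-name" key are hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical shape of the SystemMetadataUpdater; method names are assumptions.
interface SystemMetadataUpdater {
  void addProperties(String entityId, Map<String, String> properties);
  void addTags(String entityId, Set<String> tags);
  void removeAll(String entityId);
}

// In-memory stand-in for an implementation backed by the system metadata dataset.
class InMemorySystemMetadataUpdater implements SystemMetadataUpdater {
  final Map<String, Map<String, String>> properties = new HashMap<>();
  final Map<String, Set<String>> tags = new HashMap<>();

  @Override
  public void addProperties(String entityId, Map<String, String> props) {
    properties.computeIfAbsent(entityId, k -> new HashMap<>()).putAll(props);
  }

  @Override
  public void addTags(String entityId, Set<String> newTags) {
    tags.computeIfAbsent(entityId, k -> new HashSet<>()).addAll(newTags);
  }

  @Override
  public void removeAll(String entityId) {
    properties.remove(entityId);
    tags.remove(entityId);
  }
}

// Hypothetical injection point: a deployment stage that records an application's
// artifact name (listed under Applications above) as system metadata.
class AppDeployStageSketch {
  private final SystemMetadataUpdater updater;

  AppDeployStageSketch(SystemMetadataUpdater updater) {
    this.updater = updater;
  }

  void onDeploy(String appId, String artifactName) {
    updater.addProperties(appId, Collections.singletonMap("artifact-name", artifactName));
  }

  // System metadata is cleaned up when the entity goes away.
  void onDelete(String appId) {
    updater.removeAll(appId);
  }
}
```

Injecting the interface rather than a concrete store keeps each lifecycle stage unaware of which dataset holds system metadata.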

System Metadata will be added when:

...

Up until 3.2, users could not associate metadata with stream views. We will need to add this capability in 3.3. However, as far as metadata is concerned, there will be no parent-child relationship between a view and its stream: a view will be a separate entity from its stream and will show up separately in search results. Metadata of a stream will not automatically be available as metadata of its views.

Upgrade

The BusinessMetadataDataset dataset type introduced in 3.2 will be renamed to MetadataDataset, since it will also serve system metadata in 3.3. For existing CDAP installations, we will need an upgrade step to change the type of the existing "business.metadata" dataset in the "datasets.instance" table. 
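A minimal sketch of the upgrade step, assuming the "datasets.instance" table can be treated as a map from instance name to dataset type name. The real table stores a full dataset specification, and the class name MetadataDatasetTypeUpgrader and its method are hypothetical.

```java
import java.util.Map;

// Hypothetical upgrade step: re-type the existing "business.metadata" instance
// from BusinessMetadataDataset to MetadataDataset. The Map stands in for the
// "datasets.instance" table (instance name -> dataset type name).
class MetadataDatasetTypeUpgrader {
  static final String INSTANCE_NAME = "business.metadata";
  static final String OLD_TYPE = "BusinessMetadataDataset";
  static final String NEW_TYPE = "MetadataDataset";

  // Returns true if the instance was re-typed, false if there was nothing to do.
  // Running it again is harmless: an already-upgraded instance is left untouched.
  static boolean upgrade(Map<String, String> instanceTable) {
    String currentType = instanceTable.get(INSTANCE_NAME);
    if (!OLD_TYPE.equals(currentType)) {
      return false; // fresh install, or already upgraded
    }
    instanceTable.put(INSTANCE_NAME, NEW_TYPE);
    return true;
  }
}
```

Making the step idempotent means a retried or repeated upgrade run cannot corrupt an installation that was already migrated.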

Implementation

CDAP-4297

New REST APIs

  • The Metadata REST APIs to retrieve properties and tags will be updated to accept a scope query parameter. It will support the values user and system. If scope is not specified, the API will return all metadata across both scopes. 
  • New APIs will be added for views and artifacts:

Purpose | API | Body | Response | Routable | Comments | Approved?
Annotate business metadata for view | POST /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/properties | {"key1": "value1", "key2": "value2", ...} | 200: Successful; 404: View not found in specified namespace | Yes | New keys are added. Existing keys are updated. |
Retrieve business metadata for view | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/properties | N/A | 200: Successful, with body {"key1": "value1", "key2": "value2", ...}; 404: View not found in specified namespace | Yes | |
Delete all business metadata for view | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/properties | | 200: Successful; 404: View not found in specified namespace | Yes | |
Delete selected key from business metadata for view | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/properties/{key} | | 200: Successful; 404: View not found in specified namespace | Yes | |
Search views containing business metadata | GET /v3/namespaces/{namespace-id}/metadata/search?query=term&target=view | N/A | 200: Successful, with body ["view1", "view2"] | Yes | Only prefix search is supported in 3.3. Supported formats: Value Prefix; Key:Value Prefix. |
Add tags to a view | POST /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/tags | ["tag1", "tag2"] | 200: Successful; 404: View not found in specified namespace | Yes | |
Retrieve view tags | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/tags | N/A | ["tag1", "tag2"] | Yes | |
Remove all view tags | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/tags | | 200: Successful; 404: View not found in specified namespace | Yes | |
Remove specified view tag | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata/tags/{tag} | | 200: Successful; 404: View not found in specified namespace | Yes | |
Get all business metadata for a view | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/views/{view-id}/metadata | | 200: Successful; 404: View not found in specified namespace | Yes | Retrieves all properties and tags for a view. |
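The view endpoints in the table above follow one path prefix, so a client can build them from a single helper. This is a minimal sketch using the JDK's java.net.http.HttpRequest (Java 11+); it only builds requests and does not send them. The class name ViewMetadataRequests, its methods, and the BASE_URL value are hypothetical, and BASE_URL is an assumed router address rather than a documented default.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Builds (but does not send) requests for the view metadata endpoints.
class ViewMetadataRequests {
  // Assumed router address, for illustration only.
  static final String BASE_URL = "http://localhost:11015";

  // Common prefix of all per-view metadata endpoints.
  static String viewMetadataPath(String ns, String stream, String view) {
    return "/v3/namespaces/" + ns + "/streams/" + stream + "/views/" + view + "/metadata";
  }

  // POST .../metadata/properties with a JSON object body.
  static HttpRequest addProperties(String ns, String stream, String view, String jsonObject) {
    return HttpRequest.newBuilder(URI.create(BASE_URL + viewMetadataPath(ns, stream, view) + "/properties"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonObject))
        .build();
  }

  // POST .../metadata/tags with a JSON array body such as ["tag1", "tag2"].
  static HttpRequest addTags(String ns, String stream, String view, String jsonArray) {
    return HttpRequest.newBuilder(URI.create(BASE_URL + viewMetadataPath(ns, stream, view) + "/tags"))
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(jsonArray))
        .build();
  }

  // DELETE .../metadata/tags/{tag}; assumes the tag is already URL-safe.
  static HttpRequest removeTag(String ns, String stream, String view, String tag) {
    return HttpRequest.newBuilder(URI.create(BASE_URL + viewMetadataPath(ns, stream, view) + "/tags/" + tag))
        .DELETE()
        .build();
  }

  // GET .../metadata/search?query=term&target=view, with the query URL-encoded.
  static HttpRequest searchViews(String ns, String query) {
    String encoded = URLEncoder.encode(query, StandardCharsets.UTF_8);
    return HttpRequest.newBuilder(
            URI.create(BASE_URL + "/v3/namespaces/" + ns + "/metadata/search?query=" + encoded + "&target=view"))
        .GET()
        .build();
  }
}
```

To execute a request, pass it to java.net.http.HttpClient#send together with a response body handler such as HttpResponse.BodyHandlers.ofString().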

Existing/Changed REST APIs and CLI Commands:

Note: Changes are in blue

Purpose | API | CLI Command | Body | Response | Comments | Approved?
Annotate business metadata for datasets | POST /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/properties | set metadata properties datasets <dataset-id> | {"key1": "value1", "key2": "value2", ...} | 200: Successful; 404: Dataset not found in specified namespace | New keys are added. Existing keys are updated. |
Annotate business metadata for apps | POST /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/properties | set metadata properties apps <app-id> | {"key1": "value1", "key2": "value2", ...} | 200: Successful; 404: App not found in specified namespace | New keys are added. Existing keys are updated. |
Annotate business metadata for programs | POST /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/properties | set metadata properties app <app-id> program-type <program-type> | {"key1": "value1", "key2": "value2", ...} | 200: Successful; 404: Program not found in specified namespace | New keys are added. Existing keys are updated. |
Annotate business metadata for streams | POST /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/properties | set metadata properties streams <stream-id> | {"key1": "value1", "key2": "value2", ...} | 200: Successful; 404: Stream not found in specified namespace | New keys are added. Existing keys are updated. |
Retrieve business metadata for datasets | GET /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/properties | get metadata properties scope datasets <dataset-id> | N/A | 200: Successful, with body {"key1": "value1", "key2": "value2", ...}; 404: Dataset not found in specified namespace | |
Retrieve business metadata for apps | GET /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/properties | get metadata properties scope apps <app-id> | N/A | 200: Successful, with body {"key1": "value1", "key2": "value2", ...}; 404: App not found in specified namespace | |
Retrieve business metadata for programs | GET /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/properties | get metadata properties scope apps <app-id> program-type <program-id> | N/A | 200: Successful, with body {"key1": "value1", "key2": "value2", ...}; 404: Program not found in specified namespace | |
Retrieve business metadata for streams | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/properties | get metadata properties scope streams <stream-id> | N/A | 200: Successful, with body {"key1": "value1", "key2": "value2", ...}; 404: Stream not found in specified namespace | |
Delete all business metadata for datasets | DELETE /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/properties | delete metadata properties datasets <dataset-id> | N/A | 200: Successful; 404: Dataset not found in specified namespace | |
Delete selected key from business metadata for datasets | DELETE /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/properties/{key} | delete metadata properties datasets <dataset-id> <key> | N/A | 200: Successful; 404: Dataset not found in specified namespace | |
Delete all business metadata for apps | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/properties | delete metadata properties apps <app-id> | | 200: Successful; 404: App not found in specified namespace | |
Delete selected key from business metadata for apps | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/properties/{key} | delete metadata properties apps <app-id> <key> | | 200: Successful; 404: App not found in specified namespace | |
Delete all business metadata for programs | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/properties | delete metadata properties apps <app-id> program-type <program-id> | | 200: Successful; 404: Program not found in specified namespace | |
Delete selected key from business metadata for programs | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/properties/{key} | delete metadata properties apps <app-id> program-type <program-id> <key> | | 200: Successful; 404: Program not found in specified namespace | |
Delete all business metadata for streams | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/properties | delete metadata properties streams <stream-id> | | 200: Successful; 404: Stream not found in specified namespace | |
Delete selected key from business metadata for streams | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/properties/{key} | delete metadata properties streams <stream-id> <key> | | 200: Successful; 404: Stream not found in specified namespace | |
Search entities containing business metadata | GET /v3/namespaces/{namespace-id}/metadata/search?query=term&target=<target-type>, where target-type is one of dataset, app, program, stream, view | search metadata scope <search-query> <target> | N/A | 200: Successful, with body ["entity1", "entity2"] | Only prefix search is supported in 3.2. Backward-incompatible change of output format in 3.3. Supported formats: Value Prefix; Key:Value Prefix. |
View Dataset Lineage | GET /v3/namespaces/{namespace-id}/datasets/{dataset-id}/lineage?start=<start-ts>&end=<end-ts>&maxLevels=<max-levels> | get lineage datasets <dataset-id> <startTs> <endTs> <maxLevels> | N/A | 200: Successful. Response TBD, but will contain a DAG representation. | |
View Stream Lineage | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/lineage?start=<start-ts>&end=<end-ts>&maxLevels=<max-levels> | get lineage streams <stream-id> <startTs> <endTs> <maxLevels> | N/A | 200: Successful. Response TBD, but will contain a DAG representation. | |
View Run Id Accesses | GET /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/runs/{run-id}/metadata | get metadata apps <app-id> program-type <program-id> runs <run-id> | N/A | 200: Successful. Response body TBD. | TODO: Figure out a better name. May not be part of 3.2. |
Add tags to a dataset | POST /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/tags | add metadata tags datasets <dataset-id> | ["tag1", "tag2"] | 200: Successful; 404: Dataset not found in specified namespace | |
Add tags to an app | POST /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/tags | add metadata tags apps <app-id> | ["tag1", "tag2"] | 200: Successful; 404: App not found in specified namespace | |
Add tags to a program | POST /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/tags | add metadata tags apps <app-id> program-type <program-id> | ["tag1", "tag2"] | 200: Successful; 404: Program not found in specified namespace | |
Add tags to a stream | POST /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/tags | | ["tag1", "tag2"] | 200: Successful; 404: Stream not found in specified namespace | |
Retrieve dataset tags | GET /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/tags | get metadata tags datasets <dataset-id> | N/A | ["tag1", "tag2"] | |
Retrieve app tags | GET /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/tags | | N/A | ["tag1", "tag2"] | |
Retrieve program tags | GET /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/tags | | N/A | ["tag1", "tag2"] | |
Retrieve stream tags | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/tags | | N/A | ["tag1", "tag2"] | |
Remove all dataset tags | DELETE /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/tags | delete metadata tags datasets <dataset-id> | N/A | 200: Successful; 404: Dataset not found in specified namespace | |
Remove specified dataset tag | DELETE /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata/tags/{tag} | | N/A | 200: Successful; 404: Dataset not found in specified namespace | |
Remove all app tags | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/tags | | N/A | 200: Successful; 404: App not found in specified namespace | |
Remove specified app tag | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/metadata/tags/{tag} | | N/A | 200: Successful; 404: App not found in specified namespace | |
Remove all program tags | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/tags | | N/A | 200: Successful; 404: Program not found in specified namespace | |
Remove specified program tag | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata/tags/{tag} | | N/A | 200: Successful; 404: Program not found in specified namespace | |
Remove all stream tags | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/tags | | | 200: Successful; 404: Stream not found in specified namespace | |
Remove specified stream tag | DELETE /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata/tags/{tag} | | | 200: Successful; 404: Stream not found in specified namespace | |
Remove all business metadata for a dataset | DELETE /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata | | | 200: Successful; 404: Dataset not found in specified namespace | Removes all properties and tags from a dataset. Will not happen in 3.2. |
Remove all business metadata for an app | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/metadata | | | 200: Successful; 404: App not found in specified namespace | Removes all properties and tags from an app. Will not happen in 3.2. |
Remove all business metadata for a program | DELETE /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata | | | 200: Successful; 404: Program not found in specified namespace | Removes all properties and tags from a program. Will not happen in 3.2. |
Get all business metadata for a dataset | GET /v3/namespaces/{namespace-id}/datasets/{dataset-id}/metadata?scope=system/user | | | 200: Successful; 404: Dataset not found in specified namespace | Retrieves all properties and tags for a dataset. Will not happen in 3.2. |
Get all business metadata for an app | GET /v3/namespaces/{namespace-id}/apps/{app-id}/metadata | | | 200: Successful; 404: App not found in specified namespace | Retrieves all properties and tags for an app. Will not happen in 3.2. |
Get all business metadata for a program | GET /v3/namespaces/{namespace-id}/apps/{app-id}/{program-type}/{program-id}/metadata | | | 200: Successful; 404: Program not found in specified namespace | Retrieves all properties and tags for a program. Will not happen in 3.2. |
Get all business metadata for a stream | GET /v3/namespaces/{namespace-id}/streams/{stream-id}/metadata | | | 200: Successful; 404: Stream not found in specified namespace | Retrieves all properties and tags for a stream. Will not happen in 3.2. |
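The two prefix-search formats listed for the search endpoints above (Value Prefix and Key:Value Prefix) can be illustrated with a small in-memory matcher. This is a sketch under assumed semantics: matching is case-sensitive, a plain value prefix is checked against both property values and tags, and target-type filtering is omitted. PrefixSearchSketch and its nested Entity class are hypothetical, not the CDAP search implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;

class PrefixSearchSketch {

  // Minimal stand-in for an entity with business metadata attached.
  static final class Entity {
    final String id;
    final Map<String, String> properties;
    final Set<String> tags;

    Entity(String id, Map<String, String> properties, Set<String> tags) {
      this.id = id;
      this.properties = properties;
      this.tags = tags;
    }
  }

  // "key:prefix" matches a property with exactly that key whose value starts with prefix.
  // Any other query matches if some property value or tag starts with it.
  static boolean matches(Entity entity, String query) {
    int colon = query.indexOf(':');
    if (colon >= 0) {
      String key = query.substring(0, colon);
      String valuePrefix = query.substring(colon + 1);
      String value = entity.properties.get(key);
      return value != null && value.startsWith(valuePrefix);
    }
    for (String value : entity.properties.values()) {
      if (value.startsWith(query)) {
        return true;
      }
    }
    for (String tag : entity.tags) {
      if (tag.startsWith(query)) {
        return true;
      }
    }
    return false;
  }

  // Returns ids of matching entities in input order, mirroring ["entity1", "entity2"].
  static List<String> search(List<Entity> entities, String query) {
    List<String> ids = new ArrayList<>();
    for (Entity entity : entities) {
      if (matches(entity, query)) {
        ids.add(entity.id);
      }
    }
    return ids;
  }
}
```

A prefix-only scheme like this can be served by range scans over sorted keys, which is one plausible reason to limit 3.x search to prefixes.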

Questions

  1. The REST APIs to retrieve metadata will accept an additional scope parameter. If the scope is not specified, the API will now return all metadata, not just business metadata as it did in 3.2. Is this considered a backward-incompatible change?
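The proposed handling of the optional scope parameter can be sketched as a small parser, assuming the values are matched case-insensitively and that an unknown value is rejected (a handler might map the exception to a 400 response, which this design does not specify). ScopeParamSketch and parseScope are hypothetical names.

```java
import java.util.EnumSet;
import java.util.Locale;
import java.util.Set;

class ScopeParamSketch {
  enum MetadataScope { USER, SYSTEM }

  // Maps the optional "scope" query parameter to the set of scopes to read.
  // Absent or empty -> both scopes, which is the behavior the question above asks about.
  static Set<MetadataScope> parseScope(String scopeParam) {
    if (scopeParam == null || scopeParam.isEmpty()) {
      return EnumSet.allOf(MetadataScope.class);
    }
    switch (scopeParam.toLowerCase(Locale.ROOT)) {
      case "user":
        return EnumSet.of(MetadataScope.USER);
      case "system":
        return EnumSet.of(MetadataScope.SYSTEM);
      default:
        throw new IllegalArgumentException(
            "Invalid scope '" + scopeParam + "'; expected 'user' or 'system'");
    }
  }
}
```

A client that relied on the 3.2 behavior could keep receiving only business metadata by passing scope=user explicitly.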

...