Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
CDAP currently captures lineage at the dataset level. With lineage, users can determine which programs read from or wrote to a dataset in a given time frame, and from there keep drilling in either the upstream or the downstream direction.
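To make the dataset-level model concrete, the following is a minimal sketch, assuming lineage is stored as a flat list of (program, dataset, access type, timestamp) records. Drilling upstream means finding the programs that wrote to a dataset, then the datasets those programs read, and repeating. All names here (`AccessType`, `DatasetAccess`, `DatasetLineage`, `upstreamDatasets`) are hypothetical and are not CDAP's lineage store or API.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical model of dataset-level lineage; not CDAP's actual lineage store.
enum AccessType { READ, WRITE }

/** One lineage record: a program read from or wrote to a dataset at a point in time. */
final class DatasetAccess {
  final String program;
  final String dataset;
  final AccessType type;
  final long timestampMillis;

  DatasetAccess(String program, String dataset, AccessType type, long timestampMillis) {
    this.program = program;
    this.dataset = dataset;
    this.type = type;
    this.timestampMillis = timestampMillis;
  }

  boolean within(long fromMillis, long toMillis) {
    return timestampMillis >= fromMillis && timestampMillis < toMillis;
  }
}

final class DatasetLineage {
  private final List<DatasetAccess> accesses = new ArrayList<>();

  void record(String program, String dataset, AccessType type, long timestampMillis) {
    accesses.add(new DatasetAccess(program, dataset, type, timestampMillis));
  }

  /** Programs that accessed the dataset with the given type in [fromMillis, toMillis). */
  Set<String> programs(String dataset, AccessType type, long fromMillis, long toMillis) {
    Set<String> result = new LinkedHashSet<>();
    for (DatasetAccess a : accesses) {
      if (a.dataset.equals(dataset) && a.type == type && a.within(fromMillis, toMillis)) {
        result.add(a.program);
      }
    }
    return result;
  }

  /**
   * Drills upstream: finds the programs that wrote to a dataset, then the datasets those
   * programs read, and repeats. Only accesses in [fromMillis, toMillis) are followed, and
   * the starting dataset is never reported as its own upstream.
   */
  Set<String> upstreamDatasets(String start, long fromMillis, long toMillis) {
    Set<String> seen = new HashSet<>();
    seen.add(start);
    Set<String> upstream = new LinkedHashSet<>();
    Deque<String> pending = new ArrayDeque<>();
    pending.add(start);
    while (!pending.isEmpty()) {
      String dataset = pending.poll();
      for (String writer : programs(dataset, AccessType.WRITE, fromMillis, toMillis)) {
        for (DatasetAccess a : accesses) {
          boolean readByWriter = a.program.equals(writer) && a.type == AccessType.READ;
          if (readByWriter && a.within(fromMillis, toMillis) && seen.add(a.dataset)) {
            upstream.add(a.dataset);
            pending.add(a.dataset);
          }
        }
      }
    }
    return upstream;
  }
}
```

Drilling downstream is symmetric: follow the programs that read a dataset to the datasets they wrote. Note that this granularity can only say that a program touched a dataset, not which fields it touched.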
However, CDAP as a platform also understands the schemas of most datasets, and those schemas contain fields. It would therefore be useful to drill into how a particular field of a dataset was used (CREATE/READ/WRITE/DELETE) in a given time period.
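A minimal sketch of what field-level lineage adds, assuming each record carries a field name and one of the four operations listed above. The query answers "how was this field of this dataset used in this time window". `FieldOperation`, `FieldAccess`, `FieldLineage`, `history`, and `usage` are hypothetical names chosen for illustration; the proposal does not define a storage layout.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

// Hypothetical: the four field operations named in the introduction.
enum FieldOperation { CREATE, READ, WRITE, DELETE }

/** One hypothetical field-level lineage record. */
final class FieldAccess {
  final String program;
  final String dataset;
  final String field;
  final FieldOperation operation;
  final long timestampMillis;

  FieldAccess(String program, String dataset, String field, FieldOperation operation,
              long timestampMillis) {
    this.program = program;
    this.dataset = dataset;
    this.field = field;
    this.operation = operation;
    this.timestampMillis = timestampMillis;
  }
}

final class FieldLineage {
  private final List<FieldAccess> records = new ArrayList<>();

  void record(String program, String dataset, String field, FieldOperation operation,
              long timestampMillis) {
    records.add(new FieldAccess(program, dataset, field, operation, timestampMillis));
  }

  /** All records for one field of one dataset in [fromMillis, toMillis), in insertion order. */
  List<FieldAccess> history(String dataset, String field, long fromMillis, long toMillis) {
    List<FieldAccess> result = new ArrayList<>();
    for (FieldAccess r : records) {
      if (r.dataset.equals(dataset) && r.field.equals(field)
          && r.timestampMillis >= fromMillis && r.timestampMillis < toMillis) {
        result.add(r);
      }
    }
    return result;
  }

  /** The distinct operations applied to the field in [fromMillis, toMillis). */
  Set<FieldOperation> usage(String dataset, String field, long fromMillis, long toMillis) {
    Set<FieldOperation> ops = EnumSet.noneOf(FieldOperation.class);
    for (FieldAccess r : history(dataset, field, fromMillis, toMillis)) {
      ops.add(r.operation);
    }
    return ops;
  }
}
```

Because each record still names the program, `history` can also answer which program performed each operation on the field.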
Goals
- Provide CDAP platform support (in the form of API and storage) to track field level lineage.
- Pipelines can then expose this functionality to the plugins.
- Plugins (such as Wrangler) will need to be updated to use this feature.
User Stories
- Breakdown of User-Stories
- User Story #1
- User Story #2
- User Story #3
Design
Cover details on assumptions made, design alternatives considered, high level design
Option 1: Provide an API in the DatasetContext interface that allows users to specify the set of field mutations expected on a dataset when they obtain an instance of it.
```java
// Option 1: proposed overloads for the DatasetContext interface.

/**
 * Get an instance of the specified Dataset.
 *
 * @param name The name of the Dataset
 * @param mutations The field mutations that are expected on this Dataset
 * @param <T> The type of the Dataset
 * @return An instance of the specified Dataset, never null.
 * @throws DatasetInstantiationException If the Dataset cannot be instantiated: its class
 *         cannot be loaded; the default constructor throws an exception; or the Dataset
 *         cannot be opened (for example, one of the underlying tables in the DataFabric
 *         cannot be accessed).
 */
<T extends Dataset> T getDataset(String name, FieldMutations mutations) throws DatasetInstantiationException;

/**
 * Get an instance of the specified Dataset.
 *
 * @param namespace The namespace of the Dataset
 * @param name The name of the Dataset
 * @param mutations The field mutations that are expected on this Dataset
 * @param <T> The type of the Dataset
 * @return An instance of the specified Dataset, never null.
 * @throws DatasetInstantiationException If the Dataset cannot be instantiated: its class
 *         cannot be loaded; the default constructor throws an exception; or the Dataset
 *         cannot be opened (for example, one of the underlying tables in the DataFabric
 *         cannot be accessed).
 */
<T extends Dataset> T getDataset(String namespace, String name, FieldMutations mutations) throws DatasetInstantiationException;

/**
 * Get an instance of the specified Dataset.
 *
 * @param name The name of the Dataset
 * @param arguments The arguments for this Dataset instance
 * @param mutations The field mutations that are expected on this Dataset
 * @param <T> The type of the Dataset
 * @return An instance of the specified Dataset, never null.
 * @throws DatasetInstantiationException If the Dataset cannot be instantiated: its class
 *         cannot be loaded; the default constructor throws an exception; or the Dataset
 *         cannot be opened (for example, one of the underlying tables in the DataFabric
 *         cannot be accessed).
 */
<T extends Dataset> T getDataset(String name, Map<String, String> arguments, FieldMutations mutations) throws DatasetInstantiationException;

/**
 * Get an instance of the specified Dataset.
 *
 * @param namespace The namespace of the Dataset
 * @param name The name of the Dataset
 * @param arguments The arguments for this Dataset instance
 * @param mutations The field mutations that are expected on this Dataset
 * @param <T> The type of the Dataset
 * @return An instance of the specified Dataset, never null.
 * @throws DatasetInstantiationException If the Dataset cannot be instantiated: its class
 *         cannot be loaded; the default constructor throws an exception; or the Dataset
 *         cannot be opened (for example, one of the underlying tables in the DataFabric
 *         cannot be accessed).
 */
<T extends Dataset> T getDataset(String namespace, String name, Map<String, String> arguments, FieldMutations mutations)
    throws DatasetInstantiationException;
```
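The Option 1 signatures take a `FieldMutations` argument whose shape is not yet specified. One possible shape, assumed here purely for illustration, is an immutable map from field name to the set of operations (CREATE/READ/WRITE/DELETE) the program expects to perform on that field, assembled with a small builder. `FieldOperation`, `builder()`, `add`, `getFields`, and `getOperations` are hypothetical names, not part of any CDAP API.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical: the field operations named in the introduction.
enum FieldOperation { CREATE, READ, WRITE, DELETE }

/**
 * Hypothetical shape for the proposed FieldMutations parameter: for each field of a
 * dataset, the operations the program expects to perform on it.
 */
final class FieldMutations {
  private final Map<String, Set<FieldOperation>> operations;

  private FieldMutations(Map<String, Set<FieldOperation>> operations) {
    this.operations = operations;
  }

  static Builder builder() {
    return new Builder();
  }

  /** Fields with at least one declared operation, in declaration order. */
  Set<String> getFields() {
    return operations.keySet();
  }

  /** Declared operations for a field, or an empty set if the field was not declared. */
  Set<FieldOperation> getOperations(String field) {
    return operations.getOrDefault(field, Collections.emptySet());
  }

  static final class Builder {
    private final Map<String, EnumSet<FieldOperation>> operations = new LinkedHashMap<>();

    /** Declares one or more operations on a field; repeated calls accumulate. */
    Builder add(String field, FieldOperation first, FieldOperation... rest) {
      EnumSet<FieldOperation> ops =
          operations.computeIfAbsent(field, f -> EnumSet.noneOf(FieldOperation.class));
      ops.add(first);
      Collections.addAll(ops, rest);
      return this;
    }

    /** Returns an immutable snapshot; later builder calls do not affect it. */
    FieldMutations build() {
      Map<String, Set<FieldOperation>> copy = new LinkedHashMap<>();
      for (Map.Entry<String, EnumSet<FieldOperation>> e : operations.entrySet()) {
        copy.put(e.getKey(), Collections.unmodifiableSet(EnumSet.copyOf(e.getValue())));
      }
      return new FieldMutations(Collections.unmodifiableMap(copy));
    }
  }
}
```

Under this assumption, a program that reads `id` and `email` from a `customers` dataset would build `FieldMutations.builder().add("id", FieldOperation.READ).add("email", FieldOperation.READ).build()` and pass it to the proposed `getDataset(String name, FieldMutations mutations)` overload. One consequence of this option is that mutations are declared up front, when the dataset is obtained, rather than observed as records are actually read or written.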
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user-facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When the application is not available; 500 - Any internal error | |
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design take care of this aspect?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of downstream component failures, such as YARN and HBase) and how the design takes care of these aspects.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3