Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
Users of CDAP may already have data in HDFS or HBase. Today, the only way to bring that data into CDAP is to build a data pipeline or a CDAP application that re-processes the data and writes it to CDAP datasets for further analysis. Letting users work with existing data without re-processing it improves the user experience and provides a good on-ramp for customers with a lot of legacy data.
Goals
Ease of adoption: Allow users to leverage their existing data in HDFS or HBase without having to re-process it.
Usability: Create datasets from existing data in HDFS or HBase and provide a great user experience.
User Stories
- As a user, I would like to create a dataset from existing data on HDFS (or HBase).
- As a user, I would like to apply a schema to a dataset created from existing data on HDFS (or HBase).
- As a user, I would like to apply transformations to existing data on HDFS (or HBase) to derive data with a pre-defined schema.
- As a user, I would like to run explore queries on a dataset created from existing data on HDFS.
- As a user, I would like to use the dataset as a source in data pipelines.
Design
Cover details on assumptions made, design alternatives considered, high level design
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |
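As an illustration of how a client might consume the endpoint in the table above, here is a minimal sketch. It assumes a base URL such as `http://localhost:11015` (a placeholder, not specified in this document) and treats the response body as opaque text, since the Response column is not yet filled in. The function names `app_spec_url`, `describe_status`, and `get_app_spec` are hypothetical helpers, not part of any CDAP client library.

```python
import urllib.error
import urllib.parse
import urllib.request


def app_spec_url(base_url, app_id):
    """Build the URL for the path in the table; base_url is an assumed placeholder."""
    return "%s/v3/apps/%s" % (base_url.rstrip("/"), urllib.parse.quote(app_id, safe=""))


def describe_status(status):
    """Map the response codes listed in the table to a short outcome string."""
    if status == 200:
        return "success"
    if status == 404:
        return "application not available"
    if status == 500:
        return "internal error"
    return "unexpected status %d" % status


def get_app_spec(base_url, app_id, timeout=10):
    """Issue the GET request and return (outcome, body text or None).

    The body format is not defined in this document, so it is returned as raw text.
    """
    url = app_spec_url(base_url, app_id)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return describe_status(resp.status), resp.read().decode("utf-8")
    except urllib.error.HTTPError as err:
        # urllib raises for 4xx/5xx responses; report the documented outcome instead.
        return describe_status(err.code), None
```

A caller would invoke `get_app_spec("http://localhost:11015", "my-app")` and branch on the returned outcome string; keeping the status mapping in its own function makes it easy to extend once the final set of response codes is decided.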
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design address it?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of failures in downstream components such as YARN or HBase) and how the design handles these failures.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3