Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
The schema associated with HDFS files may vary depending on the set of Wrangler directives applied to them. The current implementation of datasets does not apply Wrangler directives when exploring datasets or when reading records in a Hydrator pipeline.
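The following minimal Java sketch shows why the schema depends on the directives. It derives an output column list from a stored column list and an ordered list of directives. The class `SchemaDerivation`, the method `deriveColumns`, and the two-directive syntax (`rename <old> <new>`, `drop <column>`) are hypothetical simplifications. They are not the Wrangler grammar or any CDAP API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: the directive syntax below is a simplified stand-in,
// not the actual Wrangler directive grammar.
public class SchemaDerivation {

    /** Applies each directive in order and returns the resulting column names. */
    public static List<String> deriveColumns(List<String> inputColumns, List<String> directives) {
        List<String> columns = new ArrayList<>(inputColumns);
        for (String directive : directives) {
            String[] parts = directive.trim().split("\\s+");
            switch (parts[0]) {
                case "rename": { // rename <old> <new>: replace the column name in place
                    requireArgs(parts, 3, directive);
                    int idx = columns.indexOf(parts[1]);
                    if (idx < 0) {
                        throw new IllegalArgumentException("Unknown column: " + parts[1]);
                    }
                    columns.set(idx, parts[2]);
                    break;
                }
                case "drop": { // drop <column>: remove the column from the output
                    requireArgs(parts, 2, directive);
                    if (!columns.remove(parts[1])) {
                        throw new IllegalArgumentException("Unknown column: " + parts[1]);
                    }
                    break;
                }
                default:
                    throw new IllegalArgumentException("Unsupported directive: " + directive);
            }
        }
        return columns;
    }

    // Rejects directives with the wrong number of tokens.
    private static void requireArgs(String[] parts, int expected, String directive) {
        if (parts.length != expected) {
            throw new IllegalArgumentException("Malformed directive: " + directive);
        }
    }

    public static void main(String[] args) {
        List<String> stored = List.of("id", "fname", "tmp");
        System.out.println(deriveColumns(stored, List.of("rename fname first_name", "drop tmp"))); // [id, first_name]
        System.out.println(deriveColumns(stored, List.of("drop fname"))); // [id, tmp]
    }
}
```

The same stored columns yield a different schema for each directive list. A dataset therefore cannot report a fixed schema unless it knows which directives apply.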
Goals
Better user experience: Apply Wrangler directives when reading data from datasets through Explore and Hydrator pipelines.
User Stories
Users of CDAP might already have data in HDFS or HBase. Currently, the only way to bring that data into CDAP is to create a data pipeline or a CDAP application that re-processes the data and writes it to CDAP datasets for further analysis. The ability to use existing data without re-processing it would provide a great user experience and a good on-ramp for customers with a lot of legacy data.
Goals
Ease of adoption: Allow users to leverage their existing data in HDFS or HBase without having to re-process it.
Usability: Create datasets from existing data in HDFS or HBase and provide a great user experience.
User Stories
- As a user, I would like to create a dataset from existing data on HDFS (or HBase).
- As a user, I would like to apply a schema to a dataset that is created from existing data on HDFS (or HBase).
- As a user, I would like to apply transformations to existing data on HDFS (or HBase) to derive data that conforms to a pre-defined schema.
- As a user, I would like to run Explore queries on a dataset that is created from existing data on HDFS.
- As a user, I would like to use the dataset as a source in data pipelines.
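The read-time behavior behind these stories can be sketched as a lazy iterator. Stored lines are never rewritten. Instead, a column mapping and a list of transformation steps are applied to each record only when it is read. This minimal Java sketch assumes comma-delimited text input. `TransformingReader` and its constructor parameters are hypothetical names, not CDAP or Wrangler APIs, and plain `UnaryOperator` functions stand in for real directives.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.function.UnaryOperator;

// Hypothetical sketch of applying transformations at read time. Neither the
// class nor its methods correspond to CDAP or Wrangler APIs.
public class TransformingReader implements Iterator<Map<String, String>> {
    private final Iterator<String> rawLines;
    private final List<String> columns;
    private final List<UnaryOperator<Map<String, String>>> steps;

    public TransformingReader(Iterator<String> rawLines, List<String> columns,
                              List<UnaryOperator<Map<String, String>>> steps) {
        this.rawLines = rawLines;
        this.columns = List.copyOf(columns);
        this.steps = List.copyOf(steps);
    }

    @Override
    public boolean hasNext() {
        return rawLines.hasNext();
    }

    @Override
    public Map<String, String> next() {
        if (!hasNext()) {
            throw new NoSuchElementException();
        }
        // Assumes comma-delimited text; the -1 limit keeps trailing empty fields.
        String[] fields = rawLines.next().split(",", -1);
        // Map positional fields onto the supplied column names; missing fields become null.
        Map<String, String> record = new LinkedHashMap<>();
        for (int i = 0; i < columns.size(); i++) {
            record.put(columns.get(i), i < fields.length ? fields[i] : null);
        }
        // Apply each transformation step, in order, to this record only.
        for (UnaryOperator<Map<String, String>> step : steps) {
            record = step.apply(record);
        }
        return record;
    }

    public static void main(String[] args) {
        List<String> stored = List.of("1,alice,x", "2,bob,y");
        UnaryOperator<Map<String, String>> dropTmp = r -> { r.remove("tmp"); return r; };
        UnaryOperator<Map<String, String>> upperName = r -> {
            r.computeIfPresent("name", (k, v) -> v.toUpperCase(Locale.ROOT));
            return r;
        };
        TransformingReader reader = new TransformingReader(
                stored.iterator(), List.of("id", "name", "tmp"), List.of(dropTmp, upperName));
        while (reader.hasNext()) {
            System.out.println(reader.next()); // {id=1, name=ALICE}, then {id=2, name=BOB}
        }
    }
}
```

Because the transformation runs inside `next()`, the stored data stays as it is. The same reader shape could in principle back both Explore queries and a pipeline source.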
Design
Cover details on the assumptions made, the design alternatives considered, and the high-level design.
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When the application is not available; 500 - Any internal error | |
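The following Java 11+ sketch shows one way a client might call the `GET /v3/apps/<app-id>` endpoint listed above. The base URL `http://localhost:11015`, the accepted application-id pattern, and the class `AppSpecClient` are assumptions for illustration. Only the path and the 200/404/500 response codes come from the table.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hedged sketch of a client for GET /v3/apps/<app-id>. The base URL and the
// application-id pattern are assumptions, not values taken from this design.
public class AppSpecClient {
    // Assumption: router address of a local deployment; adjust as needed.
    public static final String DEFAULT_BASE_URL = "http://localhost:11015";

    /** Builds the request URI, rejecting ids that would need URL escaping. */
    public static URI appSpecUri(String baseUrl, String appId) {
        if (!appId.matches("[A-Za-z0-9_-]+")) {
            throw new IllegalArgumentException("Unsupported application id: " + appId);
        }
        return URI.create(baseUrl + "/v3/apps/" + appId);
    }

    /** Maps the response codes listed in the table to short descriptions. */
    public static String describeStatus(int status) {
        switch (status) {
            case 200:
                return "success";
            case 404:
                return "application not available";
            case 500:
                return "internal error";
            default:
                return "unexpected status " + status;
        }
    }

    /** Fetches the application spec body and fails on any non-200 response. */
    public static String fetchAppSpec(HttpClient client, String baseUrl, String appId)
            throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(appSpecUri(baseUrl, appId)).GET().build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("GET " + request.uri() + " failed: "
                    + describeStatus(response.statusCode()));
        }
        return response.body();
    }

    public static void main(String[] args) throws Exception {
        // "MyApp" is a placeholder application id.
        String appId = args.length > 0 ? args[0] : "MyApp";
        System.out.println(fetchAppSpec(HttpClient.newHttpClient(), DEFAULT_BASE_URL, appId));
    }
}
```

The status mapping lives in a pure function, so it can be checked without a running instance.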
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design address it?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of failures in downstream components such as YARN and HBase) and how the design addresses these aspects.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3