Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

CDAP users may already have data in HDFS or HBase. Today, the only way to bring that data into CDAP is to build a data pipeline or a CDAP application that re-processes the data and writes it to CDAP datasets for further analysis. Letting users work with existing data without re-processing it would improve the user experience and give customers with large amounts of legacy data an easier on-ramp.

Goals

  • Ease of adoption: Allow users to leverage their existing data in HDFS or HBase without having to re-process the data 
  • Usability: Create datasets from existing data in HDFS or HBase and provide a great user experience.

User Stories 

  • As a user, I would like to create a dataset from existing data on HDFS (or HBase).
  • As a user, I would like to apply a schema to the dataset that is created from existing data on HDFS (or HBase).
  • As a user, I would like to apply transformations to existing data on HDFS (or HBase) to derive data that conforms to a pre-defined schema.
  • As a user, I would like to run explore queries on the dataset that is created from existing data on HDFS.
  • As a user, I would like to use the dataset as a source in data pipelines.
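To make the "apply a schema" story concrete, here is a minimal sketch of interpreting an existing delimited record against a pre-defined schema at read time, without rewriting the underlying file. Everything in it is an assumption made for illustration: `SchemaOnReadSketch`, `Field`, `FieldType`, and `applySchema` are hypothetical names and do not correspond to CDAP's schema or dataset APIs, and the comma-delimited layout is only an example of what existing data might look like.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

public class SchemaOnReadSketch {

  /** Hypothetical field types; this is not CDAP's schema model. */
  public enum FieldType { STRING, INT, DOUBLE }

  /** A named, typed field in the hypothetical schema. */
  public static final class Field {
    final String name;
    final FieldType type;

    public Field(String name, FieldType type) {
      this.name = name;
      this.type = type;
    }
  }

  /**
   * Interprets one delimited line according to the schema and returns
   * field name -> typed value, preserving the schema's field order.
   */
  public static Map<String, Object> applySchema(List<Field> schema, String line, String delimiter) {
    // Quote the delimiter so characters such as "|" are not treated as regex syntax.
    String[] parts = line.split(Pattern.quote(delimiter), -1);
    if (parts.length != schema.size()) {
      throw new IllegalArgumentException(
          "Expected " + schema.size() + " fields but found " + parts.length);
    }
    Map<String, Object> record = new LinkedHashMap<>();
    for (int i = 0; i < parts.length; i++) {
      Field field = schema.get(i);
      String raw = parts[i].trim();
      Object value;
      switch (field.type) {
        case INT:
          value = Integer.valueOf(raw);
          break;
        case DOUBLE:
          value = Double.valueOf(raw);
          break;
        default:
          value = raw;
      }
      record.put(field.name, value);
    }
    return record;
  }

  public static void main(String[] args) {
    List<Field> schema = Arrays.asList(
        new Field("id", FieldType.INT),
        new Field("name", FieldType.STRING),
        new Field("score", FieldType.DOUBLE));
    // Prints {id=42, name=alice, score=3.5}
    System.out.println(applySchema(schema, "42,alice,3.5", ","));
  }
}
```

The point of the sketch is that the raw data stays where it is and the schema is applied only when a record is read, which is the behavior the user stories above ask for.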

Design

Cover details on assumptions made, design alternatives considered, high level design

Approach

Approach #1

Approach #2

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Response Code | Response
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors |
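As an illustration of how a client might consume the endpoint listed above, the following sketch issues the GET request and maps the three documented response codes to their meanings. It is only a sketch under stated assumptions: the class name `AppSpecClientSketch`, the base URL `http://localhost:11015`, and the application id `MyApp` are placeholders not taken from this page, and the application id is assumed to be URL-safe. It uses the standard `java.net.http` client (Java 11 or later).

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class AppSpecClientSketch {

  /** Builds the request URI for the path listed in the table: /v3/apps/<app-id>. */
  public static URI appSpecUri(String baseUrl, String appId) {
    return URI.create(baseUrl + "/v3/apps/" + appId);
  }

  /** Maps the response codes documented in the table to a short description. */
  public static String describe(int statusCode) {
    switch (statusCode) {
      case 200:
        return "On success";
      case 404:
        return "Application is not available";
      case 500:
        return "Internal error";
      default:
        return "Unexpected response code " + statusCode;
    }
  }

  public static void main(String[] args) throws Exception {
    // Assumed placeholders; pass the real base URL and application id as arguments.
    String baseUrl = args.length > 0 ? args[0] : "http://localhost:11015";
    String appId = args.length > 1 ? args[1] : "MyApp";

    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(appSpecUri(baseUrl, appId)).GET().build();
    HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());

    System.out.println(response.statusCode() + " - " + describe(response.statusCode()));
    if (response.statusCode() == 200) {
      // On success the body carries the application spec.
      System.out.println(response.body());
    }
  }
}
```

Keeping the URI construction and the status-code mapping in separate static methods lets both be checked without a running server.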

 

     

Deprecated REST API

Path | Method | Description
/v3/apps/<app-id> | GET | Returns the application spec for a given application

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What is the impact on authorization, and how does the design address it?

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design handles these aspects.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work