Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
Users can use the command-line tool DistCp to copy files between clusters with the same or different filesystems. Currently, CDAP does not support such file operations. We wish to add a Hydrator plugin that lets users perform whole-file copies between different types of filesystems/databases from the CDAP UI.
Goals
According to this user request, our new plugin ideally should have the following features:
- Should support file copying between the following file systems: HDFS, Amazon S3, and FTP
- Should support failover: after a restart or failure, the copy should resume where it left off.
- Should have a UI where users can see progress.
- Should report metrics for each process: number of files copied, total size, and elapsed time.
- Should check network bandwidth and display an estimated completion time.
- Should preserve each file's timestamp from the source.
- Should let users specify path filters through the UI on the fly.
- Should support configuring file permissions.
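To make several of these goals concrete, here is a minimal sketch of a resumable copy on a local filesystem. It is not the plugin design and uses no CDAP or Hadoop APIs; `CopyMetrics`, `is_up_to_date`, `copy_tree`, and `estimate_seconds_remaining` are hypothetical names chosen for illustration. It assumes that a destination file with the same size and modification time as its source was copied successfully, which is one possible way to resume after a restart.

```python
import fnmatch
import os
import shutil
import time
from dataclasses import dataclass


@dataclass
class CopyMetrics:
    """Hypothetical per-run metrics: files copied, files skipped, size, time."""
    files_copied: int = 0
    files_skipped: int = 0
    bytes_copied: int = 0
    seconds: float = 0.0


def is_up_to_date(src_path, dst_path):
    """Assumption: same size and same mtime (to the second) means already copied."""
    if not os.path.exists(dst_path):
        return False
    src, dst = os.stat(src_path), os.stat(dst_path)
    return src.st_size == dst.st_size and int(src.st_mtime) == int(dst.st_mtime)


def copy_tree(src_root, dst_root, pattern="*"):
    """Copy files whose names match `pattern`, skipping files already copied."""
    metrics = CopyMetrics()
    start = time.monotonic()
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel_dir = os.path.relpath(dirpath, src_root)
        for name in sorted(filenames):
            if not fnmatch.fnmatch(name, pattern):  # path filter
                continue
            src_path = os.path.join(dirpath, name)
            dst_path = os.path.normpath(os.path.join(dst_root, rel_dir, name))
            if is_up_to_date(src_path, dst_path):  # resume: skip finished files
                metrics.files_skipped += 1
                continue
            os.makedirs(os.path.dirname(dst_path), exist_ok=True)
            # Write to a temporary name first so an interrupted copy
            # is never mistaken for a complete file on the next run.
            tmp_path = dst_path + ".part"
            shutil.copy2(src_path, tmp_path)  # copy2 keeps mtime and permission bits
            os.replace(tmp_path, dst_path)
            metrics.files_copied += 1
            metrics.bytes_copied += os.path.getsize(dst_path)
    metrics.seconds = time.monotonic() - start
    return metrics


def estimate_seconds_remaining(bytes_remaining, bytes_per_second):
    """Estimated completion time: remaining bytes divided by observed throughput."""
    if bytes_per_second <= 0:
        return float("inf")
    return bytes_remaining / bytes_per_second
```

Running `copy_tree` a second time over the same directories copies nothing and counts every matching file as skipped, which is the resume behavior the failover goal asks for. A real plugin would need an equivalent check for HDFS, Amazon S3, and FTP, where timestamp and permission semantics differ.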
User Stories
- Breakdown of User-Stories
- User Story #1
- User Story #2
- User Story #3
Design
Cover details on assumptions made, design alternatives considered, high level design
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design take care of this aspect?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3