Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
As the data pipeline experience has become more sophisticated, it has become clear that a good number of capabilities are required during the pipeline creation process. Some of these capabilities are provided by the CDAP platform if they are generic enough, such as the listing of plugins and their properties. However, there is a class of capabilities that are specific to pipelines, such as pipeline validation, schema propagation, plugin templates, and pipeline drafts. These have all been implemented with non-trivial logic in the UI. This type of pipeline-specific logic does not belong in the UI, but in a backend service that the UI can call. Keeping it in the UI has slowed UI development, contributed to bugs in the product, and added a lot of technical debt. By moving much of this logic to a system application, overall development speed will increase and the door will be opened to a richer set of features in the future.
Goals
Design the pipeline system application to remove tech debt from the UI and build an architecture that supports a richer set of pipeline-specific product requirements in the future.
User Stories
- As a pipeline developer, I want to be able to validate a pipeline before deploying it and know exactly which stages and which fields are invalid and why
- As a pipeline developer, I want to be able to validate a pipeline stage and know exactly which fields are invalid and why
- As a pipeline developer, I want the schema to update automatically, without needing to click a button
- As a pipeline developer, I want to be able to debug a single pipeline stage by specifying input and examining output and/or errors
- As a pipeline developer, I want the schema displayed by the UI to always match what is used during execution
- As a pipeline developer, I want to be able to import a pipeline spec with missing artifacts and have the option to automatically update those artifact versions
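To make the two validation stories above concrete, here is a minimal sketch of what a stage-level and field-level validation result could look like. Everything in it is an assumption for illustration: the `ValidationError` shape, the `validate_stage` and `validate_pipeline` functions, and the `required` key on each stage are hypothetical and are not part of any existing CDAP API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class ValidationError:
    """One problem found during validation (hypothetical shape)."""
    stage: str
    field: Optional[str]  # None would mean the error applies to the whole stage
    message: str


def validate_stage(name: str, properties: Dict[str, str],
                   required: List[str]) -> List[ValidationError]:
    """Validate a single stage. The result depends only on the arguments."""
    errors = []
    for field in required:
        # A missing or empty property counts as invalid in this sketch.
        if not properties.get(field):
            errors.append(ValidationError(name, field, f"'{field}' is required"))
    return errors


def validate_pipeline(stages: List[dict]) -> List[ValidationError]:
    """Validate every stage and collect all errors in stage order."""
    errors = []
    for stage in stages:
        errors.extend(validate_stage(stage["name"],
                                     stage.get("properties", {}),
                                     stage.get("required", [])))
    return errors


# Hypothetical two-stage pipeline: the sink has one empty and one missing property.
pipeline = [
    {"name": "source", "properties": {"path": "/data/in"}, "required": ["path"]},
    {"name": "sink", "properties": {"table": ""}, "required": ["table", "rowkey"]},
]
result = validate_pipeline(pipeline)
for err in result:
    print(f"{err.stage}.{err.field}: {err.message}")
```

Returning a flat list of errors keyed by stage and field would let a caller highlight exactly which stage and which field is invalid and why, whether it validates one stage or the whole pipeline.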
Design
A new system application will be introduced to provide much of the more complex logic that is currently handled by the UI. Where possible, APIs will be stateless.
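As an illustration of what a stateless API could mean for schema propagation, the sketch below computes a stage's output schema purely from the request contents. The simplified schema representation (field name mapped to a type name) and the `propagate_schema` function with its `drop` and `rename` parameters are hypothetical; they stand in for whatever a real plugin would do and are not an existing CDAP interface.

```python
from typing import Dict, List

# Simplified, hypothetical schema: field name -> type name.
Schema = Dict[str, str]


def propagate_schema(input_schema: Schema, drop: List[str],
                     rename: Dict[str, str]) -> Schema:
    """Compute the output schema of a projection-like stage.

    The function keeps no state between calls: the same input schema and
    stage configuration always produce the same output schema.
    """
    output: Schema = {}
    for field, field_type in input_schema.items():
        if field in drop:
            continue  # dropped fields do not appear in the output
        output[rename.get(field, field)] = field_type
    return output


input_schema = {"id": "long", "name": "string", "ssn": "string"}
output_schema = propagate_schema(input_schema,
                                 drop=["ssn"],
                                 rename={"name": "full_name"})
print(output_schema)
```

Because nothing is stored between calls, a caller could re-run this on every configuration change and chain the output of one stage into the input of the next.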
Approach
Approach #1
Approach #2
API changes
New Programmatic APIs
None
Deprecated Programmatic APIs
None
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
None
CLI Impact or Changes
None
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What's the impact on Authorization and how does the design take care of this aspect
Impact on Infrastructure Outages
Users will not be able to use the pipeline studio if the new pipeline service is down.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3