Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
CDAP pipelines are composed of various CDAP plugins. These plugins handle error situations caused by invalid inputs or configurations. While developing CDAP pipelines, a pipeline developer can provide invalid plugin configurations. For example, the BigQuery sink plugin can be configured with an invalid temporary GCS file that does not match the underlying BigQuery table. In such situations, a clear error message helps guide the user in the right direction.
Wrangler provides an interactive way for users to apply directives to data. However, users may run into error situations while applying these directives. For example, a corrupted input JSON file can cause the parse-as-json directive to fail. In such situations, users should be given a clear error message so that they can take further action.
Data pipeline and Wrangler
Goals
There are four goals to achieve in order to improve error handling:
- Provide a guideline on how to formulate error messages so that end users can easily interpret the error situation
- Instrument plugins to return multiple error messages from the validation endpoint
- Add a framework to centralize and standardize error messages in Wrangler and pipelines
- Add a framework to prefix user-facing error messages with error codes so that developers can identify the source of an error message
Scope
Plugins
- Plugin Validation (covered by a separate design doc; this document focuses on the design of error codes and standard error messages)
- Provide a framework to collect multiple validation errors so that the UI can highlight them when the validation endpoint is called
- Provide a framework to add new types of exceptions without replacing data pipeline artifacts
- Instrument plugins so that all invalid config and schema fields are reported to the user at once when a plugin is validated
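Collecting multiple validation errors can be sketched as a collector that records each failure and throws only once, after every check has run, instead of failing on the first invalid field. This is a minimal sketch: `ValidationFailure`, `FailureCollector`, `ValidationException`, `ExampleSinkValidator`, and the property names `referenceName` and `table` are hypothetical names chosen for illustration, not the CDAP API.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** One validation problem, tied to the config property the UI should highlight (hypothetical type). */
final class ValidationFailure {
  final String message;
  final String correctiveAction;
  final String configProperty;

  ValidationFailure(String message, String correctiveAction, String configProperty) {
    this.message = message;
    this.correctiveAction = correctiveAction;
    this.configProperty = configProperty;
  }
}

/** Thrown once, after all checks have run, carrying every collected failure (hypothetical type). */
final class ValidationException extends RuntimeException {
  final List<ValidationFailure> failures;

  ValidationException(List<ValidationFailure> failures) {
    super("Plugin validation failed with " + failures.size() + " error(s)");
    this.failures = Collections.unmodifiableList(new ArrayList<>(failures));
  }
}

/** Accumulates failures so a plugin can report all invalid fields at once (hypothetical type). */
final class FailureCollector {
  private final List<ValidationFailure> failures = new ArrayList<>();

  void addFailure(String message, String correctiveAction, String configProperty) {
    failures.add(new ValidationFailure(message, correctiveAction, configProperty));
  }

  /** Throws only if at least one failure was recorded. */
  void throwIfFailures() {
    if (!failures.isEmpty()) {
      throw new ValidationException(failures);
    }
  }
}

/** Example plugin check; the property names are illustrative assumptions. */
final class ExampleSinkValidator {
  static void validate(String referenceName, String table, FailureCollector collector) {
    if (referenceName == null || referenceName.isEmpty()) {
      collector.addFailure("Reference name must be specified.", "Provide a reference name.", "referenceName");
    }
    if (table == null || table.isEmpty()) {
      collector.addFailure("Table name must be specified.", "Provide an existing table name.", "table");
    }
    // Both checks run before anything is thrown, so the user sees every problem at once.
    collector.throwIfFailures();
  }
}
```

Because a single exception carries the whole list, a validation endpoint built this way could return every failure together with its config property, letting the UI highlight all offending fields in one round trip.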
Dataprep
- Improve error messages in all Directives
- Remove object hashes from error messages; they appear because toString() is called on objects that do not override it
- Standardize error messages
- Apply error codes to user facing error messages
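The object-hash problem comes from `Object.toString()`, which returns the class name, `@`, and the hexadecimal hash code when a class does not override it. The sketch below contrasts concatenating such an object into a message with describing it by its content; `Row`, `MessageExamples`, and the message wording are hypothetical stand-ins, not Wrangler's actual record class or messages.

```java
/** Hypothetical record type that does NOT override toString(). */
final class Row {
  private final String column;
  private final Object value;

  Row(String column, Object value) {
    this.column = column;
    this.value = value;
  }

  String getColumn() { return column; }
  Object getValue() { return value; }
}

final class MessageExamples {
  /** Leaks an identity hash such as "Row@1b6d3586" into the user-facing message. */
  static String badMessage(Row row) {
    return "Failed to parse " + row;
  }

  /** Describes the failing value by column name and type instead of object identity. */
  static String goodMessage(Row row) {
    Object value = row.getValue();
    String type = value == null ? "null" : value.getClass().getSimpleName();
    return String.format("Failed to parse column '%s' as JSON: expected String but found %s.",
        row.getColumn(), type);
  }
}
```

For `new Row("body", Boolean.TRUE)`, `goodMessage` yields `Failed to parse column 'body' as JSON: expected String but found Boolean.`, while `badMessage` yields a string ending in `Row@` followed by a hash that means nothing to the user.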
Pipeline
- Standardize error messages
- Apply error codes to user-facing error messages so that the source of an error can be identified without looking at the stack trace
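Standardized, code-prefixed messages could be kept in one enum that serves as the catalog of codes, message templates, and corrective actions. This is a sketch under assumptions: the `ErrorCode` name, the `PIPELINE-`/`WRANGLER-` prefixes, the four-digit numbers, and both entries are illustrative only, since this design does not yet fix a code format.

```java
/** Hypothetical central catalog: one place for codes, message templates and corrective actions. */
enum ErrorCode {
  INVALID_CONFIG("PIPELINE-1001",
      "Invalid value '%s' for property '%s'.",
      "Check the property value in the plugin configuration."),
  INCOMPATIBLE_COLUMN_TYPE("WRANGLER-2001",
      "Column '%s' has type %s, which is not supported by directive '%s'.",
      "Convert the column to a supported type before applying the directive.");

  private final String code;
  private final String template;
  private final String correctiveAction;

  ErrorCode(String code, String template, String correctiveAction) {
    this.code = code;
    this.template = template;
    this.correctiveAction = correctiveAction;
  }

  String code() { return code; }

  String correctiveAction() { return correctiveAction; }

  /** Fills in the template and prefixes it with the code, e.g. "[PIPELINE-1001] Invalid value ...". */
  String format(Object... args) {
    return "[" + code + "] " + String.format(template, args);
  }
}
```

For example, `ErrorCode.INVALID_CONFIG.format("abc", "port")` produces `[PIPELINE-1001] Invalid value 'abc' for property 'port'.` Keeping the code next to its corrective action means a catalog page could be generated from the same source, and a user could report only the code instead of a full stack trace.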
User Stories
As a CDAP pipeline developer, if a pipeline contains invalid plugin configurations, I would like it to fail early with an appropriate error message.
As an ETL engineer, if I run into an error situation while applying directives, I would like to see an appropriate error message that clearly indicates the error.
Scenarios
Scenario 1: Error codes in Wrangler
Scenario 1.1
Alice wants to wrangle data using CDAP's Wrangler tool. As part of that, she wants to connect to a Database Source using a Wrangler Connection. While testing the connection, Alice sees a cryptic error message and cannot tell how to resolve the issue just by looking at it. To figure out the cause of the issue and the recommended action, Alice would like to look up the error code displayed along with the error message in an error code catalog.
Scenario 1.2
Alice is applying transformations to data on the fly using CDAP's Wrangler tool. While applying transformations, Alice sees an error message that does not suggest a recommended action to fix the issue. She wants to browse the error code catalog to find the recommended action to resolve the issue.
Scenario 2: Centralized Error messages in Wrangler
Scenario 2.1
Alice wants to wrangle data using CDAP's Wrangler tool. While applying transformations to the connected source data on the fly, Alice tries to parse a boolean column as CSV. She observes that the built-in directives for parsing boolean data as CSV and as Avro return different error messages. Alice would like to see the same standard error message from both directives when the column type is incompatible.
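Scenario 2.1 comes down to both directives building their type-mismatch message from one shared helper rather than each formatting its own text. The sketch assumes hypothetical classes `DirectiveErrors`, `ParseAsCsv`, and `ParseAsAvro`, which are not Wrangler's actual directive classes; the accepted input types (String or byte[] for CSV, byte[] for Avro) are also assumptions made for illustration.

```java
/** Hypothetical shared source of the incompatible-type message used by every directive. */
final class DirectiveErrors {
  private DirectiveErrors() {}

  static IllegalArgumentException incompatibleType(String directive, String column,
                                                   String expected, Object value) {
    String actual = value == null ? "null" : value.getClass().getSimpleName();
    return new IllegalArgumentException(String.format(
        "Directive '%s' cannot be applied to column '%s': expected %s but found %s.",
        directive, column, expected, actual));
  }
}

/** Hypothetical CSV directive input check; assumed to accept String or byte[]. */
final class ParseAsCsv {
  static void checkInput(String column, Object value) {
    if (!(value instanceof String || value instanceof byte[])) {
      throw DirectiveErrors.incompatibleType("parse-as-csv", column, "String or byte[]", value);
    }
  }
}

/** Hypothetical Avro directive input check; assumed to accept only byte[]. */
final class ParseAsAvro {
  static void checkInput(String column, Object value) {
    if (!(value instanceof byte[])) {
      throw DirectiveErrors.incompatibleType("parse-as-avro", column, "byte[]", value);
    }
  }
}
```

Applied to a boolean column named `active`, both directives now fail with the same sentence structure, differing only in the directive name and the expected type.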
Scenario 3: Error codes in Pipeline
Scenario 3.1
Bob, a plugin developer, has deployed a plugin to CDAP. Alice, a pipeline developer, builds a data pipeline using the GUI of CDAP pipeline studio. After she deploys the data pipeline, it fails with a NullPointerException (NPE) and no meaningful error message. To debug the issue further, Alice wants to share an error code, rather than the whole stack trace, with Bob so that he can figure out which method caused the NPE.
Scenario 4: Centralized Error messages in Pipeline
Scenario 4.1
Alice, a data pipeline developer, develops data pipelines using CDAP. While building data pipelines, Alice observes that the error messages for common error types, such as invalid configuration errors, are not consistent. Alice would like to see consistent error messages for common errors.
Design
Approach
Impact on UI
UI changes will be needed for invalid schema type errors returned from the validation endpoint.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Bug Fixes
- CDAP-14378
- CDAP-15499
- CDAP-15507
- CDAP-15040
- CDAP-15593
- CDAP-14797
- CDAP-15563
- CDAP-15560
- CDAP-11767
- CDAP-15426
- CDAP-15581
Releases
Release 6.1.0