Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

A CDAP pipeline is composed of various CDAP plugins. These plugins handle error situations caused by invalid inputs or configurations. While developing a CDAP pipeline, a pipeline developer can provide invalid plugin configurations. For example, the BigQuery sink plugin can be configured with a schema that does not match the underlying BigQuery table. In such situations, a clear error message helps guide the user in the right direction. Wrangler provides an interactive way for users to apply directives to the data. However, while applying these directives, the user may run into error situations. For example, the input JSON file may be corrupted, which can cause the parse-as-json directive to fail. In such error situations, the user should be given a clear error message so that further action can be taken.

Goals

Four goals need to be achieved to improve error handling:

  • Provide a guideline on how an error message should be formulated so that it is easier for the end user to interpret the error situation
  • Add a framework to prefix error codes to user-facing error messages so that developers can figure out the source of an error message
  • Add a framework to standardize error messages in CDAP, pipeline and Wrangler
  • Instrument plugins to return multiple error messages for the validation endpoint

Scope

Plugins

  • Plugin Validation (has a separate design doc; this document focuses on the design of error codes and standard error messages)
    • Provide a framework to collect multiple validation errors so that the UI can highlight them when the validation endpoint is called.
    • Provide a framework to add new types of exceptions without replacing data pipeline artifacts.
    • Instrument plugins so that all invalid config and schema fields are reported to the user at once when a plugin is validated.
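
The idea of collecting every validation error before reporting can be sketched as follows. This is only an illustration under stated assumptions: `ValidationError`, `ErrorCollector`, and the `table` and `batchSize` properties are hypothetical names invented for this example. The actual plugin validation API is covered in the separate design doc and may look quite different.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class ValidationSketch {

  /** Hypothetical error record: a message plus the config property it refers to. */
  static final class ValidationError {
    final String message;
    final String configProperty;

    ValidationError(String message, String configProperty) {
      this.message = message;
      this.configProperty = configProperty;
    }
  }

  /** Hypothetical collector: accumulates errors instead of throwing on the first one. */
  static final class ErrorCollector {
    private final List<ValidationError> errors = new ArrayList<>();

    void add(String message, String configProperty) {
      errors.add(new ValidationError(message, configProperty));
    }

    List<ValidationError> getErrors() {
      return Collections.unmodifiableList(errors);
    }
  }

  /** Validates two made-up config properties and reports every problem found. */
  static List<ValidationError> validate(String tableName, String batchSize) {
    ErrorCollector collector = new ErrorCollector();
    if (tableName == null || tableName.isEmpty()) {
      collector.add("Table name must be specified.", "table");
    }
    try {
      if (Integer.parseInt(batchSize) <= 0) {
        collector.add("Batch size must be a positive integer, but found '" + batchSize + "'.", "batchSize");
      }
    } catch (NumberFormatException e) {
      // Non-numeric (or missing) input is reported as a validation error, not rethrown.
      collector.add("Batch size must be a positive integer, but found '" + batchSize + "'.", "batchSize");
    }
    return collector.getErrors();
  }

  public static void main(String[] args) {
    for (ValidationError error : validate("", "abc")) {
      System.out.println(error.configProperty + ": " + error.message);
    }
  }
}
```

Because each error carries the name of the offending property, a UI could highlight all invalid fields after a single validation call.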

Dataprep

  • Improve error messages in all directives
    • Remove object hashes from error messages. They appear because toString() is called on objects when building the messages.
    • Standardize error messages
    • Apply error codes to user-facing error messages

Pipeline

  • Standardize error messages
  • Apply error codes to user facing error messages

User Stories


  1. As a CDAP pipeline developer, if a pipeline contains invalid plugin configurations, I would like it to fail early with an appropriate error message.

  2. As an ETL engineer, if I run into an error situation while applying directives, I would like to see an appropriate error message that clearly indicates the error.

Guidelines for Error Messages

An error message is the text used to provide information about an error situation. Poorly written error messages can increase support costs and frustrate users. Well-written error messages are very important for a good user experience. The guidelines below describe how to write better error messages.

1. Error messages should be contextual.

Contextual error messages provide information specific to the error situation. Error messages without any context are very hard for users to interpret. Contextual information may include why the error happened, what the erroneous value is, what the expected value is, and how the user can fix the error.

For example, if there is a mismatch in the data type of a field, more contextual information in the error message helps the user understand the problem and fix it if needed.

Error Message:
Data type mismatch for field 'X'. 

Better Error Message:
Field 'X' is expected to be of data type 'int'. However, provided data type is 'string'.
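
A minimal sketch of how such a contextual message could be assembled is shown below. The class and method names (`ContextualMessages`, `typeMismatch`) are hypothetical and are not part of CDAP. The point is only that the field name, the expected type, and the actual type are all passed in and appear in the text.

```java
final class ContextualMessages {

  /** Hypothetical helper: names the field, the expected type and the provided type. */
  static String typeMismatch(String field, String expectedType, String actualType) {
    return String.format(
        "Field '%s' is expected to be of data type '%s'. However, provided data type is '%s'.",
        field, expectedType, actualType);
  }

  public static void main(String[] args) {
    // Prints the "Better Error Message" from the example above.
    System.out.println(typeMismatch("X", "int", "string"));
  }
}
```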

2. Do not include implementation details in user facing error messages.

Exposing implementation details to end users can be confusing, and users may not be able to take any action to resolve the error. That is why user-facing error messages should not include implementation details. Below are some cases where exposing implementation details can be avoided:

  • Avoid using class hashes in error messages. For example:

    Error Message:
    co.cask.directives.language.SetCharset@781ecbac : Invalid type 'java.lang.String' of column 'body'. Should be of type String.
    
    Better Error Message:
    Error executing 'set-charset' directive: The 'body' column was expected to be a byte array or ByteBuffer, but is of type 'String'. 
  • Avoid using exception class names in user facing error messages. For example:

    Error Message:
    java.lang.IllegalArgumentException: Database driver 'cloudsql-postgresql' not found.
    
    Better Error Message:
    Database driver 'cloudsql-postgresql' not found. Please make sure the correct database driver is deployed.
  • Avoid using technical implementation details in user facing error messages. For example:

    Error Message:
    Failed to configure pipeline: valueOf operation on abc failed.
    
    Better Error Message:
    Failed to configure pipeline: Field 'X' is expected to be of type 'int' or 'double', but found value 'abc'.
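
The class-hash problem typically comes from concatenating an object into a message, which falls back to `Object.toString()` and yields `<class name>@<hash>`. The sketch below contrasts that with a message built from the directive name. `SetCharset` here is a hypothetical stand-in written for this example, not the real Wrangler class.

```java
final class DirectiveMessages {

  /** Hypothetical stand-in for a directive; it does not override toString(). */
  static final class SetCharset {
    static final String NAME = "set-charset";
  }

  /** Anti-pattern: the object is concatenated, so the default toString() leaks "Class@hash". */
  static String leakyMessage(Object directive, String column) {
    return directive + " : Invalid type of column '" + column + "'.";
  }

  /** Preferred: use the user-visible directive name and describe expected vs. actual type. */
  static String userFacingMessage(String directiveName, String column, String actualType) {
    return String.format(
        "Error executing '%s' directive: The '%s' column was expected to be a byte array or ByteBuffer, "
            + "but is of type '%s'.",
        directiveName, column, actualType);
  }

  public static void main(String[] args) {
    System.out.println(leakyMessage(new SetCharset(), "body"));
    System.out.println(userFacingMessage(SetCharset.NAME, "body", "String"));
  }
}
```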

3. An error message should give the user direction if action is needed from the user.

An error message has three parts: problem identification, cause details if helpful, and a solution if possible. When an error occurs, users want to fix it immediately, so the error message should contain enough information to guide them in the right direction.
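
One way to keep the three parts explicit in code is a small helper that takes them as separate arguments. This is a sketch only: `ErrorMessageBuilder` and its `build` method are hypothetical names, and the cause and corrective action are treated as optional.

```java
final class ErrorMessageBuilder {

  /**
   * Joins the problem, an optional cause and an optional corrective action
   * into one message. Null or empty optional parts are skipped.
   */
  static String build(String problem, String cause, String correctiveAction) {
    StringBuilder message = new StringBuilder(problem);
    if (cause != null && !cause.isEmpty()) {
      message.append(' ').append(cause);
    }
    if (correctiveAction != null && !correctiveAction.isEmpty()) {
      message.append(' ').append(correctiveAction);
    }
    return message.toString();
  }

  public static void main(String[] args) {
    System.out.println(build(
        "Database driver 'cloudsql-postgresql' not found.",
        null,
        "Please make sure the correct database driver is deployed."));
  }
}
```

Keeping the parts separate makes it easy to see at the call site whether a message is missing its corrective action.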

4. Provide a complete, concise error message to the user and avoid ambiguity.

An error message should be a complete sentence that conveys a clear message. The user should be able to understand the problem by reading the error message. For example:

Error Message:
io.cdap.directives.transformation.Decode@c2e00f5 : Failed to decode hex value.

Better Error Message:
Error while decoding field 'X' as hex value. Please make sure the provided field is encoded as hex.

5. Always prefer an error message specific to the error situation over a generic error message.

Whenever possible, use a specific error message instead of a generic one. For example:

Error Message:
Failed to decode hex value.

Better Error Message:
Error while decoding field 'X' as hex value. Please make sure the provided field is encoded as hex.

Scenario 1: Error codes in Wrangler

Scenario 2: Standard error messages in Wrangler

Scenario 3: Error codes in Pipeline

Scenario 4: Standard error messages in Pipeline

Bug Fixes

Issues in the Cask Community Issue Tracker:

  • CDAP-14378
  • CDAP-15499
  • CDAP-15507
  • CDAP-15040
  • CDAP-15593
  • CDAP-14797
  • CDAP-15563
  • CDAP-15560
  • CDAP-11767
  • CDAP-15426
  • CDAP-15581

Error codes

Error codes can be used to tell the user which part of the system produced the error message.
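
As a rough illustration of prefixing messages with a code that identifies their source, consider the sketch below. The `WRA` and `PIP` prefixes, the four-digit numbering, and the `PREFIX-NNNN: message` layout are all assumptions made for this example. This document does not define the actual code format.

```java
final class ErrorCodeSketch {

  /** Hypothetical sources of errors, each with an assumed short prefix. */
  enum Source {
    WRANGLER("WRA"),
    PIPELINE("PIP");

    private final String prefix;

    Source(String prefix) {
      this.prefix = prefix;
    }
  }

  /** Prefixes a user-facing message with a code such as "WRA-0012". */
  static String withCode(Source source, int number, String message) {
    return String.format("%s-%04d: %s", source.prefix, number, message);
  }

  public static void main(String[] args) {
    // Prints: WRA-0012: Error while decoding field 'X' as hex value.
    System.out.println(withCode(Source.WRANGLER, 12, "Error while decoding field 'X' as hex value."));
  }
}
```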

Standardized error messages for internationalization

Error messages should be standardized across CDAP, Pipeline and Wrangler to support internationalization.
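
One common way to standardize messages so that they can later be translated is to keep message templates in a catalog keyed by message ID and fill in parameters with `java.text.MessageFormat`. The sketch below assumes a hypothetical `MessageCatalog` class and a made-up key `field.type.mismatch`. It holds only an English catalog in memory, whereas a real implementation would more likely load per-locale resource bundles.

```java
import java.text.MessageFormat;
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

final class MessageCatalog {

  // Hypothetical English catalog. In MessageFormat patterns a literal single
  // quote is written as two single quotes.
  private static final Map<String, String> EN = new HashMap<>();

  static {
    EN.put("field.type.mismatch",
        "Field ''{0}'' is expected to be of data type ''{1}''. However, provided data type is ''{2}''.");
  }

  /** Looks up the template for the key and substitutes the arguments. */
  static String format(String key, Object... args) {
    String template = EN.get(key);
    if (template == null) {
      throw new IllegalArgumentException("Unknown message key: " + key);
    }
    return new MessageFormat(template, Locale.ENGLISH).format(args);
  }

  public static void main(String[] args) {
    System.out.println(format("field.type.mismatch", "X", "int", "string"));
  }
}
```

Because callers refer to messages only by key, the wording can be reviewed in one place and a translated catalog can be added without touching the calling code.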

Use Cases

Approach

Impact on UI

UI changes will be needed for invalid schema type errors returned from the validation endpoint.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release 6.1.0

Related Work

Future Work

  • Add error code and standard error message capability to the CDAP platform.