Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction


A CDAP pipeline is composed of plugins that users configure as the pipeline is developed. While building a pipeline, a developer can provide invalid plugin configurations or schemas. For example, the BigQuery sink plugin can have an output schema that does not match the underlying BigQuery table, or the provided bucket name can contain invalid characters. Pipeline developers can use the new validation endpoint to validate each stage before deploying the pipeline. To fail fast and give a better user experience, the validation endpoint should return all validation errors for a given stage in a single call.

Goals

The purpose of this document is to provide general guidelines on using the new validation APIs in CDAP plugins.

Validation API usage in plugins

The purpose of using the validation APIs in plugins is to collect validation errors as early as possible. The FailureCollector API is used to collect multiple ValidationFailures.

Failure collection using FailureCollector

CDAP plugins override the configurePipeline() method, which configures the stage at deploy time. The validation endpoint calls the same method. To collect multiple validation failures, the FailureCollector API is exposed through the stage configurer so that configurations can be validated in this method. Sample usage of the FailureCollector API:

@Override
public void configurePipeline(PipelineConfigurer configurer) {
   StageConfigurer stageConfigurer = configurer.getStageConfigurer();
   // get failure collector from stage configurer
   FailureCollector collector = stageConfigurer.getFailureCollector();
   // use failure collector to collect multiple validation failures
   config.validate(collector);
   validatePartitionProperties(collector);
   validateConfiguredSchema(configuredSchema, collector);
   stageConfigurer.setOutputSchema(configuredSchema);
}
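The idea of collecting every failure instead of stopping at the first one can be sketched without any CDAP dependency. The classes below (SketchCollector, CollectAllSketch) and the two checks are hypothetical stand-ins invented for illustration; they are not the CDAP FailureCollector API and only mimic its accumulate-then-report behavior.

```java
import java.util.ArrayList;
import java.util.List;

class CollectAllSketch {
  // Runs every check; a failing check does not stop the checks after it.
  static List<String> validate(String bucket, String table) {
    SketchCollector collector = new SketchCollector();
    if (bucket == null || bucket.isEmpty()) {
      collector.addFailure("Bucket is missing.", "Provide a bucket name.");
    }
    if (table == null || table.isEmpty()) {
      collector.addFailure("Table is missing.", "Provide a table name.");
    }
    return collector.getFailures();
  }

  public static void main(String[] args) {
    // Both properties are invalid, so both failures are reported together.
    for (String failure : validate("", null)) {
      System.out.println(failure);
    }
  }
}

// Hypothetical stand-in for illustration only; not the CDAP FailureCollector.
class SketchCollector {
  private final List<String> failures = new ArrayList<>();

  void addFailure(String message, String correctiveAction) {
    failures.add(message + " " + correctiveAction);
  }

  List<String> getFailures() {
    return failures;
  }
}
```

With an exception-per-check design, the user would see only the missing bucket and would have to fix it and validate again to learn about the table.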

Adding ValidationFailures to FailureCollector

A validation failure is made up of three components:

  • Message - Represents a validation error message
  • Corrective action - An optional corrective action that represents an action to be taken by the user to correct the error situation
  • Causes - Represents one or more causes for the validation failure. Each cause can have more than one attribute. These attributes are used to highlight different sections of the plugin in the UI.
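As a rough mental model, the three components can be pictured as a message, a nullable corrective action, and a list of causes that each hold a map of attributes. The sketch below is a simplified, hypothetical model (SketchFailure, SketchCause, and their methods are invented names), not the actual CDAP classes; the `stageConfig` attribute name is taken from the example in this document.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FailureModelDemo {
  // Builds a failure with one cause that points at the "bucket" property.
  static SketchFailure example() {
    return new SketchFailure("Bucket name contains invalid characters.", null)
        .withCause(new SketchCause().with("stageConfig", "bucket"));
  }

  public static void main(String[] args) {
    SketchFailure failure = example();
    System.out.println(failure.message + " causes=" + failure.causes.size());
  }
}

// Hypothetical model of a cause: a bag of named attributes.
class SketchCause {
  final Map<String, String> attributes = new LinkedHashMap<>();

  SketchCause with(String name, String value) {
    attributes.put(name, value);
    return this;
  }
}

// Hypothetical model of a validation failure.
class SketchFailure {
  final String message;
  final String correctiveAction; // optional, may be null
  final List<SketchCause> causes = new ArrayList<>();

  SketchFailure(String message, String correctiveAction) {
    this.message = message;
    this.correctiveAction = correctiveAction;
  }

  SketchFailure withCause(SketchCause cause) {
    causes.add(cause);
    return this;
  }
}
```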

Example:

In the BigQuery source, if the bucket config contains invalid characters, a new validation failure is added to the collector with a `stageConfig` cause attribute, as below:

Pattern p = Pattern.compile("[a-z0-9._-]+");
if (!p.matcher(bucket).matches()) {
   collector.addFailure("Allowed characters are lowercase characters, numbers, '.', '_', and '-'.",
                        "Bucket name should only contain allowed characters.")
                        .withConfigProperty("bucket");
}
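The character check itself can be tried in isolation. The following is a small self-contained sketch that reuses the same regular expression; the class and method names (BucketNameCheck, isValid) and the null handling are assumptions added for illustration.

```java
import java.util.regex.Pattern;

class BucketNameCheck {
  // Same pattern as in the snippet above: lowercase letters, digits, '.', '_' and '-'.
  private static final Pattern ALLOWED = Pattern.compile("[a-z0-9._-]+");

  // matches() requires the whole string to fit the pattern, so a single
  // disallowed character anywhere in the name makes the check fail.
  static boolean isValid(String bucket) {
    return bucket != null && ALLOWED.matcher(bucket).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("my-bucket_1.data"));
    System.out.println(isValid("My Bucket"));
  }
}
```

Because the pattern uses `+`, an empty string is also rejected, which may or may not be the desired behavior when the property is optional.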


While a ValidationFailure allows plugins to add a cause with arbitrary attributes, the ValidationFailure API provides utility methods to create validation failures with common causes, which can be used to highlight the appropriate UI sections. Below is the list of common causes and associated plugin usage:

1. Stage config cause

Purpose: Indicates an error in the stage property

Scenario: User has provided an invalid bucket name for the BigQuery source plugin

Example: 

collector.addFailure("Allowed characters are lowercase characters, numbers, '.', '_', and '-'.",
                     "Bucket name should only contain allowed characters.")
                     .withConfigProperty("bucket");


2. Plugin not found cause

Purpose: Indicates a plugin not found error

Scenario: User is trying to use a plugin/JDBC driver that has not been deployed

Example:

collector.addFailure("Unable to load JDBC driver class 'com.mysql.jdbc.Driver'.",
                     "Jar with JDBC driver class 'com.mysql.jdbc.Driver' must be deployed")
                     .withPluginNotFound("driver", "mysql", "jdbc");


3. Config element cause

Purpose: Indicates an error in a single element in the list of values for a given config property

Scenario: User has provided a field to keep in the project transform that does not exist in input schema

Example: 

collector.addFailure("Field to keep 'non_existing_field' does not exist in the input schema",
                     "Field to keep must be present in the input schema")
                     .withConfigElement("keep", "non_existing_field");


4. Input schema field cause

Purpose: Indicates an error in an input schema field

Scenario: User is using the BigQuery sink plugin, which does not support record fields

Example:

collector.addFailure("Input field 'record_field' is of unsupported type.",
                     "Field 'record_field' must be of primitive type.")
                     .withInputSchemaField("record_field", null);


5. Output schema field cause

Purpose: Indicates an error in an output schema field

Scenario: User has provided an output schema field that does not exist in the BigQuery source table

Example:

collector.addFailure("Output field 'non_existing' does not exist in table 'xyz'.",
                     "Field 'non_existing' must be present in table 'xyz'.")
                     .withOutputSchemaField("non_existing", null);
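For list-valued properties such as the fields to keep, one failure per offending element lets each bad entry be reported separately. The sketch below shows that loop in plain Java; KeepFieldsCheck and findMissing are hypothetical names, and the returned strings stand in for the failures a plugin would add to the collector with a config element cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class KeepFieldsCheck {
  // Returns one message per configured field that is absent from the input schema.
  static List<String> findMissing(List<String> fieldsToKeep, Set<String> inputFields) {
    List<String> messages = new ArrayList<>();
    for (String field : fieldsToKeep) {
      if (!inputFields.contains(field)) {
        // In a plugin, this is where a failure with a config element cause
        // for this field would be added to the collector.
        messages.add("Field to keep '" + field + "' does not exist in the input schema");
      }
    }
    return messages;
  }

  public static void main(String[] args) {
    Set<String> inputFields = new HashSet<>(Arrays.asList("id", "name"));
    List<String> keep = Arrays.asList("id", "non_existing_field", "age");
    for (String message : findMissing(keep, inputFields)) {
      System.out.println(message);
    }
  }
}
```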


Cause Associations

While validating plugin configurations, a single validation failure can have multiple causes. Below are a few examples of associated causes:

Example 1

The database source has username and password as co-dependent properties. If the password is provided but the username is not, the plugin can add a single validation failure with two causes, as below:

collector.addFailure("Missing username",
                     "Username and password must be provided.")
                     .withConfigProperty("username").withConfigProperty("password");
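The condition that triggers this failure can be sketched as a small helper. This assumes "co-dependent" means the two properties must be either both set or both unset; CredentialsCheck and propertiesToHighlight are hypothetical names, and the returned list stands in for the config properties attached as causes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

class CredentialsCheck {
  private static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  // Returns the properties to highlight, or an empty list when the pair is consistent.
  // Assumption: both set or both unset is valid; exactly one set is an error.
  static List<String> propertiesToHighlight(String username, String password) {
    boolean hasUsername = !isBlank(username);
    boolean hasPassword = !isBlank(password);
    if (hasUsername == hasPassword) {
      return Collections.emptyList();
    }
    // Both properties are reported so the user sees the pair as one problem.
    return Arrays.asList("username", "password");
  }

  public static void main(String[] args) {
    System.out.println(propertiesToHighlight(null, "secret"));
    System.out.println(propertiesToHighlight("admin", "secret"));
  }
}
```

Reporting both properties in one failure, rather than two separate failures, keeps the message count equal to the number of actual problems.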


Example 2

The Projection transform received an incompatible input schema and output schema for a field, such that the input field cannot be converted to the output field. In that case, a new validation failure can be created with two different causes, as below:

collector.addFailure("Input field 'record_type' cannot be converted to string.",
                     "Field 'record_type' must be of primitive type.")
                     .withConfigProperty("convert").withInputSchemaField("record_type");

Guidelines to incorporate new validation APIs


Related Work
