Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
A CDAP pipeline is composed of various plugins that users configure as the pipeline is being developed. While building a pipeline, a pipeline developer can provide invalid plugin configurations or schemas. For example, the BigQuery sink plugin can have an output schema that does not match the underlying BigQuery table, or the provided bucket name can contain invalid characters. A pipeline developer can use the new validation endpoint to validate each stage before deploying the pipeline. To fail fast and provide a better user experience, the validation endpoint should return all possible validation errors for a given stage when it is called.
Goals
The purpose of this document is to provide general guidelines on the usage of the new validation APIs in CDAP plugins.
Validation API usage in plugins
The purpose of using validation APIs in plugins is to collect validation errors as early as possible. The FailureCollector API is used to collect multiple ValidationFailures.
Failure collection using FailureCollector
CDAP plugins override the configurePipeline() method, which is used to configure the stage at deploy time. The same method is also called through the validation endpoint. To collect multiple validation failures, the FailureCollector API is exposed through the stage configurer so that configurations can be validated in this method. Sample usage of the FailureCollector API is shown below:
@Override
public void configurePipeline(PipelineConfigurer configurer) {
  StageConfigurer stageConfigurer = configurer.getStageConfigurer();
  // get failure collector from stage configurer
  FailureCollector collector = stageConfigurer.getFailureCollector();
  // use failure collector to collect multiple validation failures
  config.validate(collector);
  validatePartitionProperties(collector);
  validateConfiguredSchema(configuredSchema, collector);
  stageConfigurer.setOutputSchema(configuredSchema);
}
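To illustrate why a collector is used instead of throwing on the first problem, here is a minimal, self-contained sketch. `SketchCollector` and `CollectAllDemo` are hypothetical stand-ins written for this example; they are not the CDAP FailureCollector interface, and the bucket and table checks are assumed purely for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a failure collector; NOT the CDAP FailureCollector interface.
class SketchCollector {
  private final List<String> failures = new ArrayList<>();

  // Records a failure and lets validation continue instead of throwing immediately.
  void addFailure(String message, String correctiveAction) {
    failures.add(correctiveAction == null ? message : message + " " + correctiveAction);
  }

  List<String> getFailures() {
    return Collections.unmodifiableList(failures);
  }
}

class CollectAllDemo {
  // Assumed config checks: every problem is recorded, none aborts validation.
  static List<String> validate(String bucket, String table) {
    SketchCollector collector = new SketchCollector();
    if (bucket == null || bucket.isEmpty()) {
      collector.addFailure("Bucket is missing.", "Provide a bucket name.");
    }
    if (table == null || table.isEmpty()) {
      collector.addFailure("Table is missing.", "Provide a table name.");
    }
    return collector.getFailures();
  }

  public static void main(String[] args) {
    // Both problems are reported in a single validation pass.
    validate("", "").forEach(System.out::println);
  }
}
```

Because each check only records its failure, a single call can surface every problem in the stage, which is the behavior the validation endpoint is meant to provide.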
Adding ValidationFailures to FailureCollector
A validation failure is made up of 3 components:
- Message - Represents a validation error message
- Corrective action - An optional corrective action that represents an action to be taken by the user to correct the error situation
- Causes - Represents one or more causes for the validation failure. Each cause can have more than one attribute. These attributes are used to highlight different sections of the plugin in the UI.
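The three components can be pictured as a small data model. The sketch below is an assumption made for illustration: `SketchFailure` and its `withCause` and `fullMessage` methods are hypothetical names, not the CDAP ValidationFailure class, and for brevity each cause here carries a single attribute.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model for illustration only; NOT the CDAP ValidationFailure class.
class SketchFailure {
  final String message;
  final String correctiveAction;  // optional, may be null
  // Each cause is a set of attributes (name -> value) that a UI could use for highlighting.
  final List<Map<String, String>> causes = new ArrayList<>();

  SketchFailure(String message, String correctiveAction) {
    this.message = message;
    this.correctiveAction = correctiveAction;
  }

  // Adds a cause with a single attribute and returns this failure so calls can be chained.
  SketchFailure withCause(String attribute, String value) {
    Map<String, String> cause = new LinkedHashMap<>();
    cause.put(attribute, value);
    causes.add(cause);
    return this;
  }

  // The message, followed by the corrective action when one was given.
  String fullMessage() {
    return correctiveAction == null ? message : message + " " + correctiveAction;
  }
}

class SketchFailureDemo {
  public static void main(String[] args) {
    SketchFailure failure =
      new SketchFailure("Bucket name is invalid.", "Use only allowed characters.")
        .withCause("stageConfig", "bucket");
    System.out.println(failure.fullMessage());
    System.out.println(failure.causes);
  }
}
```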
Example:
In the BigQuery source, if the bucket config contains invalid characters, a new validation failure is added to the collector with a `stageConfig` cause attribute, as below:
Pattern p = Pattern.compile("[a-z0-9._-]+");
if (!p.matcher(bucket).matches()) {
  collector.addFailure("Allowed characters are lowercase characters, numbers, '.', '_', and '-'.",
                       "Bucket name should only contain allowed characters.")
    .withConfigProperty("bucket");
}
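The pattern used in the bucket check can be exercised on its own with only the JDK. In this sketch, `BucketNameCheck` and `isValid` are hypothetical names, and the sample bucket names are made up for illustration.

```java
import java.util.regex.Pattern;

// Hypothetical helper for illustration; only the pattern itself comes from the example above.
class BucketNameCheck {
  private static final Pattern ALLOWED = Pattern.compile("[a-z0-9._-]+");

  // True only when the whole name consists of allowed characters.
  static boolean isValid(String bucket) {
    return bucket != null && ALLOWED.matcher(bucket).matches();
  }

  public static void main(String[] args) {
    System.out.println(isValid("my-bucket_1.data")); // true
    System.out.println(isValid("My Bucket"));        // false: uppercase letters and a space
    System.out.println(isValid(""));                 // false: at least one character is required
  }
}
```

Note that `matches()` requires the entire string to match, whereas `find()` would accept any name that merely contains a run of allowed characters.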
While a ValidationFailure allows plugins to add a cause with arbitrary attributes, the ValidationFailure API also provides utility methods to create validation failures with common causes that can be used to highlight the appropriate UI sections. Below is the list of common causes and the associated plugin usage:
1. Stage config cause
Purpose: Indicates an error in a stage property
Scenario: User has provided an invalid bucket name for the BigQuery source plugin
Example:
collector.addFailure("Allowed characters are lowercase characters, numbers, '.', '_', and '-'.",
                     "Bucket name should only contain allowed characters.")
  .withConfigProperty("bucket");
2. Plugin not found cause
Purpose: Indicates a plugin not found error
Scenario: User is trying to use a plugin/JDBC driver that has not been deployed
Example:
collector.addFailure("Unable to load JDBC driver class 'com.mysql.jdbc.Driver'.",
                     "Jar with JDBC driver class 'com.mysql.jdbc.Driver' must be deployed.")
  .withPluginNotFound("driver", "mysql", "jdbc");
3. Config element cause
Purpose: Indicates a single element in the list of values for a given config property
Scenario: User has provided a field to keep in the Projection transform that does not exist in the input schema
Example:
collector.addFailure("Field to keep 'non_existing_field' does not exist in the input schema.",
                     "Field to keep must be present in the input schema.")
  .withConfigElement("keep", "non_existing_field");
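A config element cause is typically produced while looping over the individual values of a list property. The sketch below shows only that loop, assuming the `keep` property is a comma-separated string and the input schema is available as a set of field names; `KeepFieldsCheck` and `findMissing` are hypothetical names. In a plugin, each returned element would correspond to one failure with a config element cause.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper for illustration; the comma-separated format is an assumption.
class KeepFieldsCheck {
  // Returns every element of the 'keep' list that is absent from the input schema.
  static List<String> findMissing(String keep, Set<String> inputFields) {
    List<String> missing = new ArrayList<>();
    for (String raw : keep.split(",")) {
      String field = raw.trim();
      if (!field.isEmpty() && !inputFields.contains(field)) {
        // One entry per offending element, so each can be highlighted separately.
        missing.add(field);
      }
    }
    return missing;
  }

  public static void main(String[] args) {
    Set<String> inputFields = new HashSet<>(Arrays.asList("id", "name"));
    // Prints [non_existing_field]
    System.out.println(findMissing("id, name, non_existing_field", inputFields));
  }
}
```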
4. Input schema field cause
Purpose: Indicates an error in an input schema field
Scenario: User is using the BigQuery sink plugin, which does not support record fields
Example:
collector.addFailure("Input field 'record_field' is of unsupported type.",
                     "Field 'record_field' must be of primitive type.")
  .withInputSchemaField("record_field", null);
5. Output schema field cause
Purpose: Indicates an error in an output schema field
Scenario: User has provided an output schema field that does not exist in the BigQuery source table
Example:
collector.addFailure("Output field 'non_existing' does not exist in table 'xyz'.",
                     "Field 'non_existing' must be present in table 'xyz'.")
  .withOutputSchemaField("non_existing", null);
Cause Associations
While validating plugin configurations, a single validation failure can have multiple causes. Below are a few examples of associated causes:
Example 1
The Database source has username and password as co-dependent properties. If the username is not provided but the password is, the plugin can add a single validation failure with 2 causes, as below:
collector.addFailure("Missing username.", "Username and password must both be provided.")
  .withConfigProperty("username")
  .withConfigProperty("password");
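The co-dependency check itself can be sketched without the plugin API. `CredentialsCheck` and `causes` are hypothetical names, and this sketch assumes the rule is symmetric, so a username without a password is flagged as well.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper for illustration; assumes either both or neither property must be set.
class CredentialsCheck {
  static boolean isBlank(String value) {
    return value == null || value.trim().isEmpty();
  }

  // Returns the config properties to attach as causes, or an empty list when consistent.
  static List<String> causes(String username, String password) {
    if (isBlank(username) != isBlank(password)) {
      // Exactly one of the two is set: both properties are involved in the failure.
      return Arrays.asList("username", "password");
    }
    return Collections.emptyList();
  }

  public static void main(String[] args) {
    System.out.println(causes(null, "secret")); // [username, password]
    System.out.println(causes("admin", "secret")); // []
  }
}
```

Attaching both properties to one failure lets the UI highlight the pair together rather than reporting two unrelated errors.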
Example 2
The Projection transform receives an input schema and an output schema that are incompatible for a field, such that the input field cannot be converted to the output field. In that case, a new validation failure can be created with 2 different causes, as below:
collector.addFailure("Input field 'record_type' cannot be converted to string.",
                     "Field 'record_type' must be of primitive type.")
  .withConfigProperty("convert")
  .withInputSchemaField("record_type");
Guidelines to incorporate new validation APIs
Related Work
- Validation Api Design - https://wiki.cask.co/display/CE/Plugin+Validation