Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
A CDAP pipeline is composed of various CDAP plugins. These plugins handle error situations caused by invalid inputs or configurations. While developing a CDAP pipeline, a pipeline developer can provide invalid plugin configurations. For example, the BigQuery sink plugin can have a schema that does not match the underlying BigQuery table. In such situations, to fail fast, pipeline plugins should report all error messages at once.
Goals
- To fail fast, introduce a new API to collect multiple error messages from plugins at configure time
- Instrument plugins to use this API to return multiple error messages from the validation endpoint
User Stories
- As a CDAP pipeline developer, if a pipeline contains invalid plugin configurations, I would like it to fail early with appropriate error messages.
API Changes for Plugin Validation
Plugin Validation
The plugin validation endpoint is used to surface all stage-level errors at once. To collect multiple validation errors from a stage, StageConfigurer, MultiInputStageConfigurer and MultiOutputStageConfigurer can be modified as shown below. If one or more errors are added to the stage configurer, pipeline deployment will fail.
    public interface StageConfigurer {

      /**
       * Get the input schema for this stage, or null if it is unknown.
       *
       * @return input schema
       */
      @Nullable
      Schema getInputSchema();

      /**
       * Set the output schema for this stage, or null if it is unknown.
       *
       * @param outputSchema output schema for this stage
       */
      void setOutputSchema(@Nullable Schema outputSchema);

      /**
       * Set the error schema for this stage, or null if it is unknown.
       * If no error schema is set, it will default to the input schema for the stage. Note that since source
       * plugins do not have an input schema, it will default to null for sources.
       *
       * @param errorSchema error schema for this stage
       */
      void setErrorSchema(@Nullable Schema errorSchema);

      /**
       * Add an error for this stage to the configurer if the pipeline stage is invalid.
       *
       * @param error {@link InvalidStageException} describing why the pipeline stage is invalid
       */
      void addStageError(InvalidStageException error);
    }
Plugins can use this API as follows:
    @Override
    public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
      pipelineConfigurer.createDataset(conf.destinationFileset, FileSet.class);
      StageConfigurer stageConfigurer = pipelineConfigurer.getStageConfigurer();
      try {
        Pattern.compile(conf.filterRegex);
      } catch (Exception e) {
        stageConfigurer.addStageError(new InvalidConfigPropertyException(e.getMessage(), "filterRegex"));
      }
      if (conf.sourceFileset.equals(conf.destinationFileset)) {
        stageConfigurer.addStageError(new InvalidStageException("source and destination filesets must be different"));
      }
    }
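To illustrate the collect-then-fail behaviour described above, here is a minimal, self-contained sketch. It does not use the real CDAP classes: `InvalidStageException` is a simplified stand-in, and `CollectingStageConfigurer`, `configure` and `failIfInvalid` are hypothetical names introduced only for this example. How the platform actually stores the errors and fails deployment is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

class StageErrorCollectionSketch {

  /** Simplified stand-in for the proposed InvalidStageException (assumption, not the CDAP class). */
  static class InvalidStageException extends RuntimeException {
    InvalidStageException(String message) {
      super(message);
    }
  }

  /** Hypothetical configurer that implements only the proposed addStageError method. */
  static class CollectingStageConfigurer {
    private final List<InvalidStageException> errors = new ArrayList<>();

    void addStageError(InvalidStageException error) {
      errors.add(error);
    }

    List<InvalidStageException> getErrors() {
      return Collections.unmodifiableList(errors);
    }
  }

  /** Mirrors the plugin example: every check runs, and each failure is recorded instead of thrown. */
  static void configure(String filterRegex, String sourceFileset, String destinationFileset,
                        CollectingStageConfigurer configurer) {
    try {
      Pattern.compile(filterRegex);
    } catch (PatternSyntaxException e) {
      configurer.addStageError(new InvalidStageException("invalid filterRegex: " + e.getDescription()));
    }
    if (sourceFileset.equals(destinationFileset)) {
      configurer.addStageError(new InvalidStageException("source and destination filesets must be different"));
    }
  }

  /** Hypothetical deployment-side check: fail once, reporting how many errors were collected. */
  static void failIfInvalid(CollectingStageConfigurer configurer) {
    int count = configurer.getErrors().size();
    if (count > 0) {
      throw new IllegalStateException("stage is invalid: " + count + " error(s) found");
    }
  }

  public static void main(String[] args) {
    CollectingStageConfigurer configurer = new CollectingStageConfigurer();
    // Both the regex and the fileset check fail, so two errors are collected in one pass.
    configure("[unclosed", "files", "files", configurer);
    for (InvalidStageException error : configurer.getErrors()) {
      System.out.println(error.getMessage());
    }
  }
}
```

The point of the design is that the second check still runs after the first one fails, so the developer sees both problems in a single validation call.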
Sources and sinks can have a schema that does not match the underlying storage. A new type of exception can be introduced so that the invalid schema fields can be highlighted when a schema mismatch occurs:
    public class InvalidSchemaFieldException extends InvalidStageException {
      private final String field;

      public InvalidSchemaFieldException(String message, String field) {
        super(message);
        this.field = field;
      }

      public InvalidSchemaFieldException(String message, Throwable cause, String field) {
        super(message, cause);
        this.field = field;
      }

      public String getField() {
        return field;
      }
    }
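As a rough illustration of how a sink might report every mismatched field in one pass, the sketch below compares a stage schema against a table schema and produces one exception per offending field. It makes several assumptions: schemas are modelled as plain maps from field name to type name rather than the CDAP `Schema` class, the exception classes are simplified stand-ins, and `findMismatches` is a hypothetical helper.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class SchemaMismatchSketch {

  /** Simplified stand-in for InvalidStageException (assumption, not the CDAP class). */
  static class InvalidStageException extends RuntimeException {
    InvalidStageException(String message) {
      super(message);
    }
  }

  /** Simplified stand-in for the proposed InvalidSchemaFieldException. */
  static class InvalidSchemaFieldException extends InvalidStageException {
    private final String field;

    InvalidSchemaFieldException(String message, String field) {
      super(message);
      this.field = field;
    }

    String getField() {
      return field;
    }
  }

  /**
   * Hypothetical helper: returns one error per stage field that is missing from the table
   * or has a different type. Schemas are modelled as field name -> type name maps.
   */
  static List<InvalidSchemaFieldException> findMismatches(Map<String, String> stageSchema,
                                                          Map<String, String> tableSchema) {
    List<InvalidSchemaFieldException> errors = new ArrayList<>();
    for (Map.Entry<String, String> entry : stageSchema.entrySet()) {
      String field = entry.getKey();
      String tableType = tableSchema.get(field);
      if (tableType == null) {
        errors.add(new InvalidSchemaFieldException(
          "field '" + field + "' does not exist in the table", field));
      } else if (!tableType.equals(entry.getValue())) {
        errors.add(new InvalidSchemaFieldException(
          "field '" + field + "' is of type '" + entry.getValue()
            + "' but the table expects '" + tableType + "'", field));
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, String> stageSchema = new LinkedHashMap<>();
    stageSchema.put("name", "string");
    stageSchema.put("age", "string");
    stageSchema.put("email", "string");

    Map<String, String> tableSchema = new LinkedHashMap<>();
    tableSchema.put("name", "string");
    tableSchema.put("age", "int");

    // Every mismatched field is reported, not just the first one found.
    for (InvalidSchemaFieldException error : findMismatches(stageSchema, tableSchema)) {
      System.out.println(error.getField() + ": " + error.getMessage());
    }
  }
}
```

In a real plugin, each of these exceptions would presumably be passed to `addStageError` so that the field name travels with the error.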
The validation error will have a corresponding INVALID_SCHEMA type so that the UI can identify schema field errors.
    public class ValidationError {
      protected final Type type;
      protected final String message;

      /**
       * Types of validation errors
       */
      public enum Type {
        ERROR,
        STAGE_ERROR,
        INVALID_FIELD,
        PLUGIN_NOT_FOUND,
        INVALID_SCHEMA
      }
      ...
    }
    /**
     * An error that occurred due to field schema mismatch in a specific pipeline stage.
     */
    public class InvalidSchemaFieldError extends StageValidationError {
      private final String field;

      public InvalidSchemaFieldError(String stage, InvalidSchemaFieldException cause) {
        this(cause.getMessage(), stage, cause.getField());
      }

      public InvalidSchemaFieldError(String message, String stage, String field) {
        super(Type.INVALID_SCHEMA, message, stage);
        this.field = field;
      }

      public String getField() {
        return field;
      }

      @Override
      public boolean equals(Object o) { .... }

      @Override
      public int hashCode() { ... }
    }
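One way the validation endpoint could translate collected stage exceptions into typed errors is sketched below. This is an assumption about the mapping, not the actual implementation: the exception classes are simplified stand-ins, `StageError` and `toError` are hypothetical names, and mapping `InvalidConfigPropertyException` to `INVALID_FIELD` is a guess based on the enum values listed above.

```java
class ValidationErrorMappingSketch {

  /** Same values as the Type enum shown in the design. */
  enum Type { ERROR, STAGE_ERROR, INVALID_FIELD, PLUGIN_NOT_FOUND, INVALID_SCHEMA }

  /** Simplified stand-ins for the exception hierarchy (assumptions, not the CDAP classes). */
  static class InvalidStageException extends RuntimeException {
    InvalidStageException(String message) {
      super(message);
    }
  }

  static class InvalidConfigPropertyException extends InvalidStageException {
    final String property;

    InvalidConfigPropertyException(String message, String property) {
      super(message);
      this.property = property;
    }
  }

  static class InvalidSchemaFieldException extends InvalidStageException {
    final String field;

    InvalidSchemaFieldException(String message, String field) {
      super(message);
      this.field = field;
    }
  }

  /** Hypothetical flat error record; name is null when the error is not tied to a property or field. */
  static class StageError {
    final Type type;
    final String stage;
    final String message;
    final String name;

    StageError(Type type, String stage, String message, String name) {
      this.type = type;
      this.stage = stage;
      this.message = message;
      this.name = name;
    }
  }

  /** Picks the most specific error type for the given exception. */
  static StageError toError(String stage, InvalidStageException e) {
    if (e instanceof InvalidSchemaFieldException) {
      return new StageError(Type.INVALID_SCHEMA, stage, e.getMessage(), ((InvalidSchemaFieldException) e).field);
    }
    if (e instanceof InvalidConfigPropertyException) {
      // Assumption: config property errors map to INVALID_FIELD.
      return new StageError(Type.INVALID_FIELD, stage, e.getMessage(), ((InvalidConfigPropertyException) e).property);
    }
    return new StageError(Type.STAGE_ERROR, stage, e.getMessage(), null);
  }

  public static void main(String[] args) {
    StageError error = toError("sink", new InvalidSchemaFieldException("type mismatch", "age"));
    System.out.println(error.type + " in stage " + error.stage + " for " + error.name);
  }
}
```

Carrying the field name alongside the `INVALID_SCHEMA` type is what would let the UI highlight the specific schema row rather than show only a stage-level message.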
Impact on UI
UI changes will be needed to handle INVALID_SCHEMA errors returned from the validation endpoint.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release 6.1.0
Related Work
- Work #1
- Work #2
- Work #3