Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

A CDAP pipeline is composed of CDAP plugins, each of which validates its own inputs and configuration. While developing a pipeline, a developer can supply invalid plugin configurations. For example, the BigQuery sink plugin can be configured with a schema that does not match the underlying BigQuery table. To fail fast in such situations, plugins should report all of their error messages at once.

Goals

  • To fail fast, introduce a new API to collect multiple error messages from plugins at configure time

  • Instrument plugins to use this API to return multiple error messages from the validation endpoint

User Stories 

  • As a CDAP pipeline developer, if a pipeline contains invalid plugin configurations, I would like it to fail early with appropriate error messages.

API Changes for Plugin Validation

Plugin Validation

The plugin validation endpoint is used to surface all stage-level errors at once. To collect multiple validation errors from a stage, StageConfigurer, MultiInputStageConfigurer and MultiOutputStageConfigurer can be modified as shown below. If one or more errors are added to the stage configurer, pipeline deployment will fail.

StageConfigurer.java
public interface StageConfigurer {

  /**
   * Get the input schema for this stage, or null if it is unknown.
   *
   * @return input schema
   */
  @Nullable
  Schema getInputSchema();

  /**
   * Set the output schema for this stage, or null if it is unknown.
   *
   * @param outputSchema output schema for this stage
   */
  void setOutputSchema(@Nullable Schema outputSchema);

  /**
   * Set the error schema for this stage, or null if it is unknown.
   * If no error schema is set, it will default to the input schema for the stage. Note that since source
   * plugins do not have an input schema, it will default to null for sources.
   *
   * @param errorSchema error schema for this stage
   */
  void setErrorSchema(@Nullable Schema errorSchema);

  /**
   * Add an error for this stage to the configurer if the pipeline stage is invalid.
   *
   * @param error {@link InvalidStageException} when a pipeline stage is invalid for any reason.
   */
  void addStageError(InvalidStageException error);
}


Plugins can use this API as shown below:

@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
  pipelineConfigurer.createDataset(conf.destinationFileset, FileSet.class);
  StageConfigurer stageConfigurer = pipelineConfigurer.getStageConfigurer();
  try {
    Pattern.compile(conf.filterRegex);
  } catch (Exception e) {  
    stageConfigurer.addStageError(new InvalidConfigPropertyException(e.getMessage(), "filterRegex"));
  }
  if (conf.sourceFileset.equals(conf.destinationFileset)) {
    stageConfigurer.addStageError(new InvalidStageException("source and destination filesets must be different"));
  }
}
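
The collect-then-fail behaviour described above can be sketched in isolation. This is a minimal illustration, not the CDAP implementation: `CollectingStageConfigurer`, `validate`, `deploy` and the `InvalidStageException` stand-in are hypothetical names introduced only for this example, and the rule that any collected error fails deployment is taken from the description of the validation endpoint.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical stand-in for the InvalidStageException used in this design.
class InvalidStageException extends RuntimeException {
  InvalidStageException(String message) {
    super(message);
  }
}

// Hypothetical collector covering only the addStageError part of StageConfigurer.
class CollectingStageConfigurer {
  private final List<InvalidStageException> errors = new ArrayList<>();

  void addStageError(InvalidStageException error) {
    errors.add(error);
  }

  List<InvalidStageException> getErrors() {
    return Collections.unmodifiableList(errors);
  }
}

class StageErrorCollectionSketch {

  // Runs every check and records each failure instead of throwing on the first one.
  static List<InvalidStageException> validate(String filterRegex, String source, String destination) {
    CollectingStageConfigurer configurer = new CollectingStageConfigurer();
    try {
      Pattern.compile(filterRegex);
    } catch (PatternSyntaxException e) {
      configurer.addStageError(new InvalidStageException("invalid filterRegex: " + e.getDescription()));
    }
    if (source.equals(destination)) {
      configurer.addStageError(new InvalidStageException("source and destination filesets must be different"));
    }
    return configurer.getErrors();
  }

  // Assumed deployment rule: any collected error makes the deployment fail.
  static void deploy(List<InvalidStageException> errors) {
    if (!errors.isEmpty()) {
      throw new IllegalStateException("deployment failed with " + errors.size() + " stage error(s)");
    }
  }

  public static void main(String[] args) {
    // Both checks fail here, so both messages are reported together.
    List<InvalidStageException> errors = validate("[a-z", "files", "files");
    for (InvalidStageException error : errors) {
      System.out.println(error.getMessage());
    }
    try {
      deploy(errors);
    } catch (IllegalStateException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

Because each check records its failure and continues, a developer sees the bad regular expression and the fileset conflict in a single validation round trip.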


Sources and sinks can have a schema that does not match the underlying storage. A new type of exception can be introduced so that the invalid schema fields can be highlighted when a schema mismatch occurs:

InvalidSchemaFieldException.java
public class InvalidSchemaFieldException extends InvalidStageException {
  private final String field;

  public InvalidSchemaFieldException(String message, String field) {
    super(message);
    this.field = field;
  }

  public InvalidSchemaFieldException(String message, Throwable cause, String field) {
    super(message, cause);
    this.field = field;
  }

  public String getField() {
    return field;
  }
}
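
As an illustration of how a sink might report one exception per mismatched field, here is a minimal sketch. It assumes a simplified schema model (a map from field name to type name) in place of the CDAP Schema class, and `SchemaMismatchSketch` and `findMismatches` are hypothetical names; the exception classes are reduced stand-ins for the ones in this design.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-ins for the exception types described in this design.
class InvalidStageException extends RuntimeException {
  InvalidStageException(String message) {
    super(message);
  }
}

class InvalidSchemaFieldException extends InvalidStageException {
  private final String field;

  InvalidSchemaFieldException(String message, String field) {
    super(message);
    this.field = field;
  }

  String getField() {
    return field;
  }
}

class SchemaMismatchSketch {

  // Compares a pipeline schema against a table schema, both simplified to
  // field name -> type name, and returns one exception per offending field.
  static List<InvalidSchemaFieldException> findMismatches(Map<String, String> pipelineSchema,
                                                          Map<String, String> tableSchema) {
    List<InvalidSchemaFieldException> errors = new ArrayList<>();
    for (Map.Entry<String, String> entry : pipelineSchema.entrySet()) {
      String name = entry.getKey();
      String tableType = tableSchema.get(name);
      if (tableType == null) {
        errors.add(new InvalidSchemaFieldException(
          "field '" + name + "' does not exist in the table", name));
      } else if (!tableType.equals(entry.getValue())) {
        errors.add(new InvalidSchemaFieldException(
          "field '" + name + "' is " + entry.getValue() + " but the table expects " + tableType, name));
      }
    }
    return errors;
  }

  public static void main(String[] args) {
    Map<String, String> pipelineSchema = new LinkedHashMap<>();
    pipelineSchema.put("id", "string");
    pipelineSchema.put("name", "string");
    pipelineSchema.put("age", "int");

    Map<String, String> tableSchema = new LinkedHashMap<>();
    tableSchema.put("id", "long");
    tableSchema.put("name", "string");

    // Each reported error carries the field name so a UI could highlight it.
    for (InvalidSchemaFieldException error : findMismatches(pipelineSchema, tableSchema)) {
      System.out.println(error.getField() + ": " + error.getMessage());
    }
  }
}
```

Each returned exception could then be passed to `addStageError`, so that every offending field is surfaced in one validation call.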


A validation error will have a corresponding INVALID_SCHEMA type so that the UI can identify schema field errors.

ValidationError.java
public class ValidationError {
  protected final Type type;
  protected final String message;

  /**
   * Types of validation errors
   */
  public enum Type {
    ERROR,
    STAGE_ERROR,
    INVALID_FIELD,
    PLUGIN_NOT_FOUND,
    INVALID_SCHEMA
  }

  ...
}

InvalidSchemaFieldError.java
/**
 * An error that occurred due to field schema mismatch in a specific pipeline stage.
 */
public class InvalidSchemaFieldError extends StageValidationError {
  private final String field;

  public InvalidSchemaFieldError(String stage, InvalidSchemaFieldException cause) {
    this(cause.getMessage(), stage, cause.getField());
  }

  public InvalidSchemaFieldError(String message, String stage, String field) {
    super(Type.INVALID_SCHEMA, message, stage);
    this.field = field;
  }

  public String getField() {
    return field;
  }

  @Override
  public boolean equals(Object o) {
    ...
  }

  @Override
  public int hashCode() {
    ...
  }
}
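
One way the validation endpoint might translate collected stage exceptions into typed validation errors is sketched below. This is an assumption-laden illustration rather than the actual endpoint logic: `ValidationErrorMappingSketch`, `typeOf` and `describe` are hypothetical, the exception classes are reduced stand-ins, and mapping InvalidConfigPropertyException to INVALID_FIELD is assumed rather than stated in this design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the exception types described in this design.
class InvalidStageException extends RuntimeException {
  InvalidStageException(String message) {
    super(message);
  }
}

class InvalidConfigPropertyException extends InvalidStageException {
  InvalidConfigPropertyException(String message) {
    super(message);
  }
}

class InvalidSchemaFieldException extends InvalidStageException {
  InvalidSchemaFieldException(String message) {
    super(message);
  }
}

class ValidationErrorMappingSketch {

  // Subset of ValidationError.Type that is relevant to stage-level errors.
  enum Type { STAGE_ERROR, INVALID_FIELD, INVALID_SCHEMA }

  // Picks the most specific type for an exception, falling back to STAGE_ERROR.
  static Type typeOf(InvalidStageException error) {
    if (error instanceof InvalidSchemaFieldException) {
      return Type.INVALID_SCHEMA;
    }
    if (error instanceof InvalidConfigPropertyException) {
      return Type.INVALID_FIELD; // assumed mapping, not confirmed by this design
    }
    return Type.STAGE_ERROR;
  }

  // Renders one line per error in the form "TYPE [stage] message".
  static List<String> describe(String stage, List<InvalidStageException> errors) {
    List<String> lines = new ArrayList<>();
    for (InvalidStageException error : errors) {
      lines.add(typeOf(error) + " [" + stage + "] " + error.getMessage());
    }
    return lines;
  }

  public static void main(String[] args) {
    List<InvalidStageException> errors = new ArrayList<>();
    errors.add(new InvalidSchemaFieldException("field 'id' has the wrong type"));
    errors.add(new InvalidConfigPropertyException("filterRegex is not a valid pattern"));
    errors.add(new InvalidStageException("source and destination filesets must be different"));
    for (String line : describe("sink", errors)) {
      System.out.println(line);
    }
  }
}
```

Keeping the type decision in one place means a new exception subclass only needs one extra branch before the UI can distinguish it.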


Impact on UI

UI changes will be needed to handle the invalid schema type errors returned from the validation endpoint.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release 6.1.0

Related Work

  • Work #1
  • Work #2
  • Work #3

Future work
