...

  • Names should be the names shown in the UI and not the names used in the backend.
  • Do not mention the format that the backend expects if the UI does not expose the format. For example, do not mention that 'fields' is a comma-separated list of fields if the property uses the 'csv' widget.
  • The first "sentence" is a fragment. You can think of it as starting with an implicit "This property is ".
  • Always end the description with a period.
  • Mention restrictions or special values for properties. For example, document that a timeout property cannot be below 0 and that 0 means there is no timeout.
  • For numeric properties, include the unit. For example, instead of 'timestamp', use 'timestamp in seconds'. Instead of 'size', use 'size (GB)'.
  • Reference Name should always have the same description for sources and the same description for sinks – "Used to uniquely identify this <source/sink> for lineage, annotating metadata, and other governance operations."


For example:

Properties
----------
**Reference Name**: Used to uniquely identify this sink for lineage, annotating metadata, and other governance operations.

**Project ID**: The Google Cloud Project ID, which uniquely identifies a project.
It can be found on the Dashboard in the Google Cloud Platform Console.

**Service Account File Path**: Path on the local file system of the service account key used for
authorization. Does not need to be specified when running on a Dataproc cluster.
When running on other clusters, the file must be present on every node in the cluster.

...

Every word in a label should be capitalized, with the exception of 'a', 'an', 'of', and 'the'.
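The capitalization rule above is easy to check mechanically, for example in a unit test over a plugin's widget labels. The following is a minimal sketch, not part of any framework: the class name LabelStyle and the method isCapitalized are hypothetical, and the sketch assumes that exempt words may appear in either case.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper for illustration; not part of any plugin framework.
class LabelStyle {
  // Words the guideline exempts from capitalization.
  private static final Set<String> EXEMPT =
    new HashSet<>(Arrays.asList("a", "an", "of", "the"));

  // Returns true if every non-exempt word starts with an uppercase letter.
  // Assumption: exempt words are accepted in either case.
  static boolean isCapitalized(String label) {
    for (String word : label.trim().split("\\s+")) {
      if (word.isEmpty() || EXEMPT.contains(word.toLowerCase(Locale.ROOT))) {
        continue;
      }
      char first = word.charAt(0);
      // Words that start with a digit or symbol are not checked.
      if (Character.isLetter(first) && !Character.isUpperCase(first)) {
        return false;
      }
    }
    return true;
  }
}
```

With this sketch, 'Service Account File Path' passes and 'Service account File Path' does not.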

If a property does not have to be a textbox, it probably should not be one.

Most widgets should specify a placeholder:


  {
    "widget-type": "textbox",
    "label": "Bucket Name",
    "name": "bucket",
    "widget-attributes" : {
      "placeholder": "The bucket to be used to create directories."
    }
  }

In most cases, the first sentence of the documentation for that property makes a good placeholder.

Naming

Coming up with a good name for a plugin and its properties can be a difficult task. The user-facing name for plugins and properties is the 'label' specified in the widget JSON.

...

Here are some guidelines:

  • Don't put the plugin type in the name. For example, instead of 'Table Source', just use 'Table'.
  • Use 'partition' instead of 'split' or 'shard'. These all mean the same thing, but we need a single standard term.
  • 'Reference Name' is the standard name for external datasets.
  • Use a positive name for boolean properties. For example, 'Enable Auto Commit' instead of 'Disable Auto Commit'.

Validation

User input should be validated as early as possible. Anything static should be validated in the configurePipeline() method. Everything else should be validated in the prepareRun() method.

One convention many plugins have used is to have a validate() method on their plugin config object. Sometimes it will take an input schema as an argument. That way the validate() method can easily be called at configure time and at prepare time. It also gives a way to quickly unit test the validation logic without needing to deploy pipelines.
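The validate() convention can be sketched as follows. This is an illustration under stated assumptions, not a definitive implementation: TableConfig is a hypothetical class, its containsMacro() is a stand-in for the method a real config object gets from the framework, and the input schema is simplified to a map from field name to type name.

```java
import java.util.Map;

// Hypothetical stand-in for a plugin config object. A real plugin would get
// containsMacro() from the framework's config base class.
class TableConfig {
  private final String tableName;   // null while a macro is still unresolved
  private final String keyField;
  private final boolean tableNameHasMacro;

  TableConfig(String tableName, String keyField, boolean tableNameHasMacro) {
    this.tableName = tableName;
    this.keyField = keyField;
    this.tableNameHasMacro = tableNameHasMacro;
  }

  // Stand-in for the framework's macro check (assumption for this sketch).
  boolean containsMacro(String property) {
    return tableNameHasMacro && "tableName".equals(property);
  }

  // inputSchema maps field name to type name; pass null if it is not known yet.
  void validate(Map<String, String> inputSchema) {
    if (!containsMacro("tableName") && (tableName == null || tableName.trim().isEmpty())) {
      throw new IllegalArgumentException(
        "Table Name is empty. Enter the name of the table to write to.");
    }
    if (inputSchema == null) {
      return; // Schema-dependent checks have to wait until the schema is known.
    }
    String type = inputSchema.get(keyField);
    if (type == null) {
      throw new IllegalArgumentException(
        "Key Field '" + keyField + "' is not in the input schema. "
          + "Choose one of " + inputSchema.keySet() + ".");
    }
    if (!"string".equals(type)) {
      throw new IllegalArgumentException(
        "Key Field '" + keyField + "' is of type '" + type + "' but must be a string. "
          + "Choose a string field or convert the field upstream.");
    }
  }
}
```

In this sketch, configurePipeline() and prepareRun() would both call config.validate(inputSchema), and a unit test can construct a TableConfig directly and assert on the exception message without deploying a pipeline.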

Some useful things to keep in mind while validating:

  • Use the containsMacro() method to check if the property is ready to be validated. A property that contains a macro has no value until the macro is evaluated at runtime.
  • If a property is invalid, throw an IllegalArgumentException with a user-friendly message. This message will be shown in the UI and should mention which property is invalid, why it is invalid, and what action the user can take to make it valid.
  • For numeric properties, can the property be 0? Can it be negative? Does it need to be within a certain range?
  • The input schema is often required to perform validation. For example, a plugin may operate on a specific field, which can only be a specific type.
  • A property cannot be null unless it is annotated as @Nullable.
  • Don't forget to handle empty strings.
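Several of the points above (null values, empty strings, numeric ranges, and messages that name the property, the problem, and the fix) can be folded into small reusable checks. The sketch below is one possible shape; PropertyChecks and its method names are hypothetical, and callers are assumed to skip these checks for any property where containsMacro() returns true.

```java
// Hypothetical helper class for illustration; not part of any plugin framework.
class PropertyChecks {

  // Rejects null, empty, and whitespace-only values for a required text property.
  // Returns the trimmed value.
  static String requireNonEmpty(String label, String value) {
    if (value == null || value.trim().isEmpty()) {
      throw new IllegalArgumentException(
        label + " is required but was empty. Enter a value for " + label + ".");
    }
    return value.trim();
  }

  // Rejects null and out-of-range values for a required numeric property.
  // Both bounds are inclusive.
  static int requireInRange(String label, Integer value, int min, int max, String unit) {
    if (value == null) {
      throw new IllegalArgumentException(
        label + " is required but was not set. Enter a number of " + unit
          + " between " + min + " and " + max + ".");
    }
    if (value < min || value > max) {
      throw new IllegalArgumentException(
        label + " is " + value + " " + unit + ", which is outside the allowed range. "
          + "Enter a value between " + min + " and " + max + ".");
    }
    return value;
  }
}
```

For example, a timeout where 0 means no timeout could be checked with PropertyChecks.requireInRange("Timeout", timeout, 0, 3600, "seconds"), where the upper bound of 3600 is an arbitrary value chosen for this illustration.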