Hydrator Plugin Experience

...

Installing custom plugins

In CDAP 3.2, installing Hydrator plugins requires three steps: uploading the artifact containing the plugins, placing a .json file that the UI uses to configure widgets into a special folder on the UI machine, and adding docs to CDAP's website.

This is problematic for a few reasons:

  1. Too complex.  There are multiple steps required, and they must be performed against multiple systems (cdap and cdap-ui).

  2. The same UI configs are shared across all plugin artifacts, which is incorrect. Artifacts are versioned, and different versions may require different configs.  Additionally, artifacts with the same name and version may still be different if they are in different namespaces.

  3. Users have no way to add docs for their own plugins, since the docs live on the Cask-hosted docs page.

The underlying problem is that widgets and docs don't belong in CDAP, but they don't really belong in the UI either. The UI can live on multiple machines, and we don't want to force operations on the filesystem when a user is simply installing a new set of plugins.  In the long term, what we really want is a Hydrator backend: an app that manages pipelines, plugins, etc.  In the shorter term, it would be desirable to have a generic CDAP feature that lets Hydrator use CDAP to serve the widget and doc information.

 

The plan, then, is to add properties to artifacts. Hydrator can store the widgets and docs for each plugin in the properties of the artifact containing that plugin. Properties can already be set on streams and datasets, so this would be an analogous feature. This solves the three problems mentioned above while keeping CDAP ignorant of widgets and other UI-specific things.

Code Block
GET /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/properties
GET /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/properties/{property}?keys=key1,key2
PUT /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/properties/{property}
PUT /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/properties
DELETE /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/properties/{property}
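
As a rough illustration of how a client such as Hydrator might address these endpoints, the sketch below builds the request paths from their parts. The class and method names are hypothetical, and no host, port, or API version prefix is assumed because the text above does not state one.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: builds paths for the proposed artifact properties endpoints.
public class ArtifactPropertyPaths {

  // Path addressing all properties of one artifact version.
  public static String propertiesPath(String namespace, String artifact, String version) {
    return String.format("/namespaces/%s/artifacts/%s/versions/%s/properties",
                         encode(namespace), encode(artifact), encode(version));
  }

  // Path addressing a single named property of one artifact version.
  public static String propertyPath(String namespace, String artifact, String version,
                                    String property) {
    return propertiesPath(namespace, artifact, version) + "/" + encode(property);
  }

  // Percent-encode one path segment; URLEncoder emits '+' for spaces, so swap in %20.
  private static String encode(String segment) {
    return URLEncoder.encode(segment, StandardCharsets.UTF_8).replace("+", "%20");
  }
}
```

Because the namespace is part of every path, two artifacts with the same name and version in different namespaces get separate property sets, which addresses the second problem listed above.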

Installing a set of plugins could then be a single command:

Code Block
load artifact <path/to/artifact> [config-file <config-file>] 

The properties would be added to the config file, which would look something like:

Code Block
{
  "properties": {
    "widgets.batchsource.database": "<widgets json>",
    "doc.batchsource.database": "<doc>",
    ...
  }
}
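
A minimal sketch of how the properties for one plugin might be assembled before being written into such a config file. The key pattern (`widgets.<plugin-type>.<plugin-name>` and `doc.<plugin-type>.<plugin-name>`) is an assumption inferred from the two example keys above, and the class and method names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds the artifact properties that describe one plugin.
public class PluginProperties {

  // Assumed key pattern, inferred from "widgets.batchsource.database".
  public static String widgetsKey(String pluginType, String pluginName) {
    return "widgets." + pluginType + "." + pluginName;
  }

  // Assumed key pattern, inferred from "doc.batchsource.database".
  public static String docKey(String pluginType, String pluginName) {
    return "doc." + pluginType + "." + pluginName;
  }

  // Returns the two properties for a plugin, in insertion order.
  public static Map<String, String> forPlugin(String pluginType, String pluginName,
                                              String widgetsJson, String doc) {
    Map<String, String> properties = new LinkedHashMap<>();
    properties.put(widgetsKey(pluginType, pluginName), widgetsJson);
    properties.put(docKey(pluginType, pluginName), doc);
    return properties;
  }
}
```

An artifact that contains several plugins would simply merge the maps for each plugin into the single "properties" object.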

 

Jira: CDAP-4280 (Cask Community Issue Tracker)

Installing plugins from hydrator-plugins repository

To install the plugins from hydrator-plugins, a user simply installs the hydrator-plugins rpm or deb.  In addition, cdap-site.xml will have to set the app.artifact.dir setting to include the directory for those plugins (/opt/hydrator-plugins/artifacts).

 

The work for this will include setting up the hydrator-plugins build to create the rpms and debs.  The pom will be changed so that in addition to the jars, the config json will be created.  The json files will be created using an antrun script target that writes out the parents for the artifact, as well as properties for widget and doc information.  Prepare phases can then copy jars and jsons to the right place for packaging to be done.

 

The CDAP build will still need to bundle the results of the hydrator-plugins build in order to build a standalone zip that contains hydrator plugins. To accomplish this, we will add a property ('-Dadditional.artifacts.dir=</path/to/additional/artifacts>') that can be used to copy any additional jars and jsons from some external directory. This property can then be used in standalone builds to pull in the hydrator-plugins artifacts and configs.

Plugin Selection

A user installs CDAP 3.3.0 by downloading the SDK or installing via packages.  Included in CDAP are the Hydrator webapp and the cdap-etl-batch-3.3.0 and cdap-etl-realtime-3.3.0 artifacts.  Also included are all the v1.0.0 plugins in the hydrator-plugins repo, like cassandra-1.0.0.  The user creates a pipeline that reads from Cassandra and writes to a Table.

...

Code Block
{
  "connections": [
    {
      "from": "twitter source",
      "to": "language tagger"
    },
    {
      "from": "language tagger",
      "selector": {
        // switch based on the value of the "lang" field
        "field": "lang",
        "switch": {
          // output stage -> field value
          "categorizer": "en"
        },
        // optional. if the value doesn't match anything in the switch, go here.
        // if absent, the record is dropped
        "default": "translator"
      }
    },
    {
      "from": "translator",
      "selector": {
        "field": "lang",
        "switch": {
          "categorizer": "en"
        },
        "default": "invalid tweets table"
      }
    },
    {
      "from": "categorizer",
      "selector": {
        // switch based on the value returned by the script
        "script": "function (input) { return input.spam; }",
        "switch": {
          "categorized tweets table": true,
          "invalid tweets table": false
        }
      }
    }
  ]
}
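
The routing rule that a field-based selector describes can be sketched as follows. This is only an illustration of the semantics stated in the config comments (match the field value against the switch, fall back to the default, drop the record if there is no default); the class name and the use of a plain map as a record are assumptions, not part of any CDAP API.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Optional;

// Hypothetical model of a selector that switches on the value of one field.
public class FieldValueSelector {
  private final String field;
  private final Map<String, Object> switchMap;  // output stage -> field value, as in the config
  private final String defaultStage;            // may be null, meaning "drop unmatched records"

  public FieldValueSelector(String field, Map<String, Object> switchMap, String defaultStage) {
    this.field = field;
    this.switchMap = switchMap;
    this.defaultStage = defaultStage;
  }

  // Returns the stage the record should go to, or empty if the record is dropped.
  public Optional<String> route(Map<String, Object> record) {
    Object value = record.get(field);
    for (Map.Entry<String, Object> entry : switchMap.entrySet()) {
      if (Objects.equals(entry.getValue(), value)) {
        return Optional.of(entry.getKey());
      }
    }
    return Optional.ofNullable(defaultStage);
  }
}
```

With the "language tagger" selector above, a record whose "lang" is "en" would go to "categorizer" and any other record would go to "translator".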

...

Code Block
@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) throws IllegalArgumentException {
  StageConfigurer stageConfigurer = pipelineConfigurer.getStageConfigurer();
  Schema inputSchema = stageConfigurer.getInputSchema();
  // perform validation
  if (!isValid(inputSchema)) {
    throw new IllegalArgumentException("reason");
  }
  stageConfigurer.setOutputSchema(getOutputSchema(inputSchema));
}

If a plugin does not know what its output schema will be, or if the output schema is not constant, it will return null. Plugins further down the pipeline will then get null as their input schema.
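
The effect of a null output schema on later stages can be sketched with a simplified model. Here a schema is just a String and each stage is a function from its input schema to its output schema; both are stand-ins chosen for illustration, not CDAP types, and a linear pipeline is assumed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical model of schema propagation through a linear pipeline.
public class SchemaPropagation {

  // Returns the input schema each stage would see, in pipeline order.
  // A stage that cannot determine its output schema returns null,
  // and that null becomes the next stage's input schema.
  public static List<String> inputSchemas(String sourceSchema,
                                          List<UnaryOperator<String>> stages) {
    List<String> seen = new ArrayList<>();
    String current = sourceSchema;
    for (UnaryOperator<String> stage : stages) {
      seen.add(current);
      current = stage.apply(current);
    }
    return seen;
  }
}
```

In this model, every stage needs to tolerate a null input schema, since any upstream stage may have been unable to declare its output.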

...