...

There will be a set of marketplace APIs that the UI will use to get packages and their metadata. In the first version of the market, the APIs will simply be static content served from S3. This essentially amounts to placing packages in a predetermined directory structure and generating a file that lists all packages in the repository.

 

There will be an internal process to push the entire market repository to S3. If users wish to host their own marketplace, they can do so using their own S3 instance or by running a web server (Apache httpd, for example) on top of a local directory structure.

...

The APIs are simply a contract about the directory structure of the marketplace. All APIs are relative to a base path, for example cask.co/marketplace. The structure is expected to be:

Code Block
GET
<base>/v1/packages.json
<base>/v1/packages/<package-name>/<version>/icon.png
<base>/v1/packages/<package-name>/<version>/license.txt
<base>/v1/packages/<package-name>/<version>/spec.json
<base>/v1/packages/<package-name>/<version>/spec.json.asc
<base>/v1/packages/<package-name>/<version>/archive.zip
<base>/v1/packages/<package-name>/<version>/archive.zip.asc
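Because the APIs are just a directory contract, a client can derive every URL from the base path. The following is a minimal Python sketch. It assumes a base such as https://cask.co/marketplace, and the helper names `packages_index_url` and `package_urls` are hypothetical:

```python
from urllib.parse import quote

# Files that every package version directory is expected to contain
PACKAGE_FILES = ["icon.png", "license.txt", "spec.json", "spec.json.asc",
                 "archive.zip", "archive.zip.asc"]


def packages_index_url(base: str) -> str:
    """URL of the list of all packages."""
    return f"{base.rstrip('/')}/v1/packages.json"


def package_urls(base: str, name: str, version: str) -> dict:
    """Map each per-version file name to its URL under the base path."""
    root = f"{base.rstrip('/')}/v1/packages/{quote(name)}/{quote(version)}"
    return {f: f"{root}/{f}" for f in PACKAGE_FILES}
```

For example, `package_urls("https://cask.co/marketplace", "PurchaseExample", "4.0.1")["archive.zip.asc"]` gives `https://cask.co/marketplace/v1/packages/PurchaseExample/4.0.1/archive.zip.asc`.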

The packages.json and signature files could be generated from all the package spec.json files using a tool.
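One way such a tool could build packages.json is sketched below. The sketch makes three assumptions:

- The repository root contains the v1/packages/<package-name>/<version>/spec.json layout above.
- Spec files are plain JSON, without the explanatory // comments used in the examples in this document.
- Each entry copies a fixed subset of the spec's fields.

`SUMMARY_FIELDS`, `build_index`, and `write_index` are hypothetical names, and signing is left out.

```python
import json
from pathlib import Path

# Summary fields copied from each spec.json into packages.json (an assumed subset)
SUMMARY_FIELDS = ["name", "label", "description", "author", "org",
                  "version", "categories", "cdapVersion"]


def build_index(repo_root: Path) -> list:
    """Summarize every v1/packages/<name>/<version>/spec.json under repo_root."""
    entries = []
    for spec_path in sorted(repo_root.glob("v1/packages/*/*/spec.json")):
        spec = json.loads(spec_path.read_text(encoding="utf-8"))
        # Keep only the summary fields this spec actually defines
        entries.append({k: spec[k] for k in SUMMARY_FIELDS if k in spec})
    return entries


def write_index(repo_root: Path) -> Path:
    """Write v1/packages.json next to the packages directory and return its path."""
    out = repo_root / "v1" / "packages.json"
    out.write_text(json.dumps(build_index(repo_root), indent=2), encoding="utf-8")
    return out
```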

List all Packages

Code Block
GET /v1/packages.json
[
  {
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.1",
    "categories": [ "examples" ],
    "cdapVersion": "[4.0.0,4.1.0)"
  },
  {
    "name": "HelloWorld",
    "label": "Hello World",
    "description": "Simple application demonstrating usage of flows and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.0",
    "categories": [ "examples" ],
    "cdapVersion": "[4.0.0,4.1.0)"
  },
  ...
]

This list is not expected to change often, so the UI can cache it if needed. The 'cdapVersion' field specifies which versions of CDAP the package is compatible with. If it is not given, the package is compatible with all versions.
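The following sketches how the UI could evaluate 'cdapVersion'. It assumes the interval notation used in the examples ('[' and ']' inclusive, '(' and ')' exclusive) and purely numeric dotted versions. `is_compatible` and `parse_version` are hypothetical helpers, and qualifiers such as -SNAPSHOT are not handled:

```python
from typing import Optional


def parse_version(v: str) -> tuple:
    """'4.0.1' -> (4, 0, 1); missing components count as 0, so '4.1' == '4.1.0'."""
    parts = [int(p) for p in v.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def is_compatible(cdap_version: str, version_range: Optional[str]) -> bool:
    """Check a CDAP version against a range such as '[4.0.0,4.1.0)'.

    '[' and ']' are inclusive bounds, '(' and ')' exclusive ones.
    A missing range means the package works with every CDAP version.
    """
    if not version_range:
        return True
    low, high = version_range[1:-1].split(",")
    v = parse_version(cdap_version)
    lo, hi = parse_version(low), parse_version(high)
    above = v >= lo if version_range[0] == "[" else v > lo
    below = v <= hi if version_range[-1] == "]" else v < hi
    return above and below
```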

Note

This leaves grouping by category up to the UI. If needed, we could add packages-<category>.json files that only list the packages in a specific category.

This also leaves the display of multiple versions of the same package up to the UI. Most of the time, though, there will probably be only one version of a package per CDAP version, so this may not be a big problem.

This also leaves filtering out packages that are incompatible with the CDAP instance up to the UI.
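Since packages.json is a flat list, the UI has to do this grouping, de-duplication, and filtering itself. A minimal sketch follows. It assumes numeric dotted versions and leaves the compatibility test to a caller-supplied predicate, for example one that checks 'cdapVersion' against the running CDAP version. The function names are hypothetical:

```python
from collections import defaultdict
from typing import Callable, Dict, List


def version_key(v: str) -> tuple:
    """Compare dotted versions numerically, so '4.0.10' sorts after '4.0.9'."""
    return tuple(int(p) for p in v.split("."))


def group_for_display(packages: List[dict],
                      compatible: Callable[[dict], bool]) -> Dict[str, List[dict]]:
    """Drop incompatible entries, keep the newest version of each package,
    and group the survivors by category."""
    latest: Dict[str, dict] = {}
    for pkg in packages:
        if not compatible(pkg):
            continue
        current = latest.get(pkg["name"])
        if current is None or version_key(pkg["version"]) > version_key(current["version"]):
            latest[pkg["name"]] = pkg
    groups: Dict[str, List[dict]] = defaultdict(list)
    for pkg in latest.values():
        for category in pkg.get("categories", []):
            groups[category].append(pkg)
    return dict(groups)
```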

Get Package Archive

Code Block
GET /v1/packages/<package-name>/<version>/archive.zip
ex: GET /v1/packages/PurchaseExample/4.0.1/archive.zip
[ binary archive contents ]

Get Package Archive Signature

Code Block
GET /v1/packages/<package-name>/<version>/archive.zip.asc
ex: GET /v1/packages/PurchaseExample/4.0.1/archive.zip.asc
[ archive signature ] 

Get Package Spec

Code Block
GET /v1/packages/<package-name>/<version>/spec.json
ex: GET /v1/packages/PurchaseExample/4.0.0/spec.json
{
  "specVersion": "1.0",
  "name": "PurchaseExample",
  "label": "Purchase History",
  "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "4.0.0",
  "created": 1234567899,
  "cdapVersion": "[4.0.0,4.1.0)",
  "changelog": "fixed a small parsing bug",
  "categories": [ "examples" ],
  "dependencies": [ ],
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "name",
          "value": "PurchaseHistoryExample"
        },
        {
          "name": "version",
          "value": "4.0.1"
        },
        {
          "name": "scope",
          "value": "user"
        },
        {
          "name": "jar",
          "value": "PurchaseHistoryExample-4.0.1.jar"
        }
      ]
    },
    {
      "type": "create_app",
      "arguments": [
        {
          "name": "name",
          "value": "PurchaseHistory",
          "canModify": true
        }
      ]
    }
  ]
}

Get Package Spec Signature

Code Block
GET /v1/packages/<package-name>/<version>/spec.json.asc
ex: GET /v1/packages/PurchaseExample/4.0.0/spec.json.asc
[ spec signature ]

Get Package Icon

Code Block
GET /v1/packages/<package-name>/<version>/icon.png
ex: GET /v1/packages/PurchaseExample/4.0.0/icon.png
[ icon bytes ]

Get Package License

Code Block
GET /v1/packages/<package-name>/<version>/license.txt
ex: GET /v1/packages/PurchaseExample/4.0.0/license.txt
Copyright © 2014-2016 Cask Data, Inc.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
       http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
...

...

Since people will be able to download code from the marketplace, it is especially important to protect against malicious code. We can use PGP signatures to sign both the package archive and the package spec that are downloadable from the marketplace. The Market UI will have to be configured with a GPG public key. For the public CDAP marketplace, we could reuse the GPG key used to sign the CDAP RPM and Debian packages, or create another one. The UI can then use that public key, along with the signature APIs, to verify that the spec and archive were signed by the owner of the package.

There will also be a setting that lets people turn off signature checking in case it's not needed for internally hosted repositories.
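The following sketches the verification step. It assumes the UI (or a helper service) can shell out to the GnuPG `gpg` command, and that the marketplace public key lives in a dedicated keyring file. `gpg --verify <signature> <data>` checks a detached signature such as archive.zip.asc. `verify_signature` is hypothetical, and its `check_signatures` flag stands in for the proposed setting. `run` is injectable so the logic can be tested without gpg installed:

```python
import subprocess
from pathlib import Path
from typing import Callable, Optional


def verify_signature(data: Path, signature: Path,
                     keyring: Optional[Path] = None,
                     check_signatures: bool = True,
                     run: Callable = subprocess.run) -> bool:
    """Return True if `signature` is a valid detached signature for `data`.

    check_signatures=False mirrors the proposed setting for internal
    repositories and skips verification entirely.
    """
    if not check_signatures:
        return True
    cmd = ["gpg", "--batch"]
    if keyring is not None:
        # Trust only the configured marketplace key, not the user's default keyring
        cmd += ["--no-default-keyring", "--keyring", str(keyring)]
    cmd += ["--verify", str(signature), str(data)]
    # gpg exits with status 0 only when the signature checks out
    result = run(cmd, capture_output=True)
    return result.returncode == 0
```

A keyring name without a slash is looked up in the GnuPG home directory, so passing a full path is the safer choice.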

Package Spec

The package spec contains metadata about the package and a list of steps to perform on the CDAP instance. It is a JSON file with the following structure:

Code Block
{
  "specVersion": "1.0",
  "name": "<name>",
  "version": "<version>",
  "label": "<label>",
  "description": "<description>",
  "author": "<author>",
  "org": "<org>",
  "created": <creation-timestamp>,
  "categories": [ <categories> ],
  "cdapVersion": "<compatible-versions>",
  "changelog": "<changes>",
  "actions": [
    actionspec1,
    actionspec2,
    ...
  ]
}

The actions in the spec will correspond to steps in the UI wizard for installing the package.

...

Each action contains a type and a list of arguments. Each type of action requires different arguments. In the first version, the following types will be supported: create_artifact, create_app, create_stream, create_dataset, load_datapack, and install_package.

Code Block
{
  "type": "create_artifact" | "create_app" | "create_stream" | "create_dataset" | "load_datapack" | "install_package",
  "arguments": [
    {
      "name": [argument name],
      "value": [argument value],
      "canModify": true | false (defaults to false)
    }
  ]
}

Some arguments can be modified by users in the resulting wizard. For example, the name of an application may be a field that the user should be able to edit.
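The following sketches how a wizard step could apply user edits. It assumes edits arrive as a map from argument name to new value. `resolve_action` and `SUPPORTED_TYPES` are hypothetical names, and the type list mirrors the one above:

```python
from typing import Any, Dict

SUPPORTED_TYPES = {"create_artifact", "create_app", "create_stream",
                   "create_dataset", "load_datapack", "install_package"}


def resolve_action(action: dict, user_edits: Dict[str, Any]) -> Dict[str, Any]:
    """Return the final argument values for one action after the wizard step.

    Only arguments with "canModify": true (default false) accept a user edit.
    """
    if action["type"] not in SUPPORTED_TYPES:
        raise ValueError(f"unsupported action type: {action['type']}")
    arguments = {arg["name"]: arg for arg in action.get("arguments", [])}
    unknown = set(user_edits) - set(arguments)
    if unknown:
        raise ValueError(f"unknown arguments: {sorted(unknown)}")
    resolved = {}
    for name, arg in arguments.items():
        if name in user_edits:
            if not arg.get("canModify", False):
                raise ValueError(f"argument '{name}' cannot be modified")
            resolved[name] = user_edits[name]
        else:
            resolved[name] = arg.get("value")
    return resolved
```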

...

create_artifact

name | description | required? | default
name | artifact name | yes |
jar | name of the jar file in the package archive | yes, unless using externalArchive |
externalJar | link to download a 3rd-party jar | no | none
externalArchive | link to download a 3rd-party archive | no | none
externalArchiveSignature | link to get the 3rd-party archive signature | no | none
externalArchiveJar | path of the jar file in the external archive | no | none
scope | artifact scope (implies that an API to add system artifacts is added in 4.0) | no | user
version | artifact version, passed as the Artifact-Version header | no | none
config | config file in the package archive containing the artifact parents (passed as the Artifact-Extends header), plugins (passed as the Artifact-Plugins header), and properties | no | none
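The following sketches how resolved create_artifact arguments could become an upload request. It makes two assumptions:

- The CDAP v3 artifact endpoint is `POST /v3/namespaces/<namespace>/artifacts/<artifact-name>`, with the jar as the body.
- The config file is shaped as `{"parents": [...], "plugins": [...], "properties": {...}}`.

Scope, the external* arguments, and properties (which would need a separate call) are not handled. `artifact_request` is a hypothetical helper that only builds the request rather than sending it:

```python
import json
from typing import Dict, Tuple


def artifact_request(namespace: str, args: Dict[str, str],
                     config: dict) -> Tuple[str, str, Dict[str, str]]:
    """Build (method, path, headers) for uploading the artifact jar to CDAP.

    `args` are the resolved create_artifact arguments; `config` is the parsed
    config file from the package archive. The jar bytes become the request body.
    """
    headers: Dict[str, str] = {}
    if args.get("version"):
        headers["Artifact-Version"] = args["version"]
    if config.get("parents"):
        # Several parent ranges are joined with '/'
        headers["Artifact-Extends"] = "/".join(config["parents"])
    if config.get("plugins"):
        headers["Artifact-Plugins"] = json.dumps(config["plugins"])
    path = f"/v3/namespaces/{namespace}/artifacts/{args['name']}"
    return "POST", path, headers
```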

create_app

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/lifecycle.html#create-an-application
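The following sketches the request a create_app action could produce. It assumes the CDAP v3 lifecycle endpoint `PUT /v3/namespaces/<namespace>/apps/<app-name>`, whose JSON body holds an `artifact` object and an optional `config`. It also assumes an `artifact` argument like the one in the SFDC example later in this document. `app_request` is hypothetical:

```python
from typing import Any, Dict, Optional, Tuple


def app_request(namespace: str, args: Dict[str, Any],
                config: Optional[dict] = None) -> Tuple[str, str, Dict[str, Any]]:
    """Build (method, path, body) for deploying an application from an artifact.

    `args` holds the resolved create_app arguments; args["artifact"] is an
    object with scope, name, and version, as in the examples in this document.
    """
    body: Dict[str, Any] = {"artifact": args["artifact"]}
    if config:
        # Application config, e.g. the pipeline JSON read from the package archive
        body["config"] = config
    return "PUT", f"/v3/namespaces/{namespace}/apps/{args['name']}", body
```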

...

create_stream

name | description | required? | default
name | stream name | yes |
description | stream description; results in a call to set stream properties | no | empty
format | stream format as a JSON object; results in a call to set stream properties | no | empty
schema | stream schema; results in a call to set stream properties | no | empty
ttl | stream TTL; results in a call to set stream properties | no | empty
notification.threshold.mb | threshold in MB for sending notifications; results in a call to set stream properties | no | empty
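The following sketches the calls a create_stream action could expand to. The first call creates the stream, and a second call sets properties if any optional argument was given. The paths assume CDAP's v3 stream endpoints, and the properties body simply reuses the argument names from the table. In CDAP, the schema is normally nested inside the format object, so a real implementation may need to fold it in. `stream_requests` is hypothetical:

```python
from typing import Any, Dict, List, Optional, Tuple

# Optional create_stream arguments that the table above maps to stream properties
PROPERTY_ARGS = ["description", "format", "schema", "ttl", "notification.threshold.mb"]


def stream_requests(namespace: str,
                    args: Dict[str, Any]) -> List[Tuple[str, str, Optional[dict]]]:
    """Return the calls for one create_stream action: create, then set properties if any."""
    path = f"/v3/namespaces/{namespace}/streams/{args['name']}"
    calls: List[Tuple[str, str, Optional[dict]]] = [("PUT", path, None)]
    # Skip arguments left at their empty default
    props = {k: args[k] for k in PROPERTY_ARGS if args.get(k) not in (None, "")}
    if props:
        calls.append(("PUT", f"{path}/properties", props))
    return calls
```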


...

create_dataset

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/dataset.html#creating-a-dataset

name | description | required? | default
name | dataset name | yes |
type | dataset type | yes |
description | dataset description | no | empty
properties | JSON map of dataset properties | no | empty
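The following sketches the corresponding dataset request. It assumes the CDAP v3 endpoint `PUT /v3/namespaces/<namespace>/data/datasets/<dataset-name>`, with a body of `typeName`, `properties`, and an optional `description`. `dataset_request` is hypothetical:

```python
from typing import Any, Dict, Tuple


def dataset_request(namespace: str, args: Dict[str, Any]) -> Tuple[str, str, Dict[str, Any]]:
    """Build (method, path, body) for one create_dataset action."""
    body: Dict[str, Any] = {
        "typeName": args["type"],
        # "properties" defaults to an empty map, per the table above
        "properties": args.get("properties") or {},
    }
    if args.get("description"):
        body["description"] = args["description"]
    return "PUT", f"/v3/namespaces/{namespace}/data/datasets/{args['name']}", body
```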

...

load_datapack

Loads a datapack into some dataset or stream.

name | description | required? | default
name | dataset/stream name | yes |
files | files to load into the dataset/stream | yes |

install_package

Installs another package from the marketplace.

name | description | required? | default
name | package name | yes |
version | package version | yes |

Dependencies

Packages will be able to specify dependencies on other packages.

Code Block
{
  ...
  "dependencies": [
    { 
      "name": "spark-plugins",
      "version": "1.5.0"
    }
  ]
}

If other packages are listed as dependencies, the actions of each dependency must be executed before the actions of the current package.

Note

Package dependencies introduce non-trivial logic in the UI and allow users to create complex dependency chains. To simplify things in the first version, it may be a good idea to enforce that dependencies are only one level deep. That is, a package cannot depend on a package that has dependencies.

Failures

Since a package spec can contain multiple actions, what happens if some actions succeed and then one action fails? We will not attempt a rollback. Instead, all the wizards that execute the actions must be idempotent. For example, if told to add an artifact and the artifact already exists, the step can simply be skipped. Because every step is idempotent, the UI can then simply ask the user whether they want to retry the failed action.
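The following sketches that execution model. It assumes each wizard step is wrapped in an `execute` callback that raises a hypothetical `AlreadyExists` error when its entity is already present, and that `ask_retry` prompts the user. There is deliberately no rollback. The returned log is only for illustration:

```python
from typing import Callable, List


class AlreadyExists(Exception):
    """Hypothetical error a step raises when its entity (artifact, app, ...) already exists."""


def run_actions(actions: List[dict],
                execute: Callable[[dict], None],
                ask_retry: Callable[[dict, Exception], bool]) -> List[str]:
    """Run actions in order without rollback and return a log of what happened.

    A step whose entity already exists counts as done, which makes re-running
    the whole package safe. Any other failure asks the user whether to retry.
    """
    log: List[str] = []
    for action in actions:
        while True:
            try:
                execute(action)
                log.append(f"done {action['type']}")
                break
            except AlreadyExists:
                log.append(f"skipped {action['type']}")
                break
            except Exception as err:
                if not ask_retry(action, err):
                    log.append(f"aborted {action['type']}")
                    return log
    return log
```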

Hosting a Custom Marketplace

To host a custom marketplace, users can run an Apache httpd server on top of a local directory structure. To make this easier, we could create a GitHub repository of all the public packages hosted by Cask. The repository would follow the directory structure documented here and have a script at the top level that builds the zips, signs the zips and specs, and generates the packages.json file.
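For quick local testing, Python's standard library can stand in for httpd. A minimal sketch follows, assuming the served directory follows the layout above. The Python documentation does not recommend `http.server` for production use, and `serve_marketplace` is a hypothetical name:

```python
import functools
import http.server
import threading


def serve_marketplace(root: str, port: int = 8000) -> http.server.ThreadingHTTPServer:
    """Serve a marketplace directory tree (v1/packages.json, v1/packages/...) over HTTP.

    Runs in a daemon thread; call shutdown() and server_close() on the result to stop it.
    """
    handler = functools.partial(http.server.SimpleHTTPRequestHandler, directory=root)
    server = http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```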

 

Example Use Cases

Scenario 1: Add a draft of an SFDC Lead Dump Hydrator pipeline

When the user clicks on the '+' button, the UI makes a call to get all the packages it can install:

Code Block
GET /v1/packages.json
[
  ...,
  {
    "name": "sfdc-lead-dump",
    "label": "SFDC Lead Dump",
    "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "1.0.1",
    "categories": [ "hydrator-pipelines" ]
  },
  ...
]

Among that list is version 1.0.1 of the 'SFDC Lead Dump' package, which the user clicks on. The UI makes a call to get the license for that package:

Code Block
GET /v1/packages/sfdc-lead-dump/1.0.1/license.txt
[ Apache 2.0 license ]

 

The user accepts the conditions, and the UI makes a call to get the spec for that package:

Code Block
GET /v1/packages/sfdc-lead-dump/1.0.1/spec.json
{
  "name": "sfdc-lead-dump",
  "label": "SFDC Lead Dump",
  "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "1.0.1",
  "created": 1234567899,
  "changelog": "",
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "scope",
          "value": "user"
        },
        {
          "name": "name",
          "value": "sfdc-plugins"
        },
        {
          "name": "version",
          "value": "1.0.0"
        },
        {
          "name": "config",
          "value": "sfdc-plugins.json" // file in the archive
        },
        {
          "name": "jar",
          "value": "sfdc-plugins.jar" // file in the archive
        }
      ]
    },
    {
      "type": "create_app",
      "arguments": [
        {
          "name": "artifact",
          "value": {
            "scope": "system",
            "name": "cdap-data-pipeline",
            "version": "4.0.0"
          }
        },
        {
          "name": "name",
          "value": "SFDC Lead Dump",
          "canModify": true
        },
        {
          "name": "config",
          "value": "sfdc.json" // file in the archive
        }
      ]
    }
  ]
}


The UI also gets the spec signature to validate the spec:

Code Block
GET /v1/packages/sfdc-lead-dump/1.0.1/spec.json.asc

The UI also fetches the package archive and its signature. It validates the archive and unzips it to a local temporary directory so that it can use its resources to create the plugins artifact and the Hydrator pipeline.

Code Block
GET /v1/packages/sfdc-lead-dump/1.0.1/archive.zip
GET /v1/packages/sfdc-lead-dump/1.0.1/archive.zip.asc

Based on the package spec, the UI can set up the relevant wizards and make the relevant CDAP calls to first create the plugin artifact and then create the Hydrator pipeline.
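The following sketches the unpack step. It assumes the archive has already passed signature verification, and that arguments commented as "file in the archive" (here `jar` and `config`) hold paths relative to the archive root. `unpack_archive` and `archive_files` are hypothetical helpers. The path check guards against values that escape the extraction directory:

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Dict, Iterable


def unpack_archive(archive: Path) -> Path:
    """Extract an already-verified archive.zip into a new temporary directory."""
    target = Path(tempfile.mkdtemp(prefix="market-"))
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target)
    return target


def archive_files(root: Path, args: Dict[str, str],
                  file_args: Iterable[str]) -> Dict[str, Path]:
    """Resolve arguments whose values name files in the archive (e.g. "jar", "config")."""
    base = root.resolve()
    resolved = {}
    for name in file_args:
        if name not in args:
            continue
        path = (base / args[name]).resolve()
        # Refuse values such as "../../etc/passwd" that point outside the archive
        if base not in path.parents or not path.is_file():
            raise FileNotFoundError(f"{args[name]!r} is not a file in the package archive")
        resolved[name] = path
    return resolved
```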

 

Scenario 7: Add MySQL JDBC driver as a Hydrator plugin.

When the user clicks on the '+' button, the UI makes a call to list all packages that can be added to CDAP:

Code Block
GET /v1/packages.json
[
  ...,
  {
    "name": "mysql-jdbc-driver",
    "label": "MySQL JDBC Driver",
    "description": "JDBC Driver for MySQL databases.",
    "author": "MySQL",
    "org": "Oracle",
    "version": "5.1.39",
    "categories": [ "hydrator-plugins" ]
  },
  ...
]

Among the list is the MySQL JDBC Driver, which the user clicks on. The UI makes a call to get the license for that package:

Code Block
GET /v1/packages/mysql-jdbc-driver/5.1.39/license.txt
[ gpl license ]

The user accepts the conditions, and the UI makes a call to get the spec for that package:

Code Block
GET /v1/packages/mysql-jdbc-driver/5.1.39/spec.json
{
  "name": "mysql-jdbc-driver",
  "label": "MySQL JDBC Driver",
  "description": "JDBC Driver for MySQL databases.",
  "author": "MySQL",
  "org": "Oracle",
  "version": "5.1.39",
  "categories": [ "hydrator-plugins" ],
  "created": 1234567899,
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "scope",
          "value": "user"
        },
        {
          "name": "name",
          "value": "mysql-connector-java"
        },
        {
          "name": "version",
          "value": "5.1.39"
        },
        {
          "name": "externalArchive",
          "value": "https://dev.mysql.com/downloads/file/?id=462849"
        },
        {
          "name": "externalArchiveSignature",
          "value": "https://dev.mysql.com/downloads/gpg/?file=mysql-connector-java-5.1.39.zip"
        },
        {
          "name": "externalArchiveJar",
          "value": "mysql-connector-java-5.1.39-bin.jar"
        },
        {
          "name": "config",
          "value": "mysql-connector-java-5.1.39.json" // file in the archive containing parents and plugins
        }
      ]
    }
  ]
}

The UI also makes a call to get the spec signature to make sure it is valid:

Code Block
GET /v1/packages/mysql-jdbc-driver/5.1.39/spec.json.asc

The UI then makes calls to get the archive and its signature, validates the archive, and unzips it in a local directory. It uses the JSON config file contained in the archive, together with the jar from the external archive, to make a request to add the artifact to CDAP.

Code Block
GET /v1/packages/mysql-jdbc-driver/5.1.39/archive.zip.asc
GET /v1/packages/mysql-jdbc-driver/5.1.39/archive.zip
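For the MySQL package, the jar is not in the package archive but in the external MySQL download, which would itself be checked against externalArchiveSignature before use. The following sketches how to locate the jar named by externalArchiveJar. It assumes the external archive is a zip that may nest its files under a top-level folder; the real layout of the MySQL download is an assumption. `extract_external_jar` is hypothetical:

```python
import io
import zipfile


def extract_external_jar(archive_bytes: bytes, jar_name: str) -> bytes:
    """Return the bytes of the jar named by externalArchiveJar.

    Accepts an exact member path, or a single member whose base name matches,
    since vendor archives often nest files under a top-level folder.
    """
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as zf:
        names = zf.namelist()
        if jar_name in names:
            return zf.read(jar_name)
        matches = [n for n in names if n.rsplit("/", 1)[-1] == jar_name]
        if len(matches) != 1:
            raise KeyError(f"expected one {jar_name!r} in the external archive, found {len(matches)}")
        return zf.read(matches[0])
```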