Mocks

These mocks (not final) give an idea of the flow users will go through in order to install a package from the Cask Market.

The user clicks the '+' button, then selects a 'category' from the left sidebar to list 'packages' that can be installed.

When the user chooses a package to install, more information is displayed, including one or more steps involved in installing the package. Each step is a wizard for creating some CDAP entity.

Terminology

package - A collection of entities (artifacts, applications, datasets, streams) to add to CDAP. A package is identified by a name and version, and can be tagged with one or more categories. A package consists of an archive of resources (zip) and a package spec.

package spec - A JSON file containing a list of actions to perform against CDAP. For example, a spec for the Purchase History example will include an action to add the Purchase History artifact, then an action to create an application from that artifact.

package archive - A zip containing any resources needed to perform the actions in the package spec. For example, if the spec contains an action to add an artifact, the archive must contain the jar file to add.

category - A package can be tagged with one or more categories. A category corresponds to one of the tabs on the left bar of the mocks.

Architecture

There will be a set of marketplace APIs that the UI will use to get categories and packages. In the first version of the market, the APIs will simply be static content served from S3. This essentially amounts to placing packages in a pre-determined directory structure, and generating index files that will be used to get metadata about the packages, such as the list of all packages, list of packages in a category, versions of a package, etc. 

 

There will be an internal process to push the entire market repository to S3. Users who wish to host their own marketplace can do so using their own S3 bucket or by hosting their own server.

APIs

The APIs are simply a contract about the directory structure of the marketplace. All APIs are relative to a base path, for example cask.co/marketplace/v1. The structure is expected to be:

GET
<base>/categories.json
<base>/packages.json
<base>/packages-<category>.json
<base>/packages/<package-name>/versions.json
<base>/packages/<package-name>/<version>/icon.jpg
<base>/packages/<package-name>/<version>/spec.json
<base>/packages/<package-name>/<version>/spec.json.asc
<base>/packages/<package-name>/<version>/archive.zip
<base>/packages/<package-name>/<version>/archive.zip.asc

The packages.json, packages-<category>.json, versions.json, and signature files will be generated using a tool from the categories.json and all the package spec.json files.
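As a rough sketch of what such a tool might do, the function below derives the index files from a list of already-parsed spec.json documents. The function name `build_indexes` is hypothetical, and it assumes that the version with the largest `created` timestamp counts as the latest; the design does not say how "latest" is decided. Signature generation is left out.

```python
from collections import defaultdict

# Fields copied into packages.json entries, following the examples on this page.
SUMMARY_FIELDS = ("name", "label", "description", "author", "org",
                  "version", "categories")


def build_indexes(specs):
    """Return {relative path: JSON-serializable content} for the index files.

    `specs` is a list of parsed spec.json dicts, one per package version.
    Assumption: the version with the largest `created` value is the latest.
    """
    by_name = defaultdict(list)
    for spec in specs:
        by_name[spec["name"]].append(spec)

    indexes = {}
    latest = []
    for name, versions in sorted(by_name.items()):
        versions.sort(key=lambda s: s["created"], reverse=True)
        # versions.json carries the spec metadata without the actions.
        indexes[f"packages/{name}/versions.json"] = [
            {k: v for k, v in s.items() if k not in ("actions", "spec-version")}
            for s in versions
        ]
        newest = versions[0]
        latest.append({k: newest[k] for k in SUMMARY_FIELDS if k in newest})

    indexes["packages.json"] = latest

    # One packages-<category>.json per category that has at least one package.
    by_category = defaultdict(list)
    for entry in latest:
        for category in entry.get("categories", []):
            by_category[category].append(entry)
    for category, entries in by_category.items():
        indexes[f"packages-{category}.json"] = entries
    return indexes
```

The returned mapping can then be written out with `json.dump`, one file per key, under the base directory.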

In order to make serving easier, we are sacrificing flexibility and extensibility. For example, searching and filtering packages (beyond filtering by a single category) cannot be done in this way. One useful thing we may want to support very soon is filtering packages based on the CDAP version.
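Because the contract is only a directory layout, a client can derive every URL from the base path. The helper below is a minimal sketch of that idea; the class name `MarketUrls` and its method names are hypothetical and not part of the design.

```python
from urllib.parse import quote


class MarketUrls:
    """Builds marketplace URLs from a base path (hypothetical helper)."""

    def __init__(self, base):
        # Strip a trailing slash so joined paths never contain '//'.
        self.base = base.rstrip("/")

    def categories(self):
        return f"{self.base}/categories.json"

    def packages(self, category=None):
        # Without a category this is the index of all packages.
        if category is None:
            return f"{self.base}/packages.json"
        return f"{self.base}/packages-{quote(category, safe='')}.json"

    def versions(self, name):
        return f"{self.base}/packages/{quote(name, safe='')}/versions.json"

    def resource(self, name, version, filename):
        # filename is one of: icon.jpg, spec.json, spec.json.asc,
        # archive.zip, archive.zip.asc
        return (f"{self.base}/packages/{quote(name, safe='')}/"
                f"{quote(version, safe='')}/{filename}")
```

A marketplace hosted elsewhere would only need a different base path; the rest of the client stays the same.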

List Categories

GET /categories.json
[
  {
    "name": "examples",
    "label": "Examples",
    "description": "Example applications to get started with CDAP."
  },
  {
    "name": "use-cases",
    "label": "Use Cases",
    "description": "Common Use Cases."
  },
  ...
]

List Latest Version of all Packages

GET /packages.json
[
  {
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.1",
    "categories": [ "examples" ]
  },
  {
    "name": "HelloWorld",
    "label": "Hello World",
    "description": "Simple application demonstrating usage of flows and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.0",
    "categories": [ "examples" ]
  },
  ...
]

List Latest Version of all Packages in a Category

GET /packages-<category>.json
ex: GET /packages-examples.json
[
  {
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.1",
    "categories": [ "examples" ]
  },
  {
    "name": "HelloWorld",
    "label": "Hello World",
    "description": "Simple application demonstrating usage of flows and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.0",
    "categories": [ "examples" ]
  },
  ...
]

List Package Versions

GET /packages/<package-name>/versions.json
ex: GET /packages/PurchaseExample/versions.json
[
  {    
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.1",
    "categories": [ "examples" ],
    "created": 1234567899,
    "changelog": "fixed a small parsing bug",
    "dependencies": {
      "cdap": {
        "minVersion": "4.0.0",
        "maxVersion": "4.1.0"
      }
    }
  },
  {    
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "categories": [ "examples" ],
    "version": "4.0.0",
    "created": 1234567890,
    "changelog": "updated APIs to work with CDAP 4.0.0",
    "dependencies": {
      "cdap": {
        "minVersion": "4.0.0",
        "maxVersion": "4.1.0"
      }
    }
  },
  ...
]

Get Package Archive

GET /packages/<package-name>/<version>/archive.zip
ex: GET /packages/PurchaseExample/4.0.1/archive.zip
[ binary archive contents] 

Get Package Archive Signature

GET /packages/<package-name>/<version>/archive.zip.asc
ex: GET /packages/PurchaseExample/4.0.1/archive.zip.asc
[ archive signature ] 

Get Package Spec

GET /packages/<package-name>/<version>/spec.json
ex: GET /packages/PurchaseExample/4.0.1/spec.json
{
  "spec-version": "1.0",
  "name": "PurchaseExample",
  "label": "Purchase History",
  "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "4.0.1",
  "created": 1234567899,
  "changelog": "fixed a small parsing bug",
  "categories": [ "examples" ],
  "dependencies": {
    "cdap": {
      "minVersion": "4.0.0",
      "maxVersion": "4.1.0"
    }
  },
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "name",
          "value": "PurchaseHistoryExample"
        },
        {
          "name": "version",
          "value": "4.0.1"
        },
        {
          "name": "scope",
          "value": "user"
        },
        {
          "name": "jar",
          "value": "PurchaseHistoryExample-4.0.1.jar"
        }
      ]
    },
    {
      "type": "create_app",
      "arguments": [
        {
          "name": "name",
          "default": "PurchaseHistory"
        }
      ]
    }
  ]
}

Get Package Spec Signature

GET /packages/<package-name>/<version>/spec.json.asc
ex: GET /packages/PurchaseExample/4.0.1/spec.json.asc
[ spec signature ]

Get Package Icon

GET /packages/<package-name>/<version>/icon.jpg

 

Security

Because users download code from the marketplace, protection against malicious code is especially important. We can use PGP to sign both the package archive and the package spec that are downloadable from the marketplace. The Market UI will have to be configured with a public GPG key (for the public CDAP marketplace, we could reuse the GPG key used for the CDAP RPM and Debian packages, or create another one). It can then use that public key along with the signature APIs to verify that the spec and archive were signed by the owner of the package.
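One way to perform this check is to delegate to the `gpg` command-line tool. The sketch below assumes `gpg` is installed, that the downloaded file and its `.asc` signature have been written to disk, and that the configured public key lives in a dedicated keyring file; the function names are hypothetical, and the actual Market UI may verify signatures differently.

```python
import subprocess


def gpg_verify_command(keyring_path, signature_path, data_path):
    """Build the gpg invocation that checks a detached signature."""
    return [
        "gpg", "--batch",
        # Trust only the keyring configured for the marketplace.
        "--no-default-keyring", "--keyring", keyring_path,
        "--verify", signature_path, data_path,
    ]


def is_signature_valid(keyring_path, signature_path, data_path):
    """Return True if gpg accepts the signature (exit status 0).

    Raises FileNotFoundError if the gpg binary is not available.
    """
    result = subprocess.run(
        gpg_verify_command(keyring_path, signature_path, data_path),
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

The same function would be called twice per install: once for the spec and its signature, and once for the archive and its signature.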

Package Spec

The package spec contains some metadata about the spec itself, and a list of steps to perform on the CDAP instance. It is a JSON file of the following structure:

{
  "spec-version": "1.0",
  "actions": [
    actionspec1,
    actionspec2,
    ...
  ]
}

The actions in the spec will correspond to steps in the UI wizard for installing the package.

Action Spec

Each action will contain a type, a list of arguments, and dependencies. Each type of action will require different arguments. In the first version, the following types will be supported: create_artifact, create_app, create_stream, create_dataset, create_hydrator_draft.

{
  "type": "create_artifact" | "create_app" | "create_stream" | "create_dataset" | "create_hydrator_draft",
  "arguments": [
    {
      "name": [argument name],
      "value": [argument value],
      "canModify": true | false
    }
  ]
}

Some arguments can be modified by users in the resulting wizard. For example, the name of an application may be a field that the user should be able to edit.
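A sketch of how wizard input could be merged with the spec follows. The function name `apply_user_input` is hypothetical, and it assumes that an argument without a `canModify` field is not modifiable, which the design does not state.

```python
def apply_user_input(action, user_values):
    """Return {argument name: final value} for one action.

    `user_values` maps argument names to values entered in the wizard.
    Assumption: an argument without "canModify" is treated as fixed.
    """
    resolved = {}
    for argument in action["arguments"]:
        name = argument["name"]
        if name in user_values:
            if not argument.get("canModify", False):
                raise ValueError(f"argument '{name}' cannot be modified")
            resolved[name] = user_values[name]
        else:
            # Fall back to the value shipped in the spec (may be absent).
            resolved[name] = argument.get("value")
    return resolved
```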

create_artifact

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/artifact.html#add-an-artifact

| name | description | required? | default |
|---|---|---|---|
| name | artifact name | yes | |
| jar | name of jar file in package archive | yes | |
| archivelink | link to download 3rd party archive | no | none |
| archivesig | link to get 3rd party archive signature | no | none |
| scope | artifact scope (implies API to add system artifacts is added in 4.0) | no | user |
| version | artifact version to pass as Artifact-Version header | no | none |
| parents | artifact parents to pass as Artifact-Extends header | no | none |
| plugins | artifact plugins to pass as Artifact-Plugins header | no | none |
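The mapping from arguments to request headers can be sketched as follows. The header names come from the table above; the function name `artifact_headers` is hypothetical, and serializing structured plugin information as JSON text is an assumption.

```python
import json


def artifact_headers(arguments):
    """Map resolved create_artifact arguments to request headers.

    `arguments` is a dict of argument name -> value. Only arguments that
    are present produce a header.
    """
    headers = {}
    if arguments.get("version") is not None:
        headers["Artifact-Version"] = str(arguments["version"])
    if arguments.get("parents") is not None:
        headers["Artifact-Extends"] = str(arguments["parents"])
    if arguments.get("plugins") is not None:
        # Assumption: structured plugin information is sent as JSON text.
        headers["Artifact-Plugins"] = json.dumps(arguments["plugins"])
    return headers
```

The jar named by the `jar` argument would be sent as the request body alongside these headers.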

create_app

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/lifecycle.html#create-an-application

| name | description | required? | default |
|---|---|---|---|
| name | app name | yes | |
| artifact | scope, name, version of the artifact to create the app with | yes | |
| config | app config (file in the package archive) | no | empty |

create_stream

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#creating-a-stream

Depending on the arguments, subsequent calls to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#getting-and-setting-stream-properties (to set format, schema, ttl) and http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#sending-events-to-a-stream-in-batch (to load data into a stream) may be made.

| name | description | required? | default |
|---|---|---|---|
| name | stream name | yes | |
| description | stream description, results in call to set stream properties | no | empty |
| format | stream format as json object, results in call to set stream properties | no | empty |
| schema | stream schema, results in call to set stream properties | no | empty |
| ttl | stream ttl, results in call to set stream properties | no | empty |
| notification.threshold.mb | mb threshold for sending notifications, results in call to set stream properties | no | empty |
| loadfiles | files in the package archive to write to the stream. results in a call to write to the stream in batch | no | empty |

create_dataset

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/dataset.html#creating-a-dataset

| name | description | required? | default |
|---|---|---|---|
| name | dataset name | yes | |
| type | dataset type | yes | |
| description | dataset description | no | empty |
| properties | json map of dataset properties | no | empty |

create_hydrator_draft

Results in whatever the UI does to create a draft

| name | description | required? | default |
|---|---|---|---|
| name | pipeline name | yes | |
| artifact | scope, name, version of the artifact to create the app with | yes | |
| config | pipeline config (file in the package archive) | yes | |
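The required arguments listed for each action type lend themselves to a simple check before a wizard is shown. This is a minimal sketch; `REQUIRED_ARGUMENTS` and `validate_action` are hypothetical names, and the check covers only the presence of arguments, not their values.

```python
# Required arguments per action type, taken from the tables above.
REQUIRED_ARGUMENTS = {
    "create_artifact": {"name", "jar"},
    "create_app": {"name", "artifact"},
    "create_stream": {"name"},
    "create_dataset": {"name", "type"},
    "create_hydrator_draft": {"name", "artifact", "config"},
}


def validate_action(action):
    """Return a list of problems found in one action spec (empty if none)."""
    action_type = action.get("type")
    if action_type not in REQUIRED_ARGUMENTS:
        return [f"unsupported action type: {action_type!r}"]
    present = {arg.get("name") for arg in action.get("arguments", [])}
    missing = sorted(REQUIRED_ARGUMENTS[action_type] - present)
    return [f"{action_type}: missing required argument '{name}'"
            for name in missing]
```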

Dependencies

Packages will be able to specify dependencies on the CDAP version, as well as dependencies on other packages.

{
  ...
  "dependencies": {
    "cdap": {
      "minVersion": "4.0.0",
      "maxVersion": "4.1.0"
    },
    "packages": [
      { 
        "name": "spark-plugins",
        "minVersion": "1.5.0",
        "maxVersion": "1.6.0"
      },
      ...
    ]
  }
}

Min versions are inclusive and max versions are exclusive.
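The range rule can be written as a half-open interval check. The sketch below assumes purely numeric dotted versions with the same number of components (for example 4.0.1), and treats a missing bound as unbounded; both are assumptions, and the function names are hypothetical.

```python
def parse_version(version):
    """Turn '4.0.1' into (4, 0, 1). Assumes purely numeric dotted versions."""
    return tuple(int(part) for part in version.split("."))


def in_range(version, min_version=None, max_version=None):
    """True if min_version <= version < max_version; a missing bound is open."""
    v = parse_version(version)
    if min_version is not None and v < parse_version(min_version):
        return False
    if max_version is not None and v >= parse_version(max_version):
        return False
    return True


def compatible_versions(versions, cdap_version):
    """Keep the versions.json entries whose cdap dependency admits cdap_version."""
    result = []
    for entry in versions:
        cdap = entry.get("dependencies", {}).get("cdap", {})
        if in_range(cdap_version, cdap.get("minVersion"), cdap.get("maxVersion")):
            result.append(entry)
    return result
```

With the bounds from the example above, CDAP 4.0.0 satisfies the dependency and CDAP 4.1.0 does not.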

 

If other packages are listed as dependencies, the actions of those packages must be executed before the actions of the current package.

Package dependencies introduce non-trivial logic in the UI and allow users to create complex dependency chains. To simplify things in the first version, it may be a good idea to enforce that dependencies are only one level deep. That is, a package cannot depend on a package that itself has package dependencies.
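Under the one-level restriction, working out the install order becomes straightforward. The sketch below is one possible way to enforce it; `install_order` and the `specs_by_name` lookup are hypothetical, and version range checks on the dependencies are omitted.

```python
def install_order(package, specs_by_name):
    """Return package names in the order their actions should run.

    `package` is a spec dict; `specs_by_name` maps package name -> spec dict
    for every package that may be depended on. Enforces the one-level rule:
    a dependency must not declare package dependencies of its own.
    """
    order = []
    for dep in package.get("dependencies", {}).get("packages", []):
        dep_spec = specs_by_name[dep["name"]]
        if dep_spec.get("dependencies", {}).get("packages"):
            raise ValueError(
                f"{dep['name']} has its own package dependencies; "
                "only one level is allowed")
        if dep["name"] not in order:
            order.append(dep["name"])
    # The package's own actions always run last.
    order.append(package["name"])
    return order
```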

 

Failures

Since a package spec can contain multiple actions, what happens if some actions succeed and then one action fails? Since the CDAP APIs backing these actions are idempotent, we can ask the user if they want to retry.
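A minimal sketch of such a retry loop follows. The `execute` and `should_retry` callbacks are hypothetical stand-ins for the CDAP call and the user prompt. Retrying only the failed action (rather than restarting from the first one) is a design choice of this sketch; since the calls are idempotent, either approach would work.

```python
def run_actions(actions, execute, should_retry):
    """Run actions in order; on failure, ask whether to retry.

    `execute(action)` performs one action and raises on failure.
    `should_retry(action, error)` stands in for prompting the user.
    Returns the number of actions that completed.
    """
    completed = 0
    while completed < len(actions):
        action = actions[completed]
        try:
            execute(action)
        except Exception as error:
            if should_retry(action, error):
                continue  # safe to repeat because the call is idempotent
            return completed
        completed += 1
    return completed
```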

Example Use Cases

Scenario 1: Add a draft of a SFDC Lead Dump Hydrator pipeline

When the user clicks on the '+' button, the UI makes a call:

GET /groups
[
  {
    "name": "examples",
    "label": "Examples",
    "description": "Example applications to get started with CDAP."
  },
  {
    "name": "hydrator-pipelines",
    "label": "Hydrator Pipelines",
    "description": "Templates of various Hydrator pipelines."
  },
  ...
]

to display all the different types of things the user can add in the CDAP marketplace. Among that list is 'Hydrator Pipelines', which the user clicks on. The UI makes another call to list the packages in the 'Hydrator Pipelines' group:

GET /groups/hydrator-pipelines/packages
[
  ...,
  {
    "name": "sfdc-lead-dump",
    "label": "SFDC Lead Dump",
    "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
    "author": "Cask",
    "org": "Cask Data Inc."
  },
  ...
]

Among that list is the 'SFDC Lead Dump' package, which the user clicks on. The UI makes a call to get all versions of that package:

GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions
[
  {
    "name": "sfdc-lead-dump",
    "label": "SFDC Lead Dump",
    "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "1.0.1",
    "created": 1234567899,
    "changelog": [
      "fixed a small parsing bug"
    ],
    "dependencies": {
      "cdap": {
        "minVersion": "4.0.0",
        "maxVersion": "4.1.0"
      }
    }
  },
  ...
]

It defaults to the most recent version that is compatible with the version of CDAP that is running. The user decides to install the package, so the UI makes a call to get the package spec:

GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions/1.0.1/spec
{
  "name": "sfdc-lead-dump",
  "label": "SFDC Lead Dump",
  "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "1.0.1",
  "created": 1234567899,
  "changelog": [
    "fixed a small parsing bug"
  ],
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "scope",
          "value": "user",
          "canModify": false
        },
        {
          "name": "name",
          "value": "sfdc-plugins",
          "canModify": false
        },
        {
          "name": "version",
          "value": "1.0.0",
          "canModify": false
        },
        {
          "name": "parents",
          "value": "system:cdap-data-pipeline[4.0.0,4.1.0)",
          "canModify": false
        },
        {
          "name": "jar",
          "value": "sfdc-plugins.jar", // file in the archive
          "canModify": false
        }
      ]
    },
    {
      "type": "create_hydrator_draft",
      "arguments": [
        {
          "name": "artifact",
          "value": {
            "scope": "system",
            "name": "cdap-data-pipeline",
            "version": "4.0.0"
          },
          "canModify": false
        },
        {
          "name": "name",
          "value": "SFDC Lead Dump",
          "canModify": true
        },
        {
          "name": "config",
          "value": "sfdc.json", // file in the archive
          "canModify": false
        }
      ]
    }
  ]
}

The UI also fetches the spec signature and uses the public key to validate the spec:

GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions/1.0.1/spec.asc

The UI also fetches the package archive and its signature. It validates the archive and writes it to a local temporary directory so that it can use its resources to create the plugins artifact and the Hydrator draft:

GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions/1.0.1/archive.tgz
GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions/1.0.1/archive.tgz.asc

Based on the package spec, the UI can set up the relevant wizards and make the relevant CDAP calls to first create the plugin artifact, and then create the Hydrator draft.

 

Scenario 7: Add MySQL jdbc driver as a Hydrator plugin.

When the user clicks on the '+' button, the UI makes a call:

GET /groups
[
  {
    "name": "examples",
    "label": "Examples",
    "description": "Example applications to get started with CDAP."
  },
  {
    "name": "hydrator-plugins",
    "label": "Hydrator Plugins",
    "description": "Plugins for Hydrator Pipelines."
  },
  ...
]

to display all the different types of things the user can add in the CDAP marketplace. Among that list is 'Hydrator Plugins', which the user clicks on. The UI makes another call to list the packages in the 'Hydrator Plugins' group:

GET /groups/hydrator-plugins/packages
[
  ...,
  {
    "name": "mysql-jdbc-driver",
    "label": "MySQL JDBC Driver",
    "description": "JDBC Driver for MySQL databases.",
    "author": "MySQL",
    "org": "Oracle"
  },
  ...
]

Among the list is the MySQL JDBC Driver, which the user clicks on. The UI makes a call to get all versions of that package:

GET /groups/hydrator-plugins/packages/mysql-jdbc-driver/versions
[
  {
    "name": "mysql-jdbc-driver",
    "label": "MySQL JDBC Driver",
    "description": "JDBC Driver for MySQL databases.",
    "author": "MySQL",
    "org": "Oracle",
    "version": "5.1.39",
    "created": 1234567899,
    "changelog": [ ],
    "dependencies": { }
  },
  ...
]

The user decides to install the 5.1.39 version of the driver. The UI makes a call to get the spec, and to get the spec signature to make sure it is valid:

GET /groups/hydrator-plugins/packages/mysql-jdbc-driver/versions/5.1.39/spec.asc
GET /groups/hydrator-plugins/packages/mysql-jdbc-driver/versions/5.1.39/spec
{    
  "name": "mysql-jdbc-driver",
  "label": "MySQL JDBC Driver",
  "description": "JDBC Driver for MySQL databases.",
  "author": "MySQL",
  "org": "Oracle",
  "version": "5.1.39",
  "created": 1234567899,
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "scope",
          "value": "user",
          "canModify": false
        },
        {
          "name": "name",
          "value": "mysql-connector-java",
          "canModify": false
        },
        {
          "name": "version",
          "value": "5.1.39",
          "canModify": false
        },
        {
          "name": "parents",
          "value": "system:cdap-data-pipeline[3.0.0,10.0.0]/system:cdap-data-streams[3.0.0,10.0.0]",
          "canModify": false
        },
        {
          "name": "jar",
          "value": "mysql-connector-java-5.1.39-bin.jar", // file in the archive
          "canModify": false
        },
        {
          "name": "archivelink",
          "value": "https://dev.mysql.com/downloads/file/?id=462849",
          "canModify": false
        },
        {
          "name": "archivesig",
          "value": "https://dev.mysql.com/downloads/gpg/?file=mysql-connector-java-5.1.39.tar.gz"
        },
        {
          "name": "plugins",
          "value": {
            "parents": [ "cdap-data-pipeline[3.0.0,10.0.0]" ],
            "plugins": [
              {
                "name" : "mysql",
                "type" : "jdbc",
                "className" : "com.mysql.jdbc.Driver",
                "description" : "Plugin for MySQL JDBC driver"
              }
            ]
          },
          "canModify": false
        }
      ]
    }
  ]
}

The UI then fetches the archive and its signature, validates the archive, and unpacks it in a local directory. It uses the jar and JSON config file contained in the archive to make a request to add the artifact to CDAP.

GET /groups/hydrator-plugins/packages/mysql-jdbc-driver/versions/5.1.39/archive.tgz.asc
GET /groups/hydrator-plugins/packages/mysql-jdbc-driver/versions/5.1.39/archive.tgz
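Once the archive has been validated, the resources named in the spec (such as the jar) need to be read from it. As an alternative to unpacking everything into a directory, the sketch below reads a single named member directly from the downloaded bytes, which avoids writing untrusted paths to disk. It accepts both zip and gzipped tar archives, since this page mentions both; the function name `read_resource` is hypothetical.

```python
import io
import tarfile
import zipfile


def read_resource(archive_bytes, resource_name):
    """Return the bytes of one file inside a package archive.

    Raises KeyError if the archive has no member called `resource_name`.
    """
    buffer = io.BytesIO(archive_bytes)
    if zipfile.is_zipfile(buffer):
        buffer.seek(0)
        with zipfile.ZipFile(buffer) as archive:
            return archive.read(resource_name)
    # Not a zip: treat the payload as a gzipped tarball.
    buffer.seek(0)
    with tarfile.open(fileobj=buffer, mode="r:gz") as archive:
        member = archive.extractfile(resource_name)
        if member is None:
            raise KeyError(f"{resource_name} is not a regular file")
        return member.read()
```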