Code Block
GET /groups/<group>/packages/<package>/versions
ex: GET /groups/examples/packages/PurchaseExample/versions
[
  {
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.1",
    "created": 1234567899,
    "changelog": [
      "fixed a small parsing bug"
    ]
  },
  {
    "name": "PurchaseExample",
    "label": "Purchase History",
    "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "4.0.0",
    "created": 1234567890,
    "changelog": [
      "updated APIs to work with CDAP 4.0.0"
    ]
  },
  ...
]

Get Package Archive

Code Block
GET /groups/<group>/packages/<package>/versions/<version>/archive.tgz
ex: GET /groups/examples/packages/PurchaseExample/versions/4.0.1/archive.tgz
[ binary archive contents] 

Get Package Archive Signature

Code Block
GET /groups/<group>/packages/<package>/versions/<version>/archive.tgz.asc
ex: GET /groups/examples/packages/PurchaseExample/versions/4.0.1/archive.tgz.asc
[ archive signature ] 

Get Package Spec

Code Block
GET /groups/<group>/packages/<package>/versions/<version>/spec
ex: GET /groups/examples/packages/PurchaseExample/versions/4.0.1/spec
{
  "metadata": {
    "spec-version": "1.0"
  },
  "name": "PurchaseExample",
  "label": "Purchase History",
  "description": "Example Application demonstrating usage of flows, workflows, mapreduce, and services.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "4.0.1",
  "created": 1234567899,
  "changelog": [
    "fixed a small parsing bug"
  ],
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "widget-type": "constant",
          "name": "name",
          "value": "PurchaseHistoryExample"
        },
        {
          "widget-type": "constant",
          "name": "version",
          "value": "4.0.1"
        },
        {
          "widget-type": "constant",
          "name": "scope",
          "value": "user"
        },
        {
          "widget-type": "constant",
          "name": "jar",
          "value": "PurchaseHistoryExample-4.0.1.jar"
        }
      ]
    },
    {
      "type": "create_app",
      "arguments": [
        {
          "widget-type": "textbox",
          "name": "name",
          "default": "PurchaseHistory"
        }
      ]
    }
  ]
}

Get Package Spec Signature

Code Block
GET /groups/<group>/packages/<package>/versions/<version>/spec.asc
ex: GET /groups/examples/packages/PurchaseExample/versions/4.0.1/spec.asc
[ spec signature ]

Security

Since people will be able to download code from the marketplace, it is especially important to protect against malicious code. We can use PGP to sign both the package archive and the package spec that are downloadable from the marketplace. The Market UI will have to be configured with a GPG public key (for the public CDAP marketplace, we could reuse the GPG key used to sign the CDAP RPM and Debian packages, or create a new one). The UI can then use that public key, together with the signature APIs, to verify that the spec and archive were signed by the owner of the package.
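The following is a minimal sketch of the client-side check. It assumes two things:

- the marketplace public key has already been imported into the local GnuPG keyring;
- the client shells out to the standard `gpg --verify <signature> <data>` command.

`base_url` and the function names are hypothetical. Only the URL paths come from the signature APIs above.

```python
import subprocess
from pathlib import Path


def signature_urls(base_url: str, group: str, package: str, version: str) -> dict:
    """Build the archive/spec URLs and their detached-signature URLs (paths from the APIs above)."""
    prefix = f"{base_url.rstrip('/')}/groups/{group}/packages/{package}/versions/{version}"
    return {
        "archive": f"{prefix}/archive.tgz",
        "archive_sig": f"{prefix}/archive.tgz.asc",
        "spec": f"{prefix}/spec",
        "spec_sig": f"{prefix}/spec.asc",
    }


def verify_command(signature: Path, data: Path) -> list:
    """gpg checks a detached signature when given the signature file first, then the signed data."""
    return ["gpg", "--verify", str(signature), str(data)]


def is_signed(signature: Path, data: Path) -> bool:
    """Return True only if gpg exits with status 0 for the downloaded files."""
    result = subprocess.run(verify_command(signature, data), capture_output=True)
    return result.returncode == 0
```

A zero exit status means gpg found a good signature from some key in the keyring. A stricter check would also confirm which key made the signature, for example by parsing gpg's `--status-fd` output.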

Package Spec

The package spec contains some metadata about the spec itself and a list of actions to perform on the CDAP instance. It is a JSON file with the following structure:

Code Block
{
  "metadata": {
    "spec-version": "1.0"
  },
  "actions": [
    actionspec1,
    actionspec2,
    ...
  ]
}
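Below is a small sketch of reading a spec of this shape with Python's standard `json` module. It treats two cases as invalid: an unknown `metadata.spec-version`, and an `actions` field that is not a list. `PackageSpec`, `parse_spec`, and the set of supported spec versions are hypothetical. Only the field names come from the structure above.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List

SUPPORTED_SPEC_VERSIONS = {"1.0"}  # assumption: 1.0 is the only spec version so far


@dataclass
class PackageSpec:
    spec_version: str
    actions: List[Dict[str, Any]]


def parse_spec(text: str) -> PackageSpec:
    """Parse a package spec and check the two top-level fields described above."""
    doc = json.loads(text)
    version = doc.get("metadata", {}).get("spec-version")
    if version not in SUPPORTED_SPEC_VERSIONS:
        raise ValueError(f"unsupported spec-version: {version!r}")
    actions = doc.get("actions")
    if not isinstance(actions, list):
        raise ValueError("'actions' must be a list")
    for action in actions:
        # Every action needs a type; argument checks depend on the type.
        if "type" not in action:
            raise ValueError(f"action without a type: {action!r}")
    return PackageSpec(spec_version=version, actions=actions)
```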

The actions in the spec will correspond to steps in the UI wizard for installing the package.

Action Spec

Each action will contain a type, a list of arguments, and dependencies. Each type of action will require different arguments. In the first version, the following types will be supported: create_artifact, create_app, create_stream, create_dataset, create_hydrator_draft.

Code Block
{
  "type": "create_artifact" | "create_app" | "create_stream" | "create_dataset" | "create_hydrator_draft",
  "arguments": [
    {
      "name": [argument name],
      "value": [argument value],
      "canModify": true | false
    }
  ]
}

Some arguments, namely those with "canModify" set to true, can be modified by users in the resulting wizard. For example, the name of an application may be a field that the user should be able to edit.
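The sketch below shows how a wizard step might honor `canModify`. It merges the user's edits into an action's arguments and rejects edits to fixed arguments. It assumes arguments follow the `name` / `value` / `canModify` shape above, and that a missing `canModify` means the value is fixed. `apply_user_edits` is a hypothetical helper.

```python
from typing import Any, Dict, List


def apply_user_edits(arguments: List[Dict[str, Any]], edits: Dict[str, Any]) -> Dict[str, Any]:
    """Return the final name -> value map for one action.

    Edits are accepted only for arguments marked "canModify": true; an edit to a
    fixed (or unknown) argument raises instead of being silently ignored.
    """
    editable = {a["name"] for a in arguments if a.get("canModify", False)}
    rejected = set(edits) - editable
    if rejected:
        raise ValueError(f"arguments cannot be modified: {sorted(rejected)}")
    values = {a["name"]: a["value"] for a in arguments}
    values.update(edits)
    return values
```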

create_artifact

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/artifact.html#add-an-artifact

name | description | required? | default
name | artifact name | yes |
jar | name of the jar file in the package archive | yes |
scope | artifact scope (implies that an API to add system artifacts is added in 4.0) | no | user
version | artifact version, passed as the Artifact-Version header | no | none
parents | artifact parents, passed as the Artifact-Extends header | no | none
plugins | artifact plugins, passed as the Artifact-Plugins header | no | none
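This sketch translates create_artifact arguments into the "add an artifact" request. As we read the linked CDAP documentation, that request is `POST /v3/namespaces/<namespace>/artifacts/<artifact-name>` with the jar as the body and the headers listed in the table above.

The following are assumptions:

- the `default` namespace;
- the hypothetical `HttpCall` container;
- user scope only, since the table notes that a system-artifact API may not exist yet.

The request is only described here, not sent.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class HttpCall:
    method: str
    path: str
    headers: Dict[str, str] = field(default_factory=dict)
    body_file: str = ""  # name of a file inside the package archive to send as the body


# Optional create_artifact arguments and the request headers they map to (per the table above).
HEADER_FOR_ARGUMENT = {
    "version": "Artifact-Version",
    "parents": "Artifact-Extends",
    "plugins": "Artifact-Plugins",
}


def create_artifact_call(args: Dict[str, str], namespace: str = "default") -> HttpCall:
    """Describe the 'add an artifact' request implied by a create_artifact action."""
    scope = args.get("scope", "user")
    if scope != "user":
        # Adding system artifacts needs an API that the table says may only arrive in 4.0.
        raise NotImplementedError("only user-scoped artifacts are handled in this sketch")
    headers = {header: args[arg] for arg, header in HEADER_FOR_ARGUMENT.items() if arg in args}
    return HttpCall(
        method="POST",
        path=f"/v3/namespaces/{namespace}/artifacts/{args['name']}",
        headers=headers,
        body_file=args["jar"],
    )
```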

create_app

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/lifecycle.html#create-an-application

name | description | required? | default
name | app name | yes |
artifact | scope, name, and version of the artifact to create the app with | yes |
config | app config | no | empty
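This sketch shows how create_app arguments might become the linked "create an application" request. As we understand the CDAP docs, that request is `PUT /v3/namespaces/<namespace>/apps/<app-name>` with a JSON body containing `artifact` and an optional `config`. The `default` namespace and the function name are assumptions.

```python
import json
from typing import Any, Dict, Tuple


def create_app_call(args: Dict[str, Any], namespace: str = "default") -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for the 'create an application' request."""
    artifact = args["artifact"]  # expected shape: {"scope": ..., "name": ..., "version": ...}
    body: Dict[str, Any] = {
        "artifact": {
            "name": artifact["name"],
            "version": artifact["version"],
            "scope": artifact["scope"],
        }
    }
    if args.get("config"):
        body["config"] = args["config"]  # config is optional and defaults to empty
    return "PUT", f"/v3/namespaces/{namespace}/apps/{args['name']}", json.dumps(body)
```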

create_stream

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#creating-a-stream

Depending on the arguments, subsequent calls may be made to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#getting-and-setting-stream-properties (to set the format, schema, and TTL) and to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/stream.html#sending-events-to-a-stream-in-batch (to load data into the stream).

name | description | required? | default
name | stream name | yes |
description | stream description; results in a call to set stream properties | no | empty
format | stream format as a JSON object; results in a call to set stream properties | no | empty
schema | stream schema; results in a call to set stream properties | no | empty
ttl | stream TTL; results in a call to set stream properties | no | empty
notification.threshold.mb | threshold, in MB, for sending notifications; results in a call to set stream properties | no | empty
loadfiles | files in the package archive to write to the stream; results in a call to write to the stream in batch | no | empty
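This sketch lists the ordered calls a create_stream action could expand into:

1. create the stream;
2. set its properties, but only if any property arguments are present;
3. make one batch write per file in `loadfiles`.

The paths follow our reading of the linked CDAP stream docs: `PUT .../streams/<name>`, `PUT .../streams/<name>/properties`, and `POST .../streams/<name>/batch`. Several things are assumptions: the `default` namespace, the function name, and nesting `schema` inside the `format` object.

```python
import json
from typing import Any, Dict, List, Tuple

# Table arguments written through the stream properties endpoint.
PROPERTY_ARGS = ("description", "format", "ttl", "notification.threshold.mb")


def create_stream_calls(args: Dict[str, Any], namespace: str = "default") -> List[Tuple[str, str, Any]]:
    """Return the ordered (method, path, body) calls implied by a create_stream action."""
    base = f"/v3/namespaces/{namespace}/streams/{args['name']}"
    calls: List[Tuple[str, str, Any]] = [("PUT", base, None)]  # 1. create the stream

    props = {key: args[key] for key in PROPERTY_ARGS if key in args}
    if "schema" in args:
        # Assumption: the schema travels inside the stream format object.
        props["format"] = {**props.get("format", {}), "schema": args["schema"]}
    if props:
        calls.append(("PUT", f"{base}/properties", json.dumps(props)))  # 2. set properties

    for file_name in args.get("loadfiles", []):
        calls.append(("POST", f"{base}/batch", file_name))  # 3. body = this archive file's contents
    return calls
```

A real client would read each file out of the package archive and set an appropriate Content-Type on the batch request.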

create_dataset

Results in a call to http://docs.cdap.io/cdap/current/en/reference-manual/http-restful-api/dataset.html#creating-a-dataset

name | description | required? | default
name | dataset name | yes |
type | dataset type | yes |
description | dataset description | no | empty
properties | JSON map of dataset properties | no | empty
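Here is a sketch of the corresponding dataset request. As we understand the linked CDAP docs, it is `PUT /v3/namespaces/<namespace>/data/datasets/<name>` with a JSON body of `typeName`, `properties`, and an optional `description`. The namespace default and the function name are assumptions.

```python
import json
from typing import Any, Dict, Tuple


def create_dataset_call(args: Dict[str, Any], namespace: str = "default") -> Tuple[str, str, str]:
    """Return (method, path, JSON body) for the 'create a dataset' request."""
    body: Dict[str, Any] = {
        "typeName": args["type"],
        "properties": args.get("properties", {}),  # defaults to an empty map
    }
    if "description" in args:
        body["description"] = args["description"]
    path = f"/v3/namespaces/{namespace}/data/datasets/{args['name']}"
    return "PUT", path, json.dumps(body)
```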

create_hydrator_draft

Results in whatever calls the UI currently makes to create a Hydrator draft.

name | description | required? | default
name | pipeline name | yes |
artifact | scope, name, and version of the artifact to create the app with | yes |
config | pipeline config | yes |

Dependencies

Packages will only be able to specify dependencies on the CDAP version and on the existence of specific CDAP entities. For example, the core-plugins-1.5.0 package requires that the system artifacts cdap-data-pipeline-4.0.0 and cdap-data-streams-4.0.0 exist in the CDAP instance.

Code Block
{
  ...
  "dependencies": {
    "cdap": {
      "minVersion": "4.0.0",
      "maxVersion": "4.1.0"
    },
    "artifacts": [
      { 
        "scope": "system",
        "name": "spark-plugins",
        "minVersion": "1.5.0",
        "maxVersion": "1.6.0"
      },
      ...
    ],
    "streams": [
      { "name": "smsTexts" }
    ],
    "datasets": [
      { "name": "spamTexts" }
    ]
  }
}
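The following sketch checks a dependencies block like the one above against what is installed. It makes three assumptions:

- `minVersion` is inclusive and `maxVersion` exclusive, matching the `[4.0.0,4.1.0)` notation used for artifact parents in Scenario 1;
- versions are plain dot-separated integers;
- the caller already knows the instance's CDAP version, artifacts, streams, and datasets.

All names here are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def parse_version(v: str) -> Tuple[int, ...]:
    """'4.0.0' -> (4, 0, 0); assumes purely numeric, dot-separated versions."""
    return tuple(int(part) for part in v.split("."))


def in_range(version: str, rng: Dict[str, str]) -> bool:
    """Assumed semantics: minVersion inclusive, maxVersion exclusive; either may be absent."""
    v = parse_version(version)
    if "minVersion" in rng and v < parse_version(rng["minVersion"]):
        return False
    if "maxVersion" in rng and v >= parse_version(rng["maxVersion"]):
        return False
    return True


def unmet_dependencies(deps: Dict, cdap_version: str, artifacts: List[Dict[str, str]],
                       streams: Set[str], datasets: Set[str]) -> List[str]:
    """Return reasons the package cannot be installed; an empty list means all are met."""
    problems = []
    if "cdap" in deps and not in_range(cdap_version, deps["cdap"]):
        problems.append(f"CDAP {cdap_version} is outside the supported range")
    for want in deps.get("artifacts", []):
        found = any(a["scope"] == want["scope"] and a["name"] == want["name"]
                    and in_range(a["version"], want) for a in artifacts)
        if not found:
            problems.append(f"missing artifact {want['scope']}:{want['name']}")
    problems += [f"missing stream {s['name']}" for s in deps.get("streams", []) if s["name"] not in streams]
    problems += [f"missing dataset {d['name']}" for d in deps.get("datasets", []) if d["name"] not in datasets]
    return problems
```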

 

Failures

A package spec can contain multiple actions, so some actions may succeed before a later one fails. Since the CDAP APIs backing these actions are idempotent, we can simply ask the user whether they want to retry; re-running the actions that already succeeded does no harm.
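A sketch of that retry flow follows. It assumes each action is exposed as a zero-argument callable and that `ask_retry` stands in for the UI prompt; both are hypothetical. Because the calls are idempotent, it re-runs the whole action list on retry instead of tracking which steps already succeeded.

```python
from typing import Callable, List


def install(actions: List[Callable[[], None]],
            ask_retry: Callable[[Exception], bool],
            max_attempts: int = 3) -> bool:
    """Run the actions in order; after a failure, optionally re-run the whole list.

    Re-running actions that already succeeded is acceptable only because the
    backing CDAP calls are idempotent.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            for action in actions:
                action()
            return True
        except Exception as err:  # any failing action aborts this attempt
            if attempt >= max_attempts or not ask_retry(err):
                return False
```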

Architecture

There will be a set of marketplace APIs that the UI uses to get groups, packages, package versions, icons, and package tarballs, and a market server that powers these APIs. The server will use a set of internal storage interfaces that define how to read the information the APIs require. We can start with a storage implementation that simply reads from local files, and perhaps later add one that reads from cloud storage such as S3.
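One way to express these internal storage interfaces is an abstract class that the API layer depends on. A file-backed store and a later S3-backed store would then be interchangeable. The class and method names are hypothetical and simply mirror the API endpoints. The in-memory store exists only to show the contract.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

FileKey = Tuple[str, str, str, str]  # (group, package, version, file name)


class MarketStore(ABC):
    """Read-only storage contract the API layer depends on (hypothetical names)."""

    @abstractmethod
    def list_groups(self) -> List[str]: ...

    @abstractmethod
    def list_packages(self, group: str) -> List[str]: ...

    @abstractmethod
    def list_versions(self, group: str, package: str) -> List[str]: ...

    @abstractmethod
    def read_file(self, group: str, package: str, version: str, name: str) -> bytes: ...


class InMemoryStore(MarketStore):
    """Toy store illustrating the contract; a file or S3 store would replace it."""

    def __init__(self, files: Dict[FileKey, bytes]):
        self._files = files

    def list_groups(self) -> List[str]:
        return sorted({k[0] for k in self._files})

    def list_packages(self, group: str) -> List[str]:
        return sorted({k[1] for k in self._files if k[0] == group})

    def list_versions(self, group: str, package: str) -> List[str]:
        # Plain string sort; a real store would order versions numerically.
        return sorted({k[2] for k in self._files if k[:2] == (group, package)})

    def read_file(self, group: str, package: str, version: str, name: str) -> bytes:
        return self._files[(group, package, version, name)]
```

Keeping the server behind an interface like this is also what makes the stateless, load-balanced deployment straightforward: every instance reads the same backing store.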


The market server will be stateless, so multiple instances can be placed behind a load balancer to make the service highly available and able to handle a high volume of requests.


File Store

The first implementation of the storage layer can simply be a store that looks at a filesystem for files containing the relevant information. The File Store will expect a specific directory structure:

Code Block
<base dir>/<group>/icon.jpg
<base dir>/<group>/meta.json
<base dir>/<group>/<package>/<version>/spec.json
<base dir>/<group>/<package>/<version>/icon.jpg
<base dir>/<group>/<package>/<version>/archive.tgz
 
ex:
/opt/cdap/marketplace/examples/icon.jpg
/opt/cdap/marketplace/examples/meta.json
/opt/cdap/marketplace/examples/PurchaseExample/4.0.1/archive.tgz
/opt/cdap/marketplace/examples/PurchaseExample/4.0.1/spec.json
/opt/cdap/marketplace/examples/PurchaseExample/4.0.1/icon.jpg
/opt/cdap/marketplace/examples/PurchaseExample/4.0.0/archive.tgz
/opt/cdap/marketplace/examples/PurchaseExample/4.0.0/spec.json
/opt/cdap/marketplace/examples/PurchaseExample/4.0.0/icon.jpg

On startup, the server will scan the base directory, load the relevant information into memory, and serve data based on the contents of those files. This would also let ops teams manage the marketplace through 'group' packages and 'cask package' packages.
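Below is a sketch of the startup scan over the directory layout above, using only `pathlib` and `json`. It assumes a version counts only if its directory contains `spec.json`. Group-level files such as `meta.json` and `icon.jpg` are not indexed here. `scan_base_dir` and the nested-dict index shape are hypothetical choices.

```python
import json
from pathlib import Path
from typing import Dict


def scan_base_dir(base: Path) -> Dict[str, Dict[str, Dict[str, dict]]]:
    """Build {group: {package: {version: parsed spec.json}}} from <base dir>."""
    index: Dict[str, Dict[str, Dict[str, dict]]] = {}
    for group_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        packages = index.setdefault(group_dir.name, {})
        for package_dir in sorted(p for p in group_dir.iterdir() if p.is_dir()):
            for version_dir in sorted(p for p in package_dir.iterdir() if p.is_dir()):
                spec_file = version_dir / "spec.json"
                if not spec_file.is_file():
                    continue  # incomplete package version; skip it
                versions = packages.setdefault(package_dir.name, {})
                versions[version_dir.name] = json.loads(spec_file.read_text())
    return index
```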

 

Note: with so little logic in the server, why have a server at all? Why not put an Apache server in front of the files, or serve them directly from S3? The assumption is that we will need to add more complex functionality in the future, such as APIs to add groups and packages and the ability to search for packages by various fields.

Example Use Cases

Scenario 1: Add a draft of an SFDC Lead Dump Hydrator pipeline

The marketplace has a group called 'Hydrator Pipelines'.

When the user clicks the '+' button, the UI makes the following call:

Code Block
GET /groups
[
  {
    "name": "examples",
    "label": "Examples",
    "description": "Example applications to get started with CDAP."
  },
  {
    "name": "hydrator-pipelines",
    "label": "Hydrator Pipelines",
    "description": "Templates of various Hydrator pipelines."
  },
  ...
]

to display all the types of things the user can add. The user clicks 'Hydrator Pipelines' in that list, and the UI makes another call to list the packages in the 'Hydrator Pipelines' group:

Code Block
GET /groups/hydrator-pipelines/packages
[
  ...,
  {
    "name": "sfdc-lead-dump",
    "label": "SFDC Lead Dump",
    "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
    "author": "Cask",
    "org": "Cask Data Inc."
  },
  ...
]

The user clicks the 'SFDC Lead Dump' package in that list, and the UI makes a call to get all versions of that package:

Code Block
GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions
[
  {
    "name": "sfdc-lead-dump",
    "label": "SFDC Lead Dump",
    "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
    "author": "Cask",
    "org": "Cask Data Inc.",
    "version": "1.0.1",
    "created": 1234567899,
    "changelog": [
      "fixed a small parsing bug"
    ],
    "dependencies": {
      "cdap": {
        "minVersion": "4.0.0",
        "maxVersion": "4.1.0"
      }
    }
  },
  ...
]

By default, the UI selects the most recent version that is compatible with the running version of CDAP. The user decides to install the package, so the UI makes a call to get the package spec:

Code Block
GET /groups/hydrator-pipelines/packages/sfdc-lead-dump/versions/1.0.1/spec
{
  "name": "sfdc-lead-dump",
  "label": "SFDC Lead Dump",
  "description": "Reads SFDC data from a CDAP Stream, filters invalid records, and dumps the data to a CDAP Table.",
  "author": "Cask",
  "org": "Cask Data Inc.",
  "version": "1.0.1",
  "created": 1234567899,
  "changelog": [
    "fixed a small parsing bug"
  ],
  "actions": [
    {
      "type": "create_artifact",
      "arguments": [
        {
          "name": "scope",
          "value": "user",
          "canModify": false
        },
        {
          "name": "name",
          "value": "sfdc-plugins",
          "canModify": false
        },
        {
          "name": "version",
          "value": "1.0.0",
          "canModify": false
        },
        {
          "name": "parents",
          "value": "system:cdap-data-pipeline[4.0.0,4.1.0)",
          "canModify": false
        }
      ]
    },
    {
      "type": "create_hydrator_draft",
      "arguments": [
        {
          "name": "artifact",
          "value": {
            "scope": "system",
            "name": "cdap-data-pipeline",
            "version": "4.0.0"
          },
          "canModify": false
        },
        {
          "name": "name",
          "value": "SFDC Lead Dump",
          "canModify": true
        },
        {
          "name": "config",
          "value": { [hydrator config here] },
          "canModify": false
        }
      ]
    }
  ]
}
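The version-defaulting step in this scenario could be computed from the versions response roughly as follows. This picks the most recent version compatible with the running CDAP. It makes two assumptions:

- `minVersion` is inclusive and `maxVersion` exclusive, matching the `[4.0.0,4.1.0)` notation used for `parents` in the spec above;
- versions are plain dot-separated integers.

`default_version` and its helpers are hypothetical.

```python
from typing import Dict, List, Optional, Tuple


def as_tuple(version: str) -> Tuple[int, ...]:
    """'1.0.1' -> (1, 0, 1); assumes numeric, dot-separated versions."""
    return tuple(int(x) for x in version.split("."))


def compatible(entry: Dict, cdap_version: str) -> bool:
    """True if the running CDAP version falls in the entry's [minVersion, maxVersion) range."""
    rng = entry.get("dependencies", {}).get("cdap", {})
    v = as_tuple(cdap_version)
    if "minVersion" in rng and v < as_tuple(rng["minVersion"]):
        return False
    if "maxVersion" in rng and v >= as_tuple(rng["maxVersion"]):
        return False
    return True


def default_version(entries: List[Dict], cdap_version: str) -> Optional[str]:
    """Pick the newest compatible package version, or None if nothing is compatible."""
    candidates = [e["version"] for e in entries if compatible(e, cdap_version)]
    return max(candidates, key=as_tuple, default=None)
```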

 

 

Scenario 7: Add the MySQL JDBC driver as a Hydrator plugin.

...