Deploying Artifacts to The Hub
This article covers the steps needed to deploy an artifact (plugin, driver, pipeline, etc.) to The Hub.
Requirements
Access to The Hub GCS bucket (speak to derekwood@ or meseifan@ if you don't have access)
Access to The Hub GitHub Repo
gcloud CLI (installation instructions)
gsutil tool (installation instructions)
Artifact files (jar, json, icon, etc.)
Background info
The Hub contains many artifacts that users of Data Fusion can deploy at any time. The artifacts are stored in Google Cloud Storage, in this bucket. The packages.json file in the bucket dictates which artifacts are visible and their properties. The artifacts are stored in the packages/ directory. Using the info from the packages.json file, CDF is able to pull the artifacts from the GCS bucket directly and deploy them to the user’s instance.
The packages.json file is a generated file; it should not be edited directly. The Packager generates it from the spec.json files present in each directory. The directory structure of the Hub is as follows:
packages/<name>/<version>/spec.json
packages/<name>/<version>/icon.jpg
packages/<name>/<version>/<other files>
If there are multiple versions of the same artifact, multiple directories are created under the <name> directory. The <other files> depend on the type of artifact. For example, if the artifact is a plugin, the structure would be:
packages/<name>/<version>/spec.json
packages/<name>/<version>/icon.jpg
packages/<name>/<version>/<plugin-name>-<plugin-version>.jar
packages/<name>/<version>/<plugin-name>-<plugin-version>.json
The exact naming of the plugin artifacts is not important as long as it matches the names in spec.json, but it is recommended that this naming convention is followed.
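The recommended plugin layout above can be sketched as a small helper that lists the files a version directory is expected to hold. This is only an illustration: the function name expected_plugin_files and the directory name plugin-dlp in the usage note are hypothetical and do not come from the Hub tooling.

```python
from pathlib import PurePosixPath


def expected_plugin_files(name, version, plugin_name, plugin_version):
    """Return the recommended file layout for one plugin version directory.

    Hypothetical helper: it only mirrors the naming convention described
    above and does not touch the file system or the GCS bucket.
    """
    base = PurePosixPath("packages") / name / version
    stem = f"{plugin_name}-{plugin_version}"
    return [
        str(base / "spec.json"),
        str(base / "icon.jpg"),
        str(base / f"{stem}.jar"),
        str(base / f"{stem}.json"),
    ]
```

For example, expected_plugin_files("plugin-dlp", "1.2.2", "dlp", "1.2.2") returns the four paths under packages/plugin-dlp/1.2.2/.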
The spec.json file needs to be configured correctly to allow the Packager to properly create the packages.json file. More details on the format can be found here. This is an example spec.json for a plugin; it is fairly straightforward to adapt it for other plugins.
{
  "specVersion": "1.0",
  "label": "Data Loss Prevention",
  "description": "Data Loss Prevention plugins to filter, redact and decrypt sensitive data directly in a pipeline.",
  "author": "Cask",
  "org": "Cask Data, Inc.",
  "created": 1589218398,
  "categories": [
    "hydrator-plugin"
  ],
  "cdapVersion": "[6.1.1,7.0.0-SNAPSHOT)",
  "paidLink": "https://cloud.google.com/dlp/pricing",
  "beta": false,
  "actions": [
    {
      "type": "one_step_deploy_plugin",
      "label": "Deploy Data Loss Prevention Plugins",
      "arguments": [
        {
          "name": "name",
          "value": "dlp",
          "canModify": false
        },
        {
          "name": "version",
          "value": "1.2.2",
          "canModify": false
        },
        {
          "name": "scope",
          "value": "user",
          "canModify": false
        },
        {
          "name": "config",
          "value": "dlp-1.2.2.json",
          "canModify": false
        },
        {
          "name": "jar",
          "value": "dlp-1.2.2.jar",
          "canModify": false
        }
      ]
    }
  ]
}
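Because the file names in spec.json must match the files that sit next to it, a quick local check before opening a pull request can catch mismatches. The following is a minimal sketch, not part of the Packager. It assumes that only the "jar" and "config" arguments name files, as in the example above. The function names are hypothetical.

```python
import json
from pathlib import Path


def referenced_files(spec):
    """Collect the file names that the actions in a parsed spec.json point to."""
    names = []
    for action in spec.get("actions", []):
        for arg in action.get("arguments", []):
            # Assumption: only the "jar" and "config" arguments name files.
            if arg.get("name") in ("jar", "config"):
                names.append(arg["value"])
    return names


def missing_files(version_dir):
    """Return the referenced files that are absent from a version directory."""
    version_dir = Path(version_dir)
    spec = json.loads((version_dir / "spec.json").read_text(encoding="utf-8"))
    return [n for n in referenced_files(spec) if not (version_dir / n).is_file()]
```

Calling missing_files on a local packages/<name>/<version>/ directory returns an empty list when every referenced file is present.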
Steps
1. Clone The Hub GitHub Repo locally.
2. Create a new branch for your changes.
3. If you are deploying a new version of an existing artifact, proceed to step 4. If you are deploying a new artifact that does not currently exist in the Hub, create a directory under packages/ with the name of the artifact. The naming convention is “<artifact-type>-<artifact-name>”.
4. Create a new directory under the packages/<artifact>/ directory and name it with the version number. For example, to deploy version 1.0.0 the path would be packages/<artifact>/1.0.0/
5. If you are updating an existing artifact, it is recommended that you delete the old version if both versions target the same version of CDAP. For example, if you’re adding version 1.1.0 and the existing version is 1.0.0 and they both target CDAP 6.1.1, then we recommend that you delete 1.0.0, since there is no reason for anyone to deploy the older version.
6. Place the appropriate files in the directory you just created, following the info presented in the Background info section.
7. Create a Pull Request with your changes and send the link to someone from the Cloud Data Fusion team for approval.
8. Once the PR is approved, merge it into the master branch.
9. From the hub/ directory run the following commands:
10. Open the packages.json file and ensure your new artifact appears in the JSON with the correct version.
11. When you are ready to deploy, run the following commands:
Warning: This will push your changes to prod instantly; there is no staging environment or rollout. As soon as the upload is done, the changes are live. Please double-check that all the required artifact files are present and that packages.json contains the correct versions before deploying.
12. The previous steps copied the artifacts to the central hub. There are also regional hubs, which are used by CDF 6.1.4 (and newer) instances. Run the following commands to sync all regional hubs to the central hub:
One of the gsutil commands may hang or freeze when processing this volume of data. If that happens, the best solution is to kill the command (using Ctrl + C) and rerun it.
13. Wait a couple of minutes for the changes to propagate, then try to deploy your new artifact from The Hub in a CDF instance.
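The manual check of packages.json before deploying can also be scripted. The sketch below rests on an assumption that this article does not confirm: that packages.json is a JSON array of objects, each with "name" and "version" keys. Verify this against the generated file before relying on it. The function names are hypothetical.

```python
import json


def has_package(packages, name, version):
    """Return True if the parsed packages list contains name at version.

    Assumption: each entry is an object with "name" and "version" keys.
    """
    return any(
        entry.get("name") == name and entry.get("version") == version
        for entry in packages
    )


def check_packages_file(path, name, version):
    """Load a local packages.json and look for the given artifact version."""
    with open(path, encoding="utf-8") as handle:
        return has_package(json.load(handle), name, version)
```

A False result from check_packages_file suggests that the Packager did not pick up the new spec.json, so nothing should be uploaded until that is resolved.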