Uploading pipelines across Data Fusion versions

A data pipeline is represented as a graph of nodes and connections and can be exported in JSON format. Each plugin and data pipeline application is defined by an artifact, which has a name, a version, and a scope. Example:

 

"artifact": {
    "name": "google-cloud",
    "version": "0.16.0",
    "scope": "SYSTEM"
}

 

Artifacts exported from the UI have the exact major/minor/patch version set. When a pipeline is exported from the UI of an older version and imported through the UI of a newer version, the frontend checks whether higher versions of the artifacts are available and upgrades to them. This makes exporting from an older version and importing into a newer version of Data Fusion possible. However, when the pipeline is exported using the UI and imported using the REST API, the exact version pinning causes the deployment to fail, because the newer version of Data Fusion will not have the older artifacts unless the instance was upgraded.

The artifact specification allows a version range to be given instead of an exact version, and the backend services will automatically use the highest available version during deployment. For example:

 

"artifact": {
    "name": "google-cloud",
    "version": "[0.0.1,10.0.0)",
    "scope": "SYSTEM"
}

 

To automate deployment from older to newer instances, the following steps can be applied:

Step 1: Export the pipeline from the UI and store it in a file (for example, SQLToBQ-cdap-data-pipeline.json)

Step 2: Replace the exact versions of all artifacts in the pipeline JSON with version ranges

sed -i -e 's/"version": "[0-9]*\.[0-9]*\.[0-9]*"/"version": "[0.0.1,10.0.0]"/g' SQLToBQ-cdap-data-pipeline.json

Step 3: Deploy the pipeline through the REST API, using the PUT request that creates a pipeline
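
Steps 2 and 3 can also be sketched in Python. The sed command rewrites every "version" field that looks like an exact version, whereas the sketch below only touches "version" fields inside "artifact" objects. The function names, the default range, and the endpoint, namespace, and token parameters are illustrative assumptions. The URL path follows the CDAP application lifecycle API (PUT /v3/namespaces/<namespace>/apps/<app-name>), and the sketch assumes the exported JSON is accepted as the request body. The request is only built here, not sent.

```python
import json
import re
import urllib.request

# Matches exact versions such as "0.16.0" (not ranges, not SNAPSHOTs).
EXACT_VERSION = re.compile(r"^\d+\.\d+\.\d+$")


def widen_artifact_versions(node, version_range="[0.0.1,10.0.0)"):
    """Recursively replace exact versions, but only inside "artifact" objects."""
    if isinstance(node, dict):
        artifact = node.get("artifact")
        if isinstance(artifact, dict):
            if EXACT_VERSION.match(str(artifact.get("version", ""))):
                artifact["version"] = version_range
        for value in node.values():
            widen_artifact_versions(value, version_range)
    elif isinstance(node, list):
        for item in node:
            widen_artifact_versions(item, version_range)
    return node


def build_deploy_request(endpoint, namespace, pipeline_name, pipeline, token):
    """Build (but do not send) the PUT request that creates the pipeline.

    endpoint, namespace, pipeline_name and token are caller-supplied
    assumptions; adjust them to the target instance.
    """
    url = f"{endpoint.rstrip('/')}/v3/namespaces/{namespace}/apps/{pipeline_name}"
    return urllib.request.Request(
        url,
        data=json.dumps(pipeline).encode("utf-8"),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

A typical use would load the exported file with json.load, pass the result through widen_artifact_versions, and hand the request from build_deploy_request to urllib.request.urlopen once the endpoint and token are known.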

Version Range Syntax

The syntax for a version range uses standard interval notation: a square bracket includes the term, while a parenthesis excludes it. When two versions are compared, they are compared first by major version, then by minor version if the major versions are equal, and lastly by patch version if the minor versions are also equal. The version range [6.1.1,7.0.0) matches every version v such that 6.1.1 <= v < 7.0.0, that is, every minor or patch release within CDAP major release 6 that is at least 6.1.1.
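
The comparison and interval rules can be illustrated with a minimal Python sketch. This is not the CDAP implementation. The function names are hypothetical, and treating a missing component as 0 is an assumption. The highest_match helper mirrors the behaviour described earlier, where the backend picks the highest available version in the range.

```python
def parse_version(text):
    """Turn "6.1.1" into (6, 1, 1); tuples compare major, then minor, then patch.

    Missing components are padded with 0 (an assumption of this sketch).
    """
    parts = [int(part) for part in text.strip().split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def in_range(version, version_range):
    """Return True if version lies inside an interval such as "[6.1.1,7.0.0)"."""
    lower_inclusive = version_range[0] == "["
    upper_inclusive = version_range[-1] == "]"
    lower_text, upper_text = version_range[1:-1].split(",")
    v = parse_version(version)
    lower = parse_version(lower_text)
    upper = parse_version(upper_text)
    above = v >= lower if lower_inclusive else v > lower
    below = v <= upper if upper_inclusive else v < upper
    return above and below


def highest_match(available, version_range):
    """Pick the highest available version inside the range, or None."""
    candidates = [v for v in available if in_range(v, version_range)]
    return max(candidates, key=parse_version, default=None)
```

Comparing parsed tuples rather than raw strings matters: as strings, "0.9.0" sorts above "0.16.0", whereas numerically 0.16.0 is the higher version.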

It is not required to provide all three components in a term. A user can also provide a range such as [0.0,100.0] to specify essentially every available version of an artifact. Allowing the same pipeline to rely on artifacts across major versions is not recommended, since major releases are by definition not backwards compatible.

Some artifact versions are still in the SNAPSHOT stage, meaning that they have not been finalized. SNAPSHOT is considered below non-SNAPSHOT, so [6.1.1,6.3.1) would match 6.3.1-SNAPSHOT but not 6.3.1.
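
One way to picture the SNAPSHOT rule is as an extra tie-breaker appended to the numeric comparison. The Python sketch below is only an illustration under that assumption, with hypothetical function names. It also pads missing components with 0, which is an assumption about how short terms such as 0.0 are interpreted.

```python
def version_key(text):
    """Sort key: numeric parts padded to three, then 0 for SNAPSHOT, 1 for a release."""
    base, _, suffix = text.partition("-")
    parts = [int(p) for p in base.split(".")]
    parts += [0] * (3 - len(parts))
    return (*parts, 0 if suffix == "SNAPSHOT" else 1)


def in_half_open_range(version, lower, upper):
    """True if lower <= version < upper, i.e. the range "[lower,upper)"."""
    return version_key(lower) <= version_key(version) < version_key(upper)
```

With this key, 6.3.1-SNAPSHOT sorts just below 6.3.1, so in_half_open_range("6.3.1-SNAPSHOT", "6.1.1", "6.3.1") holds while in_half_open_range("6.3.1", "6.1.1", "6.3.1") does not.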