Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Goals
The fundamental need driving this work is the ability to run multiple versions of the same program concurrently (the driving case is Service), so that requests continue to be served while an earlier version is shut down. This is required for zero downtime of the service and applies whenever apps are updated.
User Stories
- User is running v1.0.0 of an analytics application
- The application has a user service that receives events
- User wants to upgrade to v2.0.0 of the analytics application with minimal downtime
- User also wants the ability to send a percentage of traffic to v2.0.0 of the application before directing 100% of traffic to it
- User wants to roll back to v1.0.0 with minimal downtime
Design
The above requirement suggests the need for application versioning. That is, the same application (identified by its name, aka app name) can have multiple versions. With that in place, a service HTTP endpoint such as /v3/namespaces/<ns-id>/apps/<app-id>/services/<service-id>/methods/<method-name> can be used continuously while the user deploys an upgraded version of the app underneath. Callers of the endpoint are oblivious to the change and are still served by one of the service versions. The old service can then be stopped, and the whole process does not affect the uptime of the endpoint.
We want to introduce application versioning without breaking backward compatibility. The current scope of the work covers only Services. MR/Workflow/Spark already support concurrent runs, so concurrent runs of these across multiple versions of the app shouldn't be an issue. However, what should happen when multiple versions of Flows run is not fully understood, so we will not change the current design choice of disallowing concurrent runs of Flows (even across multiple versions of the app).
Approach
Application versions are chosen and set by the user when creating the app. A version is represented as a string. If the user doesn't provide a version, the default version ("-SNAPSHOT") is used. If the app is created from an artifact, the artifact version from the AppRequest is used as the version of the application. Once created, an app version cannot be changed unless the version ends with the string "-SNAPSHOT". So if a user is not using versioning at all, the current behavior of updating an app keeps working (since -SNAPSHOT is the default version). For versioned endpoints (/apps/app-id/versions/version-id/services/MyService/start), the corresponding version of the app is used. If the user calls the non-versioned API (/apps/app-id/services/MyService/start), we check whether more than one version exists and, if so, return an error code. This maintains backward compatibility for the pre-app-version era and for users who don't want to use app versions. The only exception to this rule is the service method routing endpoints, for which the CDAP Router uses the configured routing to route service requests appropriately (more details in the Service Routing section).
Reasoning:
In CDAP, the relationship between application and artifact is not 1-to-1. Multiple applications can be created from the same artifact by providing different application configurations, so using the artifact version as the application version doesn't work very well. E.g., in Hydrator, multiple pipelines (pipeline == application) are created from the same artifact, the Hydrator artifact. We are introducing a new REST endpoint for deploying applications so that the version is provided explicitly. I think we can have that endpoint default to the artifact version as the application version if the app version is not provided explicitly, and I believe that should fit your use case pretty well. The default version "-SNAPSHOT" is mainly for backward compatibility. That is, if one deploys an artifact+app using the existing endpoint, we use "-SNAPSHOT" as the version internally so that it can be overwritten on redeploy (non-SNAPSHOT application versions are immutable).
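The versioning rules in this section can be illustrated with a minimal Python sketch. It is not the CDAP implementation: the `store` dictionary, the function names, and the choice of exception types are hypothetical, and treating a single existing version as the target of a non-versioned call is an assumption.

```python
SNAPSHOT_SUFFIX = "-SNAPSHOT"  # from the text: versions ending in this stay mutable
DEFAULT_VERSION = "-SNAPSHOT"  # from the text: used when no version is supplied


def is_mutable(version: str) -> bool:
    """Only versions ending in -SNAPSHOT may be overwritten on redeploy."""
    return version.endswith(SNAPSHOT_SUFFIX)


def deploy(store: dict, app_id: str, config: dict, version: str = DEFAULT_VERSION) -> int:
    """Hypothetical deploy handler; returns an HTTP-style status code.

    `store` maps (app_id, version) -> config and stands in for the real Store.
    """
    key = (app_id, version)
    if key in store and not is_mutable(version):
        return 409  # same immutable application version already exists
    store[key] = config
    return 200


def resolve_non_versioned(store: dict, app_id: str) -> str:
    """Pick the version a non-versioned call refers to, or raise if ambiguous."""
    versions = sorted(v for (a, v) in store if a == app_id)
    if not versions:
        raise LookupError(f"application {app_id!r} not found")
    if len(versions) > 1:
        # Assumption: surfaced to the caller as an error code.
        raise ValueError(f"{app_id!r} has multiple versions; use a versioned endpoint")
    return versions[0]
```

A user who never passes a version keeps redeploying over "-SNAPSHOT", which is how the existing update behavior is preserved.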
Service Routing
Routing to services with multiple versions running concurrently will add the ability to control the distribution of requests. Instead of being completely random, the user can choose to have, say, 80% of requests served by version "2.0.0", 10% by "1.3.1" and 10% by "1.3.0". This will be made possible by allowing the user to set a distribution strategy for a particular service (ns-name, app-name, service-name), which the CDAP Router will use when deciding which service instance to forward a request to. If the route configuration is missing, the request will follow the default behavior configured in CConf: random, min, max (by string comparison), or fail to route.
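The routing decision described above could look roughly like the following Python sketch. The function name, its parameters, and the handling of configured versions that are not running (they are skipped and the remaining weights are used as-is) are assumptions for illustration, not the Router's actual logic. The strategy names follow the values listed for 'cdap.service.http.routing.default'.

```python
import random


def pick_version(running, route_config=None, default_strategy="random", rng=random):
    """Return the app version that should serve the next request, or None.

    running          -- versions that currently have a live service instance
    route_config     -- e.g. {"2.0.0": 80, "1.3.1": 10, "1.3.0": 10}, or None
    default_strategy -- one of "none", "random", "smallest", "greatest"
    """
    if route_config:
        # Weighted pick among configured versions that are actually running.
        candidates = [v for v in running if route_config.get(v, 0) > 0]
        if not candidates:
            return None  # nothing eligible: fail to route
        weights = [route_config[v] for v in candidates]
        return rng.choices(candidates, weights=weights, k=1)[0]

    # No route config: fall back to the configured default behavior.
    if not running or default_strategy == "none":
        return None
    if default_strategy == "random":
        return rng.choice(running)
    if default_strategy == "smallest":
        return min(running)  # plain string comparison
    if default_strategy == "greatest":
        return max(running)  # plain string comparison
    raise ValueError(f"unknown strategy: {default_strategy}")
```

Because "smallest" and "greatest" compare strings, "1.10.0" sorts before "1.9.0"; version naming needs to account for that if those strategies are used.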
Support for other Program Types (other than Services)
We support concurrent runs for Spark, MapReduce and Workflow, so none of their functionality needs any changes. For Flows and Workers, we don't support concurrent runs today, and we will retain the same logic across all versions of the app. For example, if a worker named MyWorker of app ver1 is running and the user tries to start the same worker of app ver2, we will return a CONFLICT error.
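As a rough illustration of the rule above, the Python sketch below decides whether a start request conflicts with a running program. The `running` set, the function name, and the lowercase type names are hypothetical. The treatment of services (one run per version, several versions side by side) is an assumption drawn from the goals of this design.

```python
# Program types that already allow concurrent runs, per the text.
CONCURRENT_TYPES = {"spark", "mapreduce", "workflow"}
# Types limited to one run across all versions of an app, per the text.
SINGLE_RUN_TYPES = {"flow", "worker"}


def start_program(running: set, app_id: str, version: str, ptype: str, program_id: str) -> int:
    """Hypothetical start handler; returns 200 on success or 409 on conflict.

    `running` holds (app_id, version, ptype, program_id) tuples for live programs.
    """
    if ptype in SINGLE_RUN_TYPES:
        # Conflict if the same flow/worker runs in ANY version of the app.
        conflict = any(
            (a, t, p) == (app_id, ptype, program_id) for (a, _v, t, p) in running
        )
    elif ptype in CONCURRENT_TYPES:
        conflict = False
    else:
        # Services (assumption): one run per version, many versions side by side.
        conflict = (app_id, version, ptype, program_id) in running
    if conflict:
        return 409
    running.add((app_id, version, ptype, program_id))
    return 200
```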
API Changes
REST API changes
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/apps/app-id/versions/version-id/create { 'App Request' } | POST | Create or update an application from an artifact (this is the only app creation endpoint that will support versioning). Note: The call needs to be a POST since we won't allow updating of app versions which are not SNAPSHOT. As mentioned earlier, version-id is simply a string that is a valid CDAP ID. Note that the /apps POST | 200 - On success 409 - Same application version already exists 500 - Any internal errors | |
/apps/app-id/versions/version-id | DELETE | This endpoint will delete a particular version of the application. Same semantics as deleting the application today - i.e., no programs of that particular app-version can be running. | 200 - On success 404 - When application is not available 409 - The application version still has program running 500 - Any internal errors | |
/apps/app-id | DELETE | For backward compatibility, this will delete the app with "-SNAPSHOT" string as version. | same as above | |
/apps/app-id/versions | GET | This endpoint will list all the versions of the app that are present. | 200 - On success 404 - When application is not available 500 - Any internal errors | List of versions in the format ["version1", "version2", ...] |
/apps/app-id/versions/version-id | GET | This endpoint will return the ApplicationDetail of an app version, similar to what is returned today for /apps/app-id. | 200 - On success 404 - When application is not available 500 - Any internal errors | |
/apps/app-id | GET | For backward compatibility, this will return ApplicationDetail of the app with "-SNAPSHOT" as version. | same as above | ApplicationDetail in JSON format |
/apps/app-id/versions/version-id/program-type/program-id/start (or stop) | POST | Start or stop a specific program in an app version | 200 - On success 404 - When application or program is not available 409 - The program to start (stop) is already running (stopped) 500 - Any internal errors | |
/apps/app-id/program-type/program-id/start (or stop) | POST | Start or stop a specific program in the app with "-SNAPSHOT" string as version. | same as above | |
/v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig { 'Routing Config' } | PUT | Upload a load distribution configuration. Routing Config is a JSON whose structure looks as follows: { "version-id1":number1, "version-id2":number2, .... } For example, { "v1":10, "v2":90 }. This config says that version v1 should get 10% of the requests and version v2 should get 90%. Property: 'cdap.service.http.routing.default', Values = { none, random, smallest, greatest } | 200 - On success 400 - When application, service or app version is not available, or the sum of percentages in RouteConfig is not 100 500 - Any internal errors | |
/v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig | DELETE | Delete the Routing Config of a given service of an app version | 200 - On success 404 - When application, service or app version is not available. 500 - Any internal errors | |
/v3/namespaces/<namespace-id>/apps/<app-id>/services/<service-id>/routeconfig | GET | Get the Routing Config of a given service of an app version | 200 - On success 500 - Any internal errors | Routing Config in JSON, or empty if the application or service is not available |
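
The validation implied by the routeconfig PUT response codes can be sketched as follows. This is a minimal Python illustration, not the handler itself: the function name, the message strings, and the requirement that percentages be non-negative integers are assumptions. A 0 entry is accepted as long as the total is 100.

```python
def validate_route_config(route_config: dict, existing_versions: set) -> tuple:
    """Return (status, message) for a proposed Routing Config."""
    unknown = sorted(v for v in route_config if v not in existing_versions)
    if unknown:
        return 400, f"unknown app version(s): {', '.join(unknown)}"
    for pct in route_config.values():
        # Assumption: percentages are whole, non-negative numbers.
        if isinstance(pct, bool) or not isinstance(pct, int) or pct < 0:
            return 400, "percentages must be non-negative integers"
    total = sum(route_config.values())
    if total != 100:
        return 400, f"percentages add up to {total}, expected 100"
    return 200, "OK"
```

For example, `validate_route_config({"v1": 10, "v2": 90}, {"v1", "v2"})` is accepted, while `{"v1": 98, "v2": 1}` is rejected because the total is 99.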
CLI Changes
Command | Description | Response |
---|---|---|
create app <app-id> [version <app-version>] <artifact-name> <artifact-version> <scope> [<app-config-file>] | Create or update an application from an artifact (this is the only app creation command that will support versioning). <app-version> is simply a string that is a valid CDAP ID. If <app-version> is not given, it will create or update the app with "-SNAPSHOT" string as version from existing artifacts. | |
delete app <app-id> [version <app-version>] | This command will delete a particular version of the application if <app-version> is given. Same semantics as deleting the application today - i.e., no programs of that particular app-version can be running. If <app-version> is not given, it will delete the app with "-SNAPSHOT" string as version. | |
list app versions <app-id> | This command will list all the versions of the app that are present. | A table of versions, one version per row |
describe app <app-id> [version <app-version>] | This command will return the programs of an app version if <app-version> is given. If <app-version> is not given, return programs of the app with "-SNAPSHOT" string as version. | A table of programs with type, id, and description in every row |
start <program-type> <app-id.[app-version.]program-id> [<runtime-args>] ALTERNATIVE: start <program-type> <app-id.program-id> [version <app-version>] [<runtime-args>] | This command will start the program of an app version if <app-version> is given. If <app-version> is not given, start the program of the app with "-SNAPSHOT" string as version. | |
stop <program-type> <app-id.[app-version.]program-id> ALTERNATIVE: stop <program-type> <app-id.program-id> [version <app-version>] | This command will stop the program of an app version if <app-version> is given. If <app-version> is not given, stop the program of the app with "-SNAPSHOT" string as version. | |
set routeconfig <app-id.service-id> <route-config> ALTERNATIVE: set routeconfig <route-config> for service <app-id.service-id> | This command will set the service routing configuration. The <route-config> follows the format: | |
get routeconfig <app-id.service-id> <route-config> | Command to get service routing configuration | Routing configuration in JSON format |
delete routeconfig <app-id.service-id> | Command to delete service routing configuration | |
call service <app-id.[app-version.]service-id> <http-method> <endpoint> [headers <headers>] [body <body>] [body:file <local-file-path>] ALTERNATIVE: call service <app-id.service-id> [version <app-version>] <http-method> <endpoint> [headers <headers>] [body <body>] [body:file <local-file-path>] | Call service of a specific version | |
Authorization
We convert CDAP's entity ID to the ID used by Sentry. For simplicity, I propose that we do authorization at the application name level only, and nothing more fine-grained. So a user who has access X to an app will get access X to all versions of that app. Could this conversion be done in cdap-security-extn, where we can check the permissions at that level?
Logs
We will add a logging systemTag for the application version ID. With the new LogViewer, we don't explicitly provide the version number, program name, etc., but they can be returned as part of the JSON for that endpoint.
Metrics
We will have an appVersion tag for Metrics, so users can query for metrics across all runs of a particular version of the program. Metrics for an app will not be deleted until all versions of the app are deleted, which matches the current behavior.
Preferences
No changes to Preferences. The hierarchy will be namespace -> app id -> program. There will be no app version dimension. Since we have runtime args, the users can use that if required.
Lineage
Audit
Metadata
TBD
Upgrade Step
Since the version used by default is "-SNAPSHOT", we will need to update the keys in the Store (keys that start with appMeta) to add this default version for all existing apps. No other changes should be required.
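The upgrade step amounts to a one-time key rewrite. The Python sketch below shows the idea only: the Store's real key layout is not described in this document, so the tuple layouts (three parts before the upgrade, four after) and the function name are hypothetical.

```python
DEFAULT_VERSION = "-SNAPSHOT"
APP_META_PREFIX = "appMeta"


def upgrade_app_meta_keys(store: dict) -> dict:
    """Return a copy of `store` with the default version appended to old keys.

    Assumed layouts (hypothetical):
      old: ("appMeta", namespace, app_id)
      new: ("appMeta", namespace, app_id, version)
    """
    upgraded = {}
    for key, value in store.items():
        if key[0] == APP_META_PREFIX and len(key) == 3:
            key = key + (DEFAULT_VERSION,)  # pre-versioning app: add default version
        upgraded[key] = value
    return upgraded
```

Keys that already carry a version are left alone, so running the step twice gives the same result as running it once.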
UI Changes
We don't need any changes in the UI for 3.6.0. But for 4.0, we should display the application version along with artifact version when we show a specific Application. That is, artifact version should be displayed along with application version wherever the latter is planned to be shown.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
AppVersion1 | Deploy an app version with existing artifact and with runtime args | Deploy successfully |
AppVersion2 | Start a service in an app version and call the service method | Service method should respond according to the given runtime args |
AppVersion3 | Update the deployed app version with new runtime args | Update should succeed |
AppVersion4 | Start the service of the updated app which has been started before updating | Fail to start with error message saying that the same service has been started |
AppVersion5 | Stop the same service then start again, and call the same method | Start succeeds and the method responds according to the updated runtime args |
AppVersion6 | Deploy another app version with the same existing artifact and with different runtime args | Deploy successfully |
AppVersion7 | Start the same service in the new app version and call the service method of both versions | Both versions of the service should respond according to their runtime args |
AppVersion8 | Delete an app version without stopping the running service | Fail to delete with error message saying that the service is still running |
AppVersion9 | Stop the service and delete the app version | Delete successfully |
AppVersion10 | Deploy two non-snapshot versions of the app and start the same service in both versions. Call the non-versioned service endpoint no more than 50 times | Both versions of the service should be reached at least once within 50 calls to the non-versioned endpoint, according to the random routing strategy without setting RouteConfig |
AppVersion11 | Set RouteConfig for a non-existing version | Fail to set with error message saying that the version doesn't exist |
AppVersion12 | Set RouteConfig as 99 for the only existing version | Fail to set with error message saying that the total percentage doesn't add up to 100 |
AppVersion13 | Set RouteConfig as 100 and 1 for the two existing versions | Fail to set with error message saying that the total percentage doesn't add up to 100 |
AppVersion14 | Set RouteConfig as 98 and 1 for the two existing versions | Fail to set with error message saying that the total percentage doesn't add up to 100 |
AppVersion15 | Set RouteConfig as 100 and 0 for the two existing versions and call the non-versioned service endpoint 20 times | All traffic is routed to the version with RouteConfig 100 |
AppVersion16 | Set RouteConfig as 10 and 90 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set |
AppVersion17 | Set RouteConfig as 20 and 80 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set |
AppVersion18 | Set RouteConfig as 60 and 40 for the two existing versions and get RouteConfig | Set RouteConfig successfully with total percentage equal to 100 and get the same RouteConfig as set |
AppVersion19 | Delete RouteConfig and call the non-versioned service endpoint no more than 50 times | Delete successfully and both versions of the service should be reached within 50 calls to the non-versioned endpoint, according to the random routing strategy with empty RouteConfig |
AppVersion20 | Delete the namespace while two versions of the app still have services running | Fail to delete with error message saying that some programs are still running |
AppVersion21 | Stop one version of the service and delete the namespace while one version of the service is running | Fail to delete with error message saying that some programs are still running |
AppVersion22 | Stop the only running service and delete the namespace | Delete successfully |
Releases
Release 3.6.0 (Drop for 9/20)
- (Internal) Change ApplicationId to contain application version (will be part of the key of Store)
- Introduce REST API to create/delete Apps with non-default versions
- Introduce REST API to start/stop programs with versions
- Upgrade step to add default version to the keys of the Store
- Check to make sure non-versioned APIs fail if multiple versions are present (since we need to define some behavior, we might as well implement the check) - Work will be involved in adding it to all endpoints
- Version-related endpoints - listing versions, getting app spec for a particular version (artifact id, app config, etc.)
- Service endpoint routing - ability to upload a config to provide a forwarding distribution logic at the Router
- Service endpoint routing when config is not present
Future Work
In scope for 4.0
- CLI
- Logging and Metrics support
- Metadata
- Lineage
- Audit
- Authorization (not required)