...
- Drafts
- User wants to add a new draft or save the pipeline they are working on as a draft
- User can update an existing draft of a pipeline as a new version; previous versions of the pipeline are saved (up to 20 versions)
- User can go back to the previous version of a draft, or to any other saved version
- User wants to retrieve the latest version of a draft for a pipeline
- User wants to view all available pipeline drafts across all users
- User wants the ability to write a pipeline draft
- User has access to only those pipelines that are available in the namespace the user is in.
- Plugin Output Schema
- User using DB-Source wants to enter a connection-string and table name and have the table schema information populated automatically.
- User using TeraData-Source wants to enter a connection-string and table name and have the table schema information populated automatically.
- List Field values
- User provides connection-string, user-name and password and expects the list of available tables to be returned in DB-Source.
- User provides initial-address, key-space, user-name and password for the key space and expects the list of available column families in Cassandra.
- User provides either a Kafka broker or a ZooKeeper connection string and expects the list of available Kafka topics.
- User provides the Elasticsearch hostname and expects the list of available indexes to choose from.
Proposed REST APIs
...
Endpoint
...
Response Status
...
/extensions/hydrator/drafts/{draft-name}
...
200 OK: draft created and saved successfully
409 CONFLICT: draft-name already exists
500 Error: while creating the draft
...
200 OK: draft updated successfully
404 NOT FOUND: draft doesn't exist, so it cannot be updated
500 Error: while updating the draft
...
/extensions/hydrator/drafts/{draft-name}
...
200 return the draft identified by the draft-name
404 draft not found
500 error while getting draft
...
Code Block | ||
---|---|---|
| ||
{
"config": {
"source" : {
....
},
"transforms" : [...],
"sinks" : [...],
"connections" : [..]
}
} |
...
200 return the list of all saved drafts
500 error
...
[
"streamToTPFS",
"DBToHBase",
...
]
...
200 successfully deleted all drafts
500 error while deleting
...
200 successfully deleted the specified draft
404 draft does not exist
500 error while deleting
...
/extensions/hydrator/plugins/{plugin-name}/schema
...
Code Block | ||
---|---|---|
| ||
{
"artifact" : {
"name" : "...",
"version":"...",
"scope":"..."
},
"jdbcConnectionString": ...,
"jdbcPluginName": ...,
"tableName" : ...
} |
...
200 determine the output schema based on the plugin and plugin-properties, and return it
404 unrecognized plugin-name
500 Error
...
Code Block | ||
---|---|---|
| ||
{
"field1" : "Integer",
"field2" : "String",
...
"fieldN" : "Double"
} |
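As a rough illustration of how a response like the one above could be assembled, the sketch below maps JDBC column types to simple type names. The class name `SchemaSketch`, its helper methods, and the type mapping itself are assumptions made for illustration only; the design does not specify how SQL types are translated.

```java
import java.sql.Types;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the proposed API.
class SchemaSketch {

  // Maps a java.sql.Types constant to a simple type name (assumed mapping).
  static String simpleType(int sqlType) {
    switch (sqlType) {
      case Types.TINYINT:
      case Types.SMALLINT:
      case Types.INTEGER:
        return "Integer";
      case Types.BIGINT:
        return "Long";
      case Types.REAL:
        return "Float";
      case Types.FLOAT:
      case Types.DOUBLE:
        return "Double";
      case Types.BIT:
      case Types.BOOLEAN:
        return "Boolean";
      default:
        // Simplification: every other type is reported as a String.
        return "String";
    }
  }

  // Builds a field-name to type-name map, preserving column order.
  static Map<String, String> toSchema(String[] columnNames, int[] sqlTypes) {
    if (columnNames.length != sqlTypes.length) {
      throw new IllegalArgumentException("column names and types must have the same length");
    }
    Map<String, String> schema = new LinkedHashMap<>();
    for (int i = 0; i < columnNames.length; i++) {
      schema.put(columnNames[i], simpleType(sqlTypes[i]));
    }
    return schema;
  }
}
```

In a real plugin the column names and types would presumably come from `java.sql.ResultSetMetaData` for the requested table; they are passed in as arrays here so the sketch runs without a database.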
...
/extensions/hydrator/plugins/{plugin-name}/list
QueryParam : target
Example: target=table
...
Example:
Code Block | ||
---|---|---|
| ||
{
"artifact" : {
"name" : "...",
"version":"...",
"scope":"..."
},
"connectionString": ...,
"username": ...,
"password" : ...
} |
...
For the specified plugin, use the provided connection information to get the list of available values for the target field and return it.
200 list of available values for the target field type. Example: list of tables in a database.
500 error while retrieving.
...
Code Block | ||
---|---|---|
| ||
[
"tableA",
"tableB",
...
"tableN"
] |
Design
Option #1
Description
The Hydrator app needs to read from and write to a dataset to store and retrieve drafts and other information about business logic. We can implement a Hydrator CDAP application with a service that exposes REST endpoints for the required Hydrator functionality. Enabling Hydrator in a namespace will deploy this Hydrator app and start the service. The Hydrator UI would ping this service to check that it is available before coming up. The back-end business-logic actions that need to call the CDAP service endpoints directly can be made generic.
Pros
- Everything (Drafts, etc) stored in the same namespace, proper cleanup when namespace is deleted.
Cons
- Every namespace will have an extra app for supporting Hydrator if Hydrator is enabled. Running this service will use 2 containers per namespace. We can add an option to enable or disable Hydrator for namespaces that do not use it. It might feel odd as a user app, since the user didn't write or create this app.
Option #2
Description
We will still use a Hydrator CDAP app, but we create an "Extensions" namespace and deploy the "hydrator" app only in that namespace. This app would serve the Hydrator requests for all namespaces.
Pros
- Fewer resources used: only 2 containers in total rather than 2 containers per namespace, and only one dataset.
- Only one app for using Hydrator across namespaces rather than an app per namespace; less clutter.
- New extensions could be added to the same namespace to support other use cases in future.
Cons
- Is using a single dataset for storing all drafts across namespaces less secure?
- User won't be able to create a new namespace called "Extensions", as it will be reserved.
Open Questions
...
- How to delete the drafts when the namespace is deleted ?
- When to stop this service?
- Availability of the service?
- Security
- If we decide to add more capabilities to the Hydrator back-end app (e.g. pipeline validation or deployment), then in a secure environment:
- Can the hydrator-service discover the appropriate cdap.service and call the appropriate endpoints?
Option #3
Story 1 - Schema and field value suggestions :
Jira (Cask Community Issue Tracker): CDAP-5149
Plugin annotation @Endpoint:
REST API :
Code Block |
---|
POST : /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/types/{plugin-type}/plugins/{plugin-name}/methods/{plugin-method}?scope={artifact-scope}
Request-Body : JSON - fieldName to value mapping.
Response :
200, Successful Response JSON string
404, Not Found, Plugin Specific Error Message (Example : DB, Table not found)
500, Error, Plugin Specific Error Message (Example : JDBC Connection error)
Description : The request refers to the plugin artifact, not the parent artifact. Any one of the available parent artifacts could be used. |
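To make the path layout above concrete, here is a small sketch that fills in the template. The class name `PluginMethodPath` and all argument values in the usage note are placeholders chosen for illustration; they are not taken from the design.

```java
// Hypothetical helper that fills in the plugin-method path template.
class PluginMethodPath {

  static String build(String namespace, String artifactName, String artifactVersion,
                      String pluginType, String pluginName, String pluginMethod,
                      String artifactScope) {
    // Path segments are inserted as is; a real client would also URL-encode them.
    return String.format(
        "/namespaces/%s/artifacts/%s/versions/%s/types/%s/plugins/%s/methods/%s?scope=%s",
        namespace, artifactName, artifactVersion, pluginType, pluginName,
        pluginMethod, artifactScope);
  }
}
```

For example, `PluginMethodPath.build("default", "my-plugins", "1.0.0", "batchsource", "MyDbSource", "listTables", "USER")` yields `/namespaces/default/artifacts/my-plugins/versions/1.0.0/types/batchsource/plugins/MyDbSource/methods/listTables?scope=USER`.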
Code Block |
---|
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
@Retention(RetentionPolicy.RUNTIME)
public @interface Endpoint {
/**
* Returns the endpoint.
*/
String endpoint();
} |
Code Block | ||
---|---|---|
| ||
@Endpoint(endpoint = "listTables")
List<String> listTables(ListTableRequest request);
@Endpoint(endpoint = "getSchema")
Map<String, String> getSchema(SchemaRequest request); |
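One way the `{plugin-method}` path segment could be routed to an annotated plugin method is by reflection. The following is a minimal sketch under that assumption: `FakeDbSource` and `EndpointDispatcher` are hypothetical names, the request is simplified to a plain `String`, and the annotation is redeclared (with an added `@Target`) only so the example compiles on its own.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

// Same shape as the annotation above, redeclared to keep the sketch self-contained.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface Endpoint {
  String endpoint();
}

// Hypothetical plugin; returns canned data instead of querying a database.
class FakeDbSource {
  @Endpoint(endpoint = "listTables")
  public List<String> listTables(String connectionString) {
    return Arrays.asList("tableA", "tableB");
  }
}

// Hypothetical dispatcher: finds the public method whose @Endpoint name matches and invokes it.
class EndpointDispatcher {
  static Object invoke(Object plugin, String endpointName, Object request) {
    for (Method method : plugin.getClass().getMethods()) {
      Endpoint endpoint = method.getAnnotation(Endpoint.class);
      if (endpoint != null && endpoint.endpoint().equals(endpointName)) {
        try {
          return method.invoke(plugin, request);
        } catch (ReflectiveOperationException e) {
          // A real handler would translate this into the 500 response described above.
          throw new IllegalStateException("Endpoint call failed: " + endpointName, e);
        }
      }
    }
    // A real handler would translate this into a 404 response.
    throw new IllegalArgumentException("No @Endpoint named " + endpointName);
  }
}
```

Because the annotation is retained at runtime, the handler needs no per-plugin registration; it only needs an instance of the plugin class and the method name from the URL.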
Story 2 - Drafts
Jira (Cask Community Issue Tracker): CDAP-5154
Configurations HTTP Handler:
Single HTTP Handler for unifying Console Setting Handler and Dashboards HTTP Handler.
HTTP Request Type | Endpoint (table assumes config-type is drafts) | Request Body | Response Status | Response Body
---|---|---|---|---
PUT | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/ | content stored as is | 200 OK: config object saved successfully; 409 CONFLICT: config with object-id already exists; 500 Error: while saving the draft | { "version" : "version-id" }
POST | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions | content stored as is | 200 OK: config object updated successfully; 404 NOT FOUND: config object doesn't exist, so it cannot be updated; 500 Error: while updating the config | { "version" : "version-id" }
GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions | | 200 return all the versions for the config identified by the object-id; 404 config object not found; 500 error while getting config object |
GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions/{version-number} | | 200 return the specific version of the object; 404 config object with that version not found; 500 error while getting config object | content returned as is
GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id} (get latest version) | | 200 return the latest version of the config object; 404 config object not found; 500 error while getting the latest config object | content returned as is
GET | /namespaces/{namespace-id}/configurations/{config-type}/objects | | 200 return the list of metadata about config objects; 500 error | [ { "name" : "StreamToTPFS", "lastSaved": "..", .. }, .. ]
DELETE | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id} | | 200 successfully deleted the specified object; 404 object does not exist; 500 error while deleting |
"Drafts", "Plugin Templates", "Default versions" and "Dashboards" are types of configuration, specified as "config-type" in the REST call.
Each individual JSON config or object is identified by its "object-id".
JAVA API - Config Store:
Code Block | ||
---|---|---|
| ||
void create(String namespace, String type, Config config) throws ConfigExistsException;
void createOrUpdate(String namespace, String type, Config config);
void delete(String namespace, String type, String id) throws ConfigNotFoundException;
List<Config> list(String namespace, String type);
Config get(String namespace, String type, String id) throws ConfigNotFoundException;
void update(String namespace, String type, Config config) throws ConfigNotFoundException; |
Code Block | ||
---|---|---|
| ||
// get a particular version of an entry.
Config get(String namespace, String type, String id, int version) throws ConfigNotFoundException;
// get all the versions of an entry.
List<Config> getAllVersions(String namespace, String type, String id) throws ConfigNotFoundException;
|
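The versioned calls above, combined with the 20-version limit from the draft user stories, could behave roughly like the in-memory sketch below. This is an illustration under stated assumptions, not the Config Store implementation: `InMemoryVersionedStore` is a hypothetical class, content is held as a plain `String` rather than a `Config`, version numbers are assumed to start at 1, the oldest version is assumed to be dropped once the limit is exceeded, and standard unchecked exceptions stand in for `ConfigExistsException` and `ConfigNotFoundException`.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.NoSuchElementException;

// Hypothetical in-memory stand-in for a versioned config store.
class InMemoryVersionedStore {
  static final int MAX_VERSIONS = 20; // limit taken from the draft user stories

  // key -> (version number -> content), insertion-ordered so the oldest version comes first
  private final Map<String, LinkedHashMap<Integer, String>> entries = new HashMap<>();
  private final Map<String, Integer> latestVersion = new HashMap<>();

  private static String key(String namespace, String type, String id) {
    return namespace + "/" + type + "/" + id;
  }

  // Creates version 1 of an entry; fails if the entry already exists (409 in the REST table).
  int create(String namespace, String type, String id, String content) {
    String key = key(namespace, type, id);
    if (entries.containsKey(key)) {
      throw new IllegalStateException("config already exists: " + key);
    }
    entries.put(key, new LinkedHashMap<>());
    return append(key, content);
  }

  // Adds a new version; fails if the entry does not exist (404 in the REST table).
  int update(String namespace, String type, String id, String content) {
    String key = key(namespace, type, id);
    if (!entries.containsKey(key)) {
      throw new NoSuchElementException("config not found: " + key);
    }
    return append(key, content);
  }

  private int append(String key, String content) {
    LinkedHashMap<Integer, String> versions = entries.get(key);
    int version = latestVersion.merge(key, 1, Integer::sum);
    versions.put(version, content);
    if (versions.size() > MAX_VERSIONS) {
      // Drop the oldest retained version.
      versions.remove(versions.keySet().iterator().next());
    }
    return version;
  }

  // Returns a particular retained version of an entry.
  String get(String namespace, String type, String id, int version) {
    LinkedHashMap<Integer, String> versions = entries.get(key(namespace, type, id));
    if (versions == null || !versions.containsKey(version)) {
      throw new NoSuchElementException("version not found: " + version);
    }
    return versions.get(version);
  }

  // Returns the most recent version of an entry.
  String getLatest(String namespace, String type, String id) {
    Integer latest = latestVersion.get(key(namespace, type, id));
    if (latest == null) {
      throw new NoSuchElementException("config not found");
    }
    return get(namespace, type, id, latest);
  }

  // Returns the retained version numbers, oldest first.
  List<Integer> versions(String namespace, String type, String id) {
    LinkedHashMap<Integer, String> versions = entries.get(key(namespace, type, id));
    if (versions == null) {
      throw new NoSuchElementException("config not found");
    }
    return new ArrayList<>(versions.keySet());
  }
}
```

With this eviction policy, saving a draft 25 times leaves versions 6 through 25 available, and a request for version 1 is answered as not found.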
Schema Propagation and Validation through backend - DryRuns:
- If the plugin has the field "schema", the UI can mutate the output schema
- If the plugin doesn't have the field "schema", the UI cannot change the output schema and has to rely on the result of the dry run
Code Block |
---|
POST : namespace/{namespace-id}/dry-run
Request-Body : JSON Config.
Response-Body:
JSON Config with additional fields in the plugin for output schema,
exceptions in configuring pipeline stage, etc. |
User Stories (3.5.0)
- For the hydrator use case, the backend app should be able to support hydrator related functionalities listed below:
- query for plugins available for a certain artifact and list them in UI
- obtaining output schema of plugins provided the input configuration information
- deploying pipeline and start/stop the pipeline
- query the status of a pipeline run and current status of execution if there are multiple stages.
- get the next schedule of run, ability to query metrics and logs for the pipeline runs.
- creating and saving pipeline drafts
- get the input/output streams/datasets of the pipeline run and list them in UI.
- explore the data of streams/datasets used in the pipeline if they are explorable.
- add new metadata about a pipeline and retrieve metadata by pipeline run, etc.
- delete hydrator pipeline
- the backend app's functionalities should be limited to hydrator and it shouldn't be like a proxy for CDAP.
Having these abilities will remove the logic in CDAP-UI for making the appropriate CDAP REST calls. This encapsulation will simplify the UI's interaction with the back-end and also help in debugging potential issues faster. In future, we could have more apps similar to the Hydrator app, so our back-end app should define and implement generic cases that can be used across these apps, and it should be extensible enough to support new features.
Generic Endpoints
...