...

  • Drafts
    • User wants to add a new draft or save the pipeline they are working on as a draft

    • User can update an existing draft of a pipeline as a new version; previous versions of the pipeline are saved (up to 20 versions)
    • User can go back to the previous version of a draft or to any other version of the draft
    • User wants to retrieve the latest version of the draft for a pipeline
    • User wants to view all available pipeline drafts across all users
    • User wants the ability to write a pipeline draft
    • User has access to only those pipelines that are available in the namespace the user is in.
  • Plugin Output Schema
    • User using DB-Source wants to enter a connection-string and table name and have the table schema information populated automatically.
    • User using TeraData-Source wants to enter a connection-string and table name and have the table schema information populated automatically.
  • List Field values
    • User provides connection-string, user-name and password and expects the list of available tables to be returned in DB-Source.

Design

Option #1

Description

The hydrator app needs to be able to read from and write to a dataset to store and retrieve drafts and other information about business logic. We can implement a Hydrator CDAP Application with a service that exposes REST endpoints to serve the required hydrator functionalities. Enabling Hydrator in a namespace will deploy this Hydrator app and start the service. The Hydrator UI would ping this service and wait for it to be available before coming up. The back-end business logic actions that directly need to use the CDAP service endpoints can be made generic.

  • Pros

    • Everything (drafts, etc.) is stored in the same namespace, so it is cleaned up properly when the namespace is deleted.
  • Cons

    • Every namespace will have an extra app for supporting hydrator if hydrator is enabled. Running this service will run 2 containers per namespace. We can add an option to enable/disable hydrator for namespaces that do not use it. It might feel weird as a user app, since the user didn't write or create this app.

 


Option #2

Description

We will still use a Hydrator CDAP app, but we create an "Extensions" namespace and deploy the "hydrator" app only in that namespace. This app would serve the hydrator requests for all namespaces.

It will use a single dataset to store the drafts. Row keys can be namespaced, so that when a namespace is deleted, the rows belonging to that namespace can be deleted from the dataset.
  • Pros

    • Fewer resources used: only 2 containers in total rather than 2 containers per namespace, and only one dataset.
    • Only one app for using hydrator across namespaces rather than an app per namespace; less clutter.
    • New extensions could be added to the same namespace to support other use cases in the future.
  • Cons

    • Using a single dataset for storing all drafts across namespaces may be less secure.
    • User won't be able to create a new namespace called "Extensions", as it will be reserved.
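
The namespaced row keys described for Option #2 can be sketched as follows. This is only an illustration under stated assumptions: the class `NamespacedDraftTable`, the `:` separator, and the in-memory `TreeMap` standing in for the shared dataset are all hypothetical and not part of the design above.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory stand-in for the single shared drafts dataset. */
final class NamespacedDraftTable {
  // Assumed separator; it must not be allowed inside namespace ids.
  private static final char SEP = ':';
  private final TreeMap<String, String> rows = new TreeMap<>();

  private static String rowKey(String namespace, String draftName) {
    return namespace + SEP + draftName;
  }

  void put(String namespace, String draftName, String json) {
    rows.put(rowKey(namespace, draftName), json);
  }

  String get(String namespace, String draftName) {
    return rows.get(rowKey(namespace, draftName));
  }

  /** Deletes every row whose key starts with "<namespace>:" and returns how many were removed. */
  int deleteNamespace(String namespace) {
    String start = namespace + SEP;
    String end = namespace + (char) (SEP + 1); // first possible key after the prefix range
    Map<String, String> range = rows.subMap(start, true, end, false);
    int removed = range.size();
    range.clear(); // clearing the view removes the rows from the backing map
    return removed;
  }

  int size() {
    return rows.size();
  }
}
```

Because the keys are sorted, all rows of one namespace are contiguous, so deleting a namespace becomes a single range delete rather than a full scan.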

Open Questions

  • How to delete the drafts when the namespace is deleted?
  • When to stop this service?
  • Availability of the service?
  • Security
    • If we decide to add more capabilities to the hydrator back-end app (e.g., pipeline validation, deploying the app, etc.), then in a secure environment, can the hydrator-service discover the appropriate cdap.service and call the appropriate endpoints?

Option #3 (based on discussion with terence)

1) No new user level apps are deployed. Config store is used to store user drafts of hydrator apps.

2) The REST endpoint 'configure' can accept a partial config and return a config response with suggested values for fields in a plugin, along with any exceptions raised while configuring the plugin.

  • The user can choose a value from the suggestions for the field and call configure again.
  • The user can look at the exception, fix the issue in either the script or the configuration, and call configure again.
  • When all the required configs are provided and there aren't any exceptions, completionStatus would be set to true for the plugin.

    #3 

    Story 1 - Schema and field value suggestions : 

    Jira: CDAP-5149 (Cask Community Issue Tracker)

    Plugin annotation @Endpoint: 

    Plugins can have custom plugin-specific methods that are annotated with @Endpoint.
    The UI can learn about the available endpoints for a plugin from the plugin properties.
    The UI can call the app-fabric endpoint identifying {artifact, plugin}, with the method name in the path and the method parameter as the request body. The app-fabric endpoint then loads the corresponding plugin and calls the method identified by method-name, provided that method is annotated with @Endpoint.
    The response from this method call is sent as the response to the HTTP request.
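
The dispatch described above (find the method by name, call it only if it is annotated) could look roughly like the sketch below. It is a minimal illustration and not the app-fabric implementation: `EndpointDispatchSketch`, `FakeDbSource`, its table names, and the single-`String` request body are all assumptions made for the example.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

final class EndpointDispatchSketch {

  /** Local copy of the annotation so the sketch is self-contained. */
  @Retention(RetentionPolicy.RUNTIME)
  @Target(ElementType.METHOD)
  @interface Endpoint {
    String endpoint();
  }

  /** Hypothetical plugin; the returned table names are made up. */
  public static final class FakeDbSource {
    @Endpoint(endpoint = "listTables")
    public List<String> listTables(String connectionString) {
      return Arrays.asList("users", "orders");
    }

    // Not annotated, so the dispatcher must refuse to call it.
    public String secret(String ignored) {
      return "hidden";
    }
  }

  /** Invokes the single-argument method whose @Endpoint name matches endpointName. */
  static Object invoke(Object plugin, String endpointName, Object requestBody) {
    for (Method m : plugin.getClass().getMethods()) {
      Endpoint e = m.getAnnotation(Endpoint.class);
      if (e != null && e.endpoint().equals(endpointName) && m.getParameterCount() == 1) {
        try {
          return m.invoke(plugin, requestBody);
        } catch (ReflectiveOperationException ex) {
          throw new IllegalStateException("Endpoint call failed: " + endpointName, ex);
        }
      }
    }
    throw new IllegalArgumentException("No @Endpoint named " + endpointName);
  }
}
```

Matching on the annotation value rather than the Java method name keeps unannotated public methods unreachable from the REST layer.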

    REST API :

     

    | HTTP Request Type | Endpoint | Request Body | Response Status | Response Body |
    | --- | --- | --- | --- | --- |
    | POST | /namespaces/{namespace-id}/configurations/{config-id}/ | {"config": {...}} | 200 OK: config saved successfully; 409 CONFLICT: draft-name already exists; 500 Error: while saving the draft | |
    | PUT | /namespaces/{namespace-id}/configurations/{config-id}/ | {"config": {...}} | 200 OK: config updated successfully; 404 NOT Found: config doesn't exist, cannot be updated; 500 Error while updating the config | |
    | GET | /namespaces/{namespace-id}/configurations/{config-id}/ | | 200 return all the versions for the config identified by the config-id; 404 config not found; 500 error while getting config draft | [{"timestamp" : "...", "config": {"source" : {...}, "transforms" : [...], "sinks" : [...], "connections" : [...]}}, ...] |
    | GET | /namespaces/{namespace-id}/configurations/{config-id}/versions/{version-number} (-1 -> latest version) | | 200 return the version of the config identified by the config-id and version-number; 404 config with version not found; 500 error while getting config | {"timestamp" : "...", "config": {"source" : {...}, "transforms" : [...], "sinks" : [...], "connections" : [...]}} |

    Code Block
    POST : /namespaces/{namespace-id}/artifacts/{artifact-name}/versions/{artifact-version}/types/{plugin-type}/plugins/{plugin-name}/methods/{plugin-method}?scope={artifact-scope}

    Request-Body :  JSON -  fieldName to value mapping.

    Response :
    200, Successful Response JSON string
    404, Not Found, Plugin Specific Error Message (Example : DB, Table not found)
    500, Error, Plugin Specific Error Message (Example : JDBC Connection error)

    Description : In the request we refer to the plugin artifact and not the parent artifact. We could use any one of the available parent artifacts.

     

    Endpoint Annotation

     

    Code Block
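    import java.lang.annotation.Retention;
    import java.lang.annotation.RetentionPolicy;

    // Marks a plugin method as callable through the plugin-method REST endpoint.
    @Retention(RetentionPolicy.RUNTIME)
    public @interface Endpoint {

      /**
       * Returns the endpoint name.
       */
      String endpoint();
    }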

     


    Example Methods in Plugin DBSource:

     

    Code Block
    titleDBSource
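    // The annotation element is named endpoint(), so it has to be set by name.
    @Endpoint(endpoint = "listTables")
    List<String> listTables(ListTableRequest request);

    @Endpoint(endpoint = "getSchema")
    Map<String, String> getSchema(SchemaRequest request);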

     


    Story 2 - Drafts 

    Jira: CDAP-5154 (Cask Community Issue Tracker)

    Configurations HTTP Handler:

    A single HTTP handler that unifies the Console Settings handler and the Dashboards HTTP handler.

     

    | HTTP Request Type | Endpoint (table assumes we are using config-type -> drafts) | Request Body | Response Status | Response Body |
    | --- | --- | --- | --- | --- |
    | PUT | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/ | content stored as is | 200 OK: config object saved successfully; 409 CONFLICT: config with object-id already exists; 500 Error: while saving the draft | {"version" : "version-id"} |
    | POST | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions | content stored as is | 200 OK: config object updated successfully; 404 NOT Found: config object doesn't exist, cannot be updated; 500 Error while updating the config | {"version" : "version-id"} |
    | GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions | | 200 return all the versions for the config identified by the object-id; 404 config object not found; 500 error while getting config object | [{"created" : "...", "version" : ".."}, ...] |
    | GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id}/versions/{version-number} | | 200 returns the specific version of the object; 404 config object with version not found; 500 error while getting config object | contents returned as is |
    | GET | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id} (get latest version) | | 200 return the latest version for the config object; 404 config object with version not found; 500 error while getting the latest config object | content returned as is |
    | GET | /namespaces/{namespace-id}/configurations/{config-type}/objects | | 200 return the list of metadata about config objects; 500 error | [{"name" : "StreamToTPFS", "lastSaved": "..", ..}, ...] |
    | DELETE | /namespaces/{namespace-id}/configurations/{config-type}/objects/{object-id} | | 200 successfully deleted the specified object; 404 object does not exist; 500 error while deleting | |

    | HTTP Request Type | Endpoint | Request Body | Response Status | Response Body |
    | --- | --- | --- | --- | --- |
    | GET | /namespaces/{namespace-id}/configurations/ | | 200 return the list of names of all saved configs; 500 error | ["streamToTPFS", "DBToHBase", ...] |
    | DELETE | /namespaces/{namespace-id}/configurations/ | | 200 successfully deleted all configs; 500 error while deleting | |
    | DELETE | /namespaces/{namespace-id}/configurations/{config-id} | | 200 successfully deleted the specified config; 404 config does not exist; 500 error while deleting | |

    The ConsoleSettingsHttpHandler currently makes use of ConfigStore. However, it is not namespaced and has a few other issues; it can be fixed and improved to store configs.

    Along with pipeline drafts, ConsoleSettingsHttpHandler currently also stores the following information:

    Code Block
    titlePlugin Template Endpoints
    GET namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/
    // create a new plugin template
    POST namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/ -d '@plugin-template.json'
    // update existing plugin template
    PUT namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/ -d '@plugin-template.json'
    // delete the plugin template
    DELETE namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/

    Code Block
    titleDefaults
     // create/update defaults; this includes the user's plugin version preferences, etc.
     PUT : namespaces/{namespace-id}/defaults -d '@default.json'
     GET : namespaces/{namespace-id}/defaults

    "Drafts", "Plugin Templates", "Default versions" and "Dashboards" are types of configuration, specified as "config-type" in the REST call.

    The individual JSON config, or object, is identified by "object-id".

     

    JAVA API - Config Store:

     

    Code Block
    titleExisting configstore methods
    void create(String namespace, String type, Config config) throws ConfigExistsException;
    
    void createOrUpdate(String namespace, String type, Config config);
    
    void delete(String namespace, String type, String id) throws ConfigNotFoundException;
    
    List<Config> list(String namespace, String type);
    
    Config get(String namespace, String type, String id) throws ConfigNotFoundException; 
    
    void update(String namespace, String type, Config config) throws ConfigNotFoundException;
    Code Block
    titleConfigstore new methods
    // get a particular version of an entry.
    Config get(String namespace, String type, String id, int version) throws ConfigNotFoundException;
    // get all the versions of an entry.
    List<Config> getAllVersions(String namespace, String type, String id) throws ConfigNotFoundException;
    // delete all entries of specified type.
    void delete(String namespace, String type);
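
To make the versioning semantics concrete, here is a minimal in-memory sketch of a versioned store. It is not the ConfigStore implementation: the class name, the use of plain strings for config content, 1-based ever-increasing version numbers, and eviction of the oldest version are assumptions. The cap of 20 versions and the "-1 means latest" convention are taken from the requirements and REST table above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class VersionedConfigStoreSketch {
  static final int MAX_VERSIONS = 20; // "up to 20 versions" from the Drafts requirements
  static final int LATEST = -1;       // "-1 -> latest version" from the REST table

  // key (namespace/type/id) -> retained versions, oldest first
  private final Map<String, List<String>> store = new HashMap<>();
  // key -> version number of the oldest retained entry
  private final Map<String, Integer> firstVersion = new HashMap<>();

  private static String key(String namespace, String type, String id) {
    return namespace + "/" + type + "/" + id;
  }

  /** Appends a new version and returns its version number (assumed 1-based, ever increasing). */
  int save(String namespace, String type, String id, String content) {
    String k = key(namespace, type, id);
    List<String> versions = store.computeIfAbsent(k, x -> new ArrayList<>());
    versions.add(content);
    int first = firstVersion.getOrDefault(k, 1);
    if (versions.size() > MAX_VERSIONS) {
      versions.remove(0); // evict the oldest version
      first++;
    }
    firstVersion.put(k, first);
    return first + versions.size() - 1;
  }

  /** Returns one version; LATEST (-1) selects the most recent one. */
  String get(String namespace, String type, String id, int version) {
    String k = key(namespace, type, id);
    List<String> versions = store.get(k);
    if (versions == null) {
      throw new IllegalArgumentException("config not found: " + k);
    }
    int index = (version == LATEST) ? versions.size() - 1 : version - firstVersion.get(k);
    if (index < 0 || index >= versions.size()) {
      throw new IllegalArgumentException("version not found: " + version);
    }
    return versions.get(index);
  }

  /** Returns a copy of all retained versions, oldest first. */
  List<String> getAllVersions(String namespace, String type, String id) {
    List<String> versions = store.get(key(namespace, type, id));
    if (versions == null) {
      throw new IllegalArgumentException("config not found");
    }
    return new ArrayList<>(versions);
  }
}
```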

     

    Questions :

    1) ConfigStore stores the configs in "config.store.table". Currently the table properties don't have versioning, and drafts would need versioning. Would this also need a CDAP upgrade to update the properties of the existing dataset?

    2) Rename ConsoleSettingsHttpHandler to ConfigurationsHttpHandler?

    REST API for configure suggestions - AppFabric :

    Request-Method : POST
    Request-Endpoint : /namespaces/{namespace-id}/apps/{app-id}/configure
    Request-Body : config-JSON 
    Code Block
    titlerequest.json
    {
        "artifact": {
            "name": "cdap-etl-batch",
            "scope": "SYSTEM",
            "version": "3.4.0-SNAPSHOT"
        },
        "name": "pipeline",
        "config": {
            "source": {
                "name": "Stream",
                "plugin": {
                    "name": "StreamSource",
                    "artifact": {
                        "name": "core-plugins",
                        "version": "1.3.0-SNAPSHOT",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "format": "syslog",
                        "name": "test",
                        "duration": "1d"
                    }
                }
            },
            "sinks": [{..}],
            "transform": [{..}, {...}]
        }
    }
     
    Response-Body : Config JSON
    Code Block
    titleresponse.json
    {
        "artifact": {
            "name": "cdap-etl-batch",
            "scope": "SYSTEM",
            "version": "3.4.0-SNAPSHOT"
        },
        "name": "pipeline",
        "config": {
            "source": {
                "name": "Stream",
                "plugin": {
                    "name": "StreamSource",
                    "artifact": {
                        "name": "core-plugins",
                        "version": "1.3.0-SNAPSHOT",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "format": "syslog",
                        "name": "test",
                        "duration": "1d",
                        "suggestions": [{
                            "schema": [
                                {
                                    "ts": "long",
                                    "headers": "Map<String, String>",
                                    "program": "string",
                                    "message": "string",
                                    "pid": "string"
                                }
                            ]
                        }],
                        "isComplete": "false"
                    }
                }
            },
            "sinks": [{..}],
            "transform": [{..}, {...}]
        }
    }

     

    Plugin API Change
    Code Block
    titlePipelineConfigurable
    @Beta
    public interface PipelineConfigurable {
      // change in return-type.
      ConfigResponse configurePipeline(PipelineConfigurer pipelineConfigurer) throws IllegalArgumentException; 
    }
    Code Block
    titleConfigResponse
    public class ConfigResponse extends Config {
     // list of suggestions for fields. 
     List<Suggestion> suggestions;
     // if there were any exception while executing configure 
     @Nullable
     String exception;
     // is the stage configuration complete ? 
     @DefaultValue("false")
     boolean isComplete;
    }
    Code Block
    titleSuggestion
    public class Suggestion {
      String fieldName;
      // list of possible values for the fieldName
      List<String> fieldValues;
    }
    Code Block
    titleApplicationContext
    @Beta
    public interface ApplicationContext<T extends Config> {
      // existing
      T getConfig();
      // application will set a config response
      void setResponseConfig(T response);
      // get the response config
      T getResponseConfig();
    }
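
The interplay between suggestions, exceptions, and the completion flag can be sketched as below. This is a hedged illustration, not the proposed plugin API: the `configure` helper, the `lookup` function that supplies candidate values, and the choice to emit suggestions only for missing required fields are assumptions introduced for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class ConfigureSketch {

  static final class Suggestion {
    final String fieldName;
    final List<String> fieldValues;

    Suggestion(String fieldName, List<String> fieldValues) {
      this.fieldName = fieldName;
      this.fieldValues = fieldValues;
    }
  }

  static final class ConfigResponse {
    final List<Suggestion> suggestions = new ArrayList<>();
    String exception;   // null when configure raised no exception
    boolean isComplete; // defaults to false
  }

  /**
   * Builds a response for a partial config. The lookup function is a hypothetical
   * hook that returns candidate values for a field (for example, table names).
   */
  static ConfigResponse configure(List<String> requiredFields,
                                  Map<String, String> properties,
                                  Function<String, List<String>> lookup) {
    ConfigResponse response = new ConfigResponse();
    try {
      for (String field : requiredFields) {
        if (!properties.containsKey(field)) {
          response.suggestions.add(new Suggestion(field, lookup.apply(field)));
        }
      }
    } catch (RuntimeException e) {
      response.exception = e.getMessage();
    }
    // Complete only when nothing is missing and nothing failed.
    response.isComplete = response.exception == null && response.suggestions.isEmpty();
    return response;
  }
}
```

A caller would pick a value from `suggestions`, add it to the properties, and call `configure` again until `isComplete` is true.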

    Questions 

    1) Although it makes sense for the config response to live in ApplicationContext alongside the input config, this would allow one CDAP program to set a config and other programs to read it. Would that be an issue?

    Schema Propagation and Validation through backend - DryRuns:

     

    Currently, when a pipeline is published, configurePipeline is called on its plugins, pipeline and plugin validations are performed, and the application is deployed.

    1. The goal of the dry-run endpoint is to validate a pipeline and then validate its plugins by calling the configure methods of the plugins in the pipeline, without creating datasets, generating programs, or performing the other steps that are usually done during deploy.

    2. Using dry-run, we would be able to catch issues in the pipeline earlier and fix them before deploying.

    Dry-run can also be used by the UI for schema propagation, with some requirements on the UI:

    •      If the plugin has a field "schema", the UI can mutate the output schema.

    •      If the plugin doesn't have the field "schema", the UI cannot change the output schema and has to rely on the result of the dry run for the output schema of that stage, which is set during plugin configuration.

    We need to follow the above conditions for correctness: if the UI mutates the schema when there isn't a field "schema", the backend would use a different schema as the input schema for the next stage, and the UI changes wouldn't be reflected.
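
The two schema-propagation conditions above reduce to a small decision rule, sketched here. The helper name and the representation of schemas as plain strings are assumptions for illustration only.

```java
import java.util.Set;

final class SchemaPropagationSketch {

  /**
   * Picks the output schema for a stage: the UI-edited schema is honored only
   * when the plugin exposes a "schema" field; otherwise the dry-run result wins.
   */
  static String effectiveOutputSchema(Set<String> pluginFields, String uiSchema, String dryRunSchema) {
    boolean uiMayEdit = pluginFields.contains("schema");
    return (uiMayEdit && uiSchema != null) ? uiSchema : dryRunSchema;
  }
}
```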

     


     

     

    Code Block
    POST : namespace/{namespace-id}/dry-run 
    
    Request-Body : JSON Config.
    
    Response-Body: 
    JSON Config with additional fields in the plugin for output schema, 
    exceptions in configuring pipeline stage, etc.  
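
A dry run in the sense described above can be pictured as calling each stage's configure step and collecting failures instead of deploying. The sketch below is an assumption-laden illustration: `Stage`, its `Runnable` configure hook, and the error map are hypothetical and do not reflect the actual endpoint's response format.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class DryRunSketch {

  /** Hypothetical stage: a name plus a configure step that may throw. */
  static final class Stage {
    final String name;
    final Runnable configure;

    Stage(String name, Runnable configure) {
      this.name = name;
      this.configure = configure;
    }
  }

  /** Runs configure on every stage and returns stage name -> error message for failures. */
  static Map<String, String> dryRun(List<Stage> stages) {
    Map<String, String> errors = new LinkedHashMap<>();
    for (Stage stage : stages) {
      try {
        stage.configure.run(); // no datasets or programs are created here
      } catch (RuntimeException e) {
        errors.put(stage.name, String.valueOf(e.getMessage()));
      }
    }
    return errors;
  }
}
```

Continuing past a failing stage lets a single dry run report every misconfigured stage at once.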

     


    User Stories (3.5.0)

    1. For the hydrator use case, the backend app should be able to support hydrator related functionalities listed below:
    2. query for plugins available for a certain artifact and list them in the UI
    3. obtaining output schema of plugins provided the input configuration information
    4. deploying pipeline and start/stop the pipeline
    5. query the status of a pipeline run and current status of execution if there are multiple stages.
    6. get the next schedule of run, ability to query metrics and logs for the pipeline runs.
    7. creating and saving pipeline drafts
    8. get the input/output streams/datasets of the pipeline run and list them in UI. 
    9. explore the data of streams/datasets used in the pipeline if they are explorable. 
    10. Add new metadata about a pipeline and retrieve metadata by pipeline run,etc.
    11. delete hydrator pipeline
    12. the backend app's functionalities should be limited to hydrator and it shouldn't be like a proxy for CDAP.  

    Having these abilities will remove the logic in CDAP-UI for making the appropriate CDAP REST calls. This encapsulation will simplify the UI's interaction with the back-end and also help in debugging potential issues faster. In the future, we could have more apps similar to the hydrator app, so our back-end app should define and implement generic cases that can be used across these apps, and it should also allow extensibility to support adding new features.

    Generic Endpoints

    ...