...

  • Drafts
    • User wants to add a new draft, or save the pipeline they are working on as a draft
    • User can update an existing draft of a pipeline as a new version; previous versions of the pipeline are saved (up to 20 versions)
    • User can go back to the previous version of a draft, or to any other saved version
    • User wants to retrieve the latest version of the draft for a pipeline
    • User wants to view all available pipeline drafts across all users
    • User wants the ability to write a pipeline draft
    • User has access only to those pipelines that are available in the namespace the user is in.
  • Plugin Output Schema
    • User using DB-Source wants to enter a connection string and table name and have the table schema information populated automatically.
    • User using TeraData-Source wants to enter a connection string and table name and have the table schema information populated automatically.
  • List Field values
    • User provides a connection string, user name and password and expects the list of available tables to be returned in DB-Source.

 Proposed REST APIs


 

Each endpoint below lists: HTTP Request Type, Endpoint, Request Body, Response Status, Response Body.

POST /extensions/hydrator/drafts/{draft-name}

Request Body:

Code Block
languagejs
{
  "config": {...},
  "message": "..."
}

Response Status:

200 OK: draft created and saved successfully

409 CONFLICT: draft-name already exists

500 Error: error while creating the draft
PUT /extensions/hydrator/drafts/{draft-name}

Request Body:

Code Block
languagejs
{
  "config": {...},
  "message": "..."
}

Response Status:

200 OK: draft updated successfully

404 NOT FOUND: draft doesn't exist yet, so it cannot be updated

400 BAD REQUEST: only 20 versions can be stored; delete an old version before storing a new one

500 Error: error while updating the draft

GET /extensions/hydrator/drafts/{draft-name}/versions/

Response Status:

200: returns all the versions of the draft identified by draft-name

404: draft not found

500: error while getting the draft

Response Body:

Code Block
languagejs
[
  {
    "message": "...",
    "config": {
      "source": {
        ....
      },
      "transforms": [...],
      "sinks": [...],
      "connections": [..]
    }
  },
  ...
]
GET /extensions/hydrator/drafts/{draft-name}/versions/{version-number}

A version-number of -1 returns the latest version.

Response Status:

200: returns the version of the draft identified by draft-name and version-number

404: draft not found

500: error while getting the draft

Response Body:

Code Block
languagejs
{
  "message": "...",
  "config": {
    "source": {
      ....
    },
    "transforms": [...],
    "sinks": [...],
    "connections": [..]
  }
}
GET /extensions/hydrator/drafts/

Response Status:

200: returns the list of all saved drafts

500: error

Response Body:

[
  "streamToTPFS",
  "DBToHBase",
  ...
]
DELETE /extensions/hydrator/drafts/

Response Status:

200: successfully deleted all drafts

500: error while deleting

DELETE /extensions/hydrator/drafts/{draft-name}

Response Status:

200: successfully deleted the specified draft

404: draft does not exist

500: error while deleting

DELETE /extensions/hydrator/drafts/{draft-name}/versions/{version-number}

Response Status:

200: successfully deleted the specified version of the draft

404: draft with that version does not exist

500: error while deleting

POST /extensions/hydrator/plugins/{plugin-name}/schema

Request Body:

Code Block
languagejs
{
  "artifact": {
    "name": "...",
    "version": "...",
    "scope": "..."
  },
  "jdbcConnectionString": "...",
  "jdbcPluginName": "...",
  "tableName": "..."
}

Response Status:

200: the output schema is determined from the plugin and its properties, and returned

404: unrecognized plugin-name

500: error

Response Body:

Code Block
languagejs
{
  "field1": "Integer",
  "field2": "String",
  ...
  "fieldN": "Double"
}
POST /extensions/hydrator/plugins/{plugin-name}/list

QueryParam: target

Example: target=table

Request Body (example):

Code Block
languagejs
{
  "artifact": {
    "name": "...",
    "version": "...",
    "scope": "..."
  },
  "connectionString": "...",
  "username": "...",
  "password": "..."
}

For the specified plugin, and based on the provided connection information, get the list of available values for the target field and return it.

Response Status:

200: list of available values for the target field. Example: the list of tables in a database.

500: error while retrieving the list

Response Body:

Code Block
languagejs
[
  "tableA",
  "tableB",
  ...
  "tableN"
]
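
The status codes proposed for the draft endpoints can be summarised with a small in-memory model. This is only a sketch under stated assumptions, not the hydrator implementation: the class name `DraftStoreSketch`, the zero-based version numbering and the use of plain `int` status codes are all hypothetical, while the limit of 20 versions and the "-1 means latest" rule come from the text above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory model of the draft endpoints; not CDAP code. */
final class DraftStoreSketch {
  static final int MAX_VERSIONS = 20; // limit stated in the user stories
  static final int LATEST = -1;       // "-1 -> latest version"

  // draft-name -> config JSON strings, oldest version first
  private final Map<String, List<String>> drafts = new HashMap<>();

  /** POST: 200 on success, 409 if the draft-name already exists. */
  int create(String name, String configJson) {
    if (drafts.containsKey(name)) {
      return 409;
    }
    List<String> versions = new ArrayList<>();
    versions.add(configJson);
    drafts.put(name, versions);
    return 200;
  }

  /** PUT: 404 if the draft is missing, 400 if the version limit is reached, else 200. */
  int update(String name, String configJson) {
    List<String> versions = drafts.get(name);
    if (versions == null) {
      return 404;
    }
    if (versions.size() >= MAX_VERSIONS) {
      return 400;
    }
    versions.add(configJson);
    return 200;
  }

  /** GET one version; returns null in the cases where the endpoint would answer 404. */
  String get(String name, int version) {
    List<String> versions = drafts.get(name);
    if (versions == null) {
      return null;
    }
    // Assumption: versions are numbered from 0; LATEST resolves to the last one.
    int index = (version == LATEST) ? versions.size() - 1 : version;
    if (index < 0 || index >= versions.size()) {
      return null;
    }
    return versions.get(index);
  }
}
```

A caller would map the returned codes directly onto the HTTP response, and a `null` from `get` onto 404.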

Design

Option #1

Description

The hydrator app needs to be able to read from and write to a dataset in order to store and retrieve drafts and other information about business logic. We can implement a Hydrator CDAP Application with a service that exposes REST endpoints to serve the required hydrator functionality. Enabling Hydrator in a namespace will deploy this Hydrator app and start the service. The Hydrator UI would ping this service and wait for it to be available before coming up. The back-end business logic actions that need to call the CDAP service endpoints directly can be made generic.

  • Pros

    • Everything (drafts, etc.) is stored in the same namespace, so it is cleaned up properly when the namespace is deleted.
  • Cons

    • Every namespace in which hydrator is enabled will have an extra app for supporting it. Running this service will run 2 containers per namespace. We can add an option to enable/disable hydrator for namespaces that are not using it. It might also feel odd to see it listed as a user app, as the user didn't write or create it.

 

Option #2

Description

We will still use a Hydrator CDAP app, but we create an "Extensions" namespace and deploy the "hydrator" app only in that namespace. This app would serve the hydrator requests for all namespaces.

It will use a single dataset to store the drafts. Row keys can be namespaced, so that when a namespace is deleted, the rows belonging to it are deleted from the dataset.
  • Pros

    • Fewer resources are used: only 2 containers in total rather than 2 containers per namespace, and only one dataset.
    • Only one app serves hydrator across namespaces, rather than one app per namespace, so there is less clutter.
    • New extensions could be added to the same namespace to support other use cases in the future.
  • Cons

    • Is using a single dataset to store all drafts across namespaces less secure?
    • Users won't be able to create a new namespace called "Extensions", as it will be reserved.
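
The namespaced row keys described for Option #2 could look roughly like the sketch below. It models the single dataset as a sorted in-memory map, which is an assumption made purely for illustration; the class name, the ':' separator and the method names are hypothetical and are not part of CDAP. Deleting a namespace then becomes a range delete over the key prefix.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

/** Hypothetical model of one shared draft table keyed by namespace-prefixed rows. */
final class NamespacedDraftTable {
  // Assumption: namespace ids never contain this separator.
  private static final char SEP = ':';

  // Sorted by row key, as a stand-in for an ordered table.
  private final TreeMap<String, String> rows = new TreeMap<>();

  static String rowKey(String namespace, String draftName) {
    return namespace + SEP + draftName;
  }

  void put(String namespace, String draftName, String configJson) {
    rows.put(rowKey(namespace, draftName), configJson);
  }

  /** View of all rows whose key starts with "<namespace>:". */
  private SortedMap<String, String> rowsOf(String namespace) {
    String prefix = namespace + SEP;
    return rows.subMap(prefix, prefix + Character.MAX_VALUE);
  }

  /** Draft names stored for one namespace, in key order. */
  List<String> list(String namespace) {
    int prefixLength = namespace.length() + 1;
    List<String> names = new ArrayList<>();
    for (String key : rowsOf(namespace).keySet()) {
      names.add(key.substring(prefixLength));
    }
    return names;
  }

  /** Removes every draft of the namespace and returns how many rows were deleted. */
  int deleteNamespace(String namespace) {
    SortedMap<String, String> view = rowsOf(namespace);
    int deleted = view.size();
    view.clear(); // clearing the view removes the rows from the backing map
    return deleted;
  }
}
```

Because the separator is part of the prefix, drafts in a namespace such as "ns2" are not touched when "ns" is deleted.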

Open Questions

  • How do we delete the drafts when the namespace is deleted?
  • When should this service be stopped?
  • Availability of the service?
  • Security
    • If we decide to add more capabilities to the hydrator back-end app (e.g. make the pipeline validation/deploy app calls, etc.), then in a secure environment, can the hydrator-service discover the appropriate cdap.service and call the appropriate endpoints?

Option #3 (based on discussion with terence)

 No new user-level apps are deployed. The Preference store is used to store user drafts of hydrator apps.

'configurePipeline' can be changed to return partial results: it can return a pluginSpecification with possible values for the information missing from the plugin config. The pluginSpecification will be serialized into the applicationSpecification and returned to the user.

Example:

  • Hydrator makes a call to the Preference store to save a namespaced draft. To delete drafts, the delete endpoint of the Preference store is called for those drafts. If the user deletes the namespace manually from CDAP-CLI, the Preference store drops everything in that namespace, including the drafts.

  • The plugin configure stage will accept an incomplete config and create a PluginSpecification with possible values for the incomplete config.

  • Example: the user is using a DBSource plugin and provides connectionString, userName and password. The UI hits the /validate endpoint with the config and DBSource's configurePlugin is called. It inspects the config, notices that the required field 'tableName' is missing, connects to the database, gets the list of table names, writes this list into the PluginSpecification and returns failure.

  • The user notices the failure, reads the specification to get the list of tables, selects the table of interest and makes the same call again. DBSource's configurePlugin notices that the schema and the 'import' field are missing. It then populates the schema information in the spec and returns failure.

  • The user fills in the 'import' and 'count' queries, changes the schema appropriately and makes the same call. All the necessary fields are now present and valid, so the DBSource plugin returns success for this stage and the user proceeds to the next stage.
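
The DBSource walk-through above can be sketched as a function that inspects the properties supplied so far and reports the first missing field together with candidate values. This is a simplified illustration under stated assumptions, not the proposed plugin API: `ConfigureFlowSketch`, `Database` and `Result` are hypothetical names, the database is replaced by an interface so the example stays self-contained, and the sketch reports one missing field per call.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the incremental configure flow; none of these types are CDAP APIs. */
final class ConfigureFlowSketch {

  /** Stand-in for the database that a real plugin would reach over JDBC. */
  interface Database {
    List<String> listTables();
    String schemaOf(String table);
  }

  /** Outcome of one configure call. */
  static final class Result {
    final boolean complete;
    final String missingField;      // null when complete
    final List<String> suggestions; // candidate values for missingField, possibly empty

    Result(boolean complete, String missingField, List<String> suggestions) {
      this.complete = complete;
      this.missingField = missingField;
      this.suggestions = suggestions;
    }
  }

  /** Reports the first missing field, with suggestions where the database can supply them. */
  static Result configure(Map<String, String> props, Database db) {
    if (!props.containsKey("tableName")) {
      // First call: no table chosen yet, so offer the list of tables.
      return new Result(false, "tableName", db.listTables());
    }
    if (!props.containsKey("schema")) {
      // Second call: table known, so offer the schema read from the database.
      String schema = db.schemaOf(props.get("tableName"));
      return new Result(false, "schema", Collections.singletonList(schema));
    }
    for (String query : Arrays.asList("import", "count")) {
      if (!props.containsKey(query)) {
        // Queries are written by the user; there is nothing to suggest.
        return new Result(false, query, Collections.<String>emptyList());
      }
    }
    return new Result(true, null, Collections.<String>emptyList());
  }
}
```

The UI would repeat the call, filling in one field each time, until `complete` is true.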

    Option #3 (based on discussion with terence)

     No new user-level apps are deployed. The Config store is used to store user drafts of hydrator apps.

    'configurePipeline' can be changed to return config suggestions; a new config containing the suggestions is sent back to the user as the response.


    REST API:

     

     

    Each endpoint below lists: HTTP Request Type, Endpoint, Request Body, Response Status, Response Body.

    POST /namespaces/{namespace-id}/draftsconfigurations/{draftconfig-id}/

    Request Body:

    {
      "config": {...}
    }

    Response Status:

    200 OK: draft created and config saved successfully

    409 CONFLICT: draft-name already exists

    500 Error: error while saving the draft

     

    PUT /namespaces/{namespace-id}/draftsconfigurations/{draftconfig-id}/

    Request Body:

    {
      "config": {...}
    }

    Response Status:

    200 OK: draft config updated successfully

    404 NOT FOUND: draft config doesn't exist yet, so it cannot be updated

    500 Error: error while updating the draftconfig

     

    GET /namespaces/{namespace-id}/draftsconfigurations/{draftconfig-id}/

    Response Status:

    200: returns all the versions of the draft config identified by the draftconfig-name

    404: draft config not found

    500: error while getting the config draft

    Response Body:

    [
      {
        "timestamp": "...",
        "config": {
          "source": {
            ....
          },
          "transforms": [...],
          "sinks": [...],
          "connections": [..]
        }
      },
      ...
    ]

    GET /namespaces/{namespace-id}/draftsconfigurations/{draftconfig-id}/versions/{version-number}

    A version-number of -1 returns the latest version.

    Response Status:

    200: returns the version of the draft config identified by the draftconfig-id and version-number

    404: draft config with that version not found

    500: error while getting the draftconfig

    Response Body:

    {
      "timestamp": "...",
      "config": {
        "source": {
          ....
        },
        "transforms": [...],
        "sinks": [...],
        "connections": [..]
      }
    }

    GET /namespaces/{namespace-id}/draftsconfigurations/

    Response Status:

    200: returns the list of names of all saved draftsconfigs

    500: error

    Response Body:

    [
      "streamToTPFS",
      "DBToHBase",
      ...
    ]

    DELETE /namespaces/{namespace-id}/draftsconfigurations/

    Response Status:

    200: successfully deleted all draftsconfigs

    500: error while deleting

    DELETE /namespaces/{namespace-id}/draftsconfigurations/{draftconfig-id}

    Response Status:

    200: successfully deleted the specified draftconfig

    404: draft config does not exist

    500: error while deleting

     

     

    The ConsoleSettingsHttpHandler currently makes use of ConfigStore (ConsoleSettingsHttpHandler -> ConfigStore). However, it is not namespaced and has a few other issues; it can be fixed and improved to store configs.

    Along with pipeline drafts ConsoleSettingsHttpHandler also stores the following information currently:

    1) pre-configured plugin templates. 

    Code Block
    titlePlugin Template Endpoints
    GET namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/
    // create a new plugin template
    POST namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/ -d '@plugin-template.json'
    // update an existing plugin template
    PUT namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/ -d '@plugin-template.json'
    // delete the plugin template
    DELETE namespaces/{namespace-id}/plugin-templates/{plugin-template-id}/

     

    2) default versions of plugins (user preferences)

    Endpoints:

    Code Block
    titleDefaults
    // create/update defaults; this includes the user's plugin version preferences, etc.
    PUT : namespaces/{namespace-id}/defaults -d '@default.json'
    // get the defaults configured by the user
    GET : namespaces/{namespace-id}/defaults

     

    ConfigStore Existing methods  :

    Code Block
    void create(String namespace, String type, Config config) throws ConfigExistsException;
    
    void createOrUpdate(String namespace, String type, Config config);
    
    void delete(String namespace, String type, String id) throws ConfigNotFoundException;
    
    List<Config> list(String namespace, String type);
    
    Config get(String namespace, String type, String id) throws ConfigNotFoundException; 
    
    void update(String namespace, String type, Config config) throws ConfigNotFoundException;

    ConfigStore new methods:

    Code Block
    Config get(String namespace, String type, String id, int version) throws ConfigNotFoundException; // get one version of a draft
    List<Config> getAllVersions(String namespace, String type, String id) throws ConfigNotFoundException; // get all the versions of the draft
    void delete(String namespace, String type); // type -> drafts: delete all drafts in the namespace

     

    Existing Config class: 

     

    Code Block
    public final class Config {
     // draft-id
     private final String id;  
     // config -> json-config and other properties, example:timestamp -> currentTime.
     private final Map<String, String> properties; 
    
    
    }
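
As a rough illustration of how a draft could be packed into the id-plus-properties shape shown above, the sketch below builds such an object from a draft id, the pipeline JSON and a timestamp. It uses a simplified local stand-in for the `Config` class rather than the real one, and the property keys "config" and "timestamp" are assumptions based on the comment in the class listing.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper; the nested Config is a local stand-in, not the CDAP class. */
final class DraftConfigSketch {

  /** Simplified stand-in with the same two fields as the Config class shown above. */
  static final class Config {
    private final String id;
    private final Map<String, String> properties;

    Config(String id, Map<String, String> properties) {
      this.id = id;
      // Defensive copy so later changes to the caller's map do not leak in.
      this.properties = Collections.unmodifiableMap(new HashMap<>(properties));
    }

    String getId() {
      return id;
    }

    Map<String, String> getProperties() {
      return properties;
    }
  }

  /** Packs one draft version into a Config: the JSON under "config", the time under "timestamp". */
  static Config toDraft(String draftId, String configJson, long timestampMillis) {
    Map<String, String> props = new HashMap<>();
    props.put("config", configJson);
    props.put("timestamp", Long.toString(timestampMillis));
    return new Config(draftId, props);
  }
}
```

For example, `DraftConfigSketch.toDraft("streamToTPFS", json, System.currentTimeMillis())` would produce the object to hand to a store method such as `createOrUpdate`.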


    Questions :

    1) ConfigStore stores the configs in "config.store.table". Currently the table properties don't include versioning, but drafts would need versioning. Would this affect the "preferences" stored by PreferenceStore? Would this also need a CDAP upgrade step to update the properties of the existing dataset?

    2) Rename ConsoleSettingsHttpHandler to ConfigurationsHttpHandler?


    REST API for configure suggestions - AppFabric :

    Request-Method : POST
    Request-Endpoint : /namespaces/{namespace-id}/apps/{app-id}/configure
    Request-Body : config-JSON 
    Code Block
    titlerequest.json
    {
        "artifact": {
            "name": "cdap-etl-batch",
            "scope": "SYSTEM",
            "version": "3.4.0-SNAPSHOT"
        },
        "name": "pipeline",
        "config": {
            "source": {
                "name": "Stream",
                "plugin": {
                    "name": "StreamSource",
                    "artifact": {
                        "name": "core-plugins",
                        "version": "1.3.0-SNAPSHOT",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "format": "syslog",
                        "name": "test",
                        "duration": "1d"
                    }
                }
            },
            "sinks": [{..}],
            "transform": [{..}, {...}]
        }
    }
     
    Response-Body : Config JSON
    Code Block
    titleresponse.json
    {
        "artifact": {
            "name": "cdap-etl-batch",
            "scope": "SYSTEM",
            "version": "3.4.0-SNAPSHOT"
        },
        "name": "pipeline",
        "config": {
            "source": {
                "name": "Stream",
                "plugin": {
                    "name": "StreamSource",
                    "artifact": {
                        "name": "core-plugins",
                        "version": "1.3.0-SNAPSHOT",
                        "scope": "SYSTEM"
                    },
                    "properties": {
                        "format": "syslog",
                        "name": "test",
                        "duration": "1d",
                        "suggestions": [{
                            "schema": [
                                {
                                    "ts": "long",
                                    "headers": "Map<String, String>",
                                    "program": "string",
                                    "message": "string",
                                    "pid": "string"
                                }
                            ]
                        }],
                        "isComplete": "false"
                    }
                }
            },
            "sinks": [{..}],
            "transform": [{..}, {...}]
        }
    }

     

    Plugin API Change
    Code Block
    titlePipelineConfigurable
    @Beta
    public interface PipelineConfigurable {
      // change in return-type.
      ConfigResponse configurePipeline(PipelineConfigurer pipelineConfigurer) throws IllegalArgumentException; 
    }
    Code Block
    titleConfigResponse
    public class ConfigResponse extends Config {
     // list of suggestions for fields. 
     List<Suggestion> suggestions;
     // set if there was an exception while executing configure
     @Nullable
     String exception;
     // is the stage configuration complete ? 
     @DefaultValue("false")
     boolean isComplete;
    }
    Code Block
    titleSuggestion
    public class Suggestion {
     String fieldName;
     // list of possible values for the fieldName
     List<String> fieldValues;
    }
    Code Block
    titleApplicationContext
    @Beta
    public interface ApplicationContext<T extends Config> {
      // existing
      T getConfig();
      // application will set a config response
      void setResponseConfig(T response);
      // get the response config
      T getResponseConfig();
    }

    Questions 

    1) Although it makes sense for the config response to live in ApplicationContext alongside the input config, this would allow one CDAP program to set a config that other programs can read. The implications of that need to be considered.

    User Stories (3.5.0)

    For the hydrator use case, the backend app should be able to support the hydrator-related functionality listed below:

    1. query for the plugins available for certain artifacts and list them in the UI
    2. obtain the output schema of plugins, given the input configuration information
    3. deploy a pipeline and start/stop the pipeline
    4. query the status of a pipeline run, and the current status of execution if there are multiple stages
    5. get the next scheduled run, and query metrics and logs for the pipeline runs
    6. create and save pipeline drafts
    7. get the input/output streams/datasets of the pipeline run and list them in the UI
    8. explore the data of streams/datasets used in the pipeline if they are explorable
    9. add new metadata about a pipeline and retrieve metadata by pipeline run, etc.
    10. delete a hydrator pipeline
    11. the backend app's functionality should be limited to hydrator; it shouldn't act as a proxy for CDAP

    Having these abilities will remove the logic in CDAP-UI that makes the appropriate CDAP REST calls. This encapsulation will simplify the UI's interaction with the back-end and also help in debugging potential issues faster. In the future, we could have more apps similar to the hydrator app, so our back-end app should define and implement generic cases that can be used across these apps, and it should also be extensible enough to support adding new features.

    Generic Endpoints

    ...