This document explains the separation of responsibilities between the CDAP platform and applications.

CDAP Platform

  1. Java API: The CDAP platform provides two sets of Java APIs. The external API is used by CDAP applications to interact with the preview system, and the internal API is used by the preview REST handler.
    1. API to be used by applications:

      1. Get the instance of DebugLogger from the program context. For example, MapReduceContext will be updated to add a new method as follows -

        Code Block
        languagejava
        /**
         * MapReduce job execution context.
         */
        public interface MapReduceContext ... {
           /**
            * Return the DebugLogger
            * @param loggerName the name of the logger using which the debug information will be logged
            */
           DebugLogger getLogger(String loggerName);
        }
      2. Use the DebugLogger to log useful information.

        Code Block
        languagejava
         
        /**
         * Interface used by the CDAP applications to log the debug data.
         */
        public interface DebugLogger {
          /**
           * Logs the data at INFO level. Multiple values can be logged against the same property.
           * @param propertyName the name of the property
           * @param propertyValue the value associated with the property
           */
          void info(String propertyName, Object propertyValue);
        
          /**
           * Return the name of the logger instance.
           */ 
          String getName();
        
          /**
           * Returns {@code true} if the application is running in debug mode, {@code false} otherwise.
           */
          boolean isEnabled();
        }
         
        /**
         * DebugLoggerFactory will be injected in the Program context classes. This may not be directly used by Applications.
         */
        public interface DebugLoggerFactory {
          /**
           * Get the {@link DebugLogger} used to log the debug data.
           * @param loggerName the name of the logger with which the log data is to be associated
           * @return the instance of the DebugLogger
           */
          DebugLogger getLogger(String loggerName);
        }
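
To make the application-side API concrete, here is a minimal sketch of how a pipeline stage might use a DebugLogger. The interface is copied from the proposal above; `InMemoryDebugLogger`, `parseCsv`, and the property names `input.records` and `output.records` are hypothetical names chosen for illustration, not part of CDAP.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DebugLoggerExample {

  /** Mirrors the DebugLogger interface proposed above. */
  interface DebugLogger {
    void info(String propertyName, Object propertyValue);
    String getName();
    boolean isEnabled();
  }

  /** Hypothetical in-memory implementation, used here only for illustration. */
  static final class InMemoryDebugLogger implements DebugLogger {
    private final String name;
    private final boolean enabled;
    // Multiple values can be logged against the same property, so keep a list per property.
    private final Map<String, List<Object>> data = new LinkedHashMap<>();

    InMemoryDebugLogger(String name, boolean enabled) {
      this.name = name;
      this.enabled = enabled;
    }

    @Override
    public void info(String propertyName, Object propertyValue) {
      if (!enabled) {
        return; // outside preview (debug) mode nothing is recorded
      }
      data.computeIfAbsent(propertyName, key -> new ArrayList<>()).add(propertyValue);
    }

    @Override
    public String getName() {
      return name;
    }

    @Override
    public boolean isEnabled() {
      return enabled;
    }

    Map<String, List<Object>> getData() {
      return data;
    }
  }

  /** Hypothetical transform: splits a CSV line and logs its input and output. */
  static String[] parseCsv(String body, DebugLogger logger) {
    logger.info("input.records", body);
    String[] fields = body.split(",");
    logger.info("output.records", String.join("|", fields));
    return fields;
  }

  public static void main(String[] args) {
    InMemoryDebugLogger logger = new InMemoryDebugLogger("MyCSVParser", true);
    parseCsv("100,bob", logger);
    parseCsv("200,rob", logger);
    System.out.println(logger.getName() + " -> " + logger.getData());
  }
}
```

In a real program the logger would come from the program context (for example `MapReduceContext.getLogger("MyCSVParser")`) rather than being constructed directly.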
    2. API to be used by the REST handler: PreviewHttpHandler is responsible for handling the REST calls (details below). It interacts with the preview system through the API exposed by PreviewManager. Note that this is an internal API.

      Code Block
      languagejava
      /**
       * Interface used to start preview and also retrieve the information associated with a preview.
       */
      public interface PreviewManager {
        /**
         * Start the program in preview mode.
         * @param namespaceId the id of the namespace
         * @param request the request for the preview. This includes details about the artifact, application configs, and preview configurations used by CDAP (details below)
         * @return the unique {@link ApplicationId} generated for the preview run
         * @throws Exception if there was any error during starting
         */
        ApplicationId start(NamespaceId namespaceId, AppRequest<?> request) throws Exception;

        /**
         * Get the status for the specified {@link ApplicationId}.
         * @param preview the id of the preview for which status is to be returned
         * @return the status associated with the preview
         * @throws NotFoundException if the preview is not found
         */
        PreviewStatus getStatus(ApplicationId preview) throws NotFoundException;

        /**
         * Stop the preview identified by preview.
         * @param preview id of the preview
         * @throws Exception if the preview is not found or if there was any error during stop
         */
        void stop(ApplicationId preview) throws Exception;

        /**
         * Get the list of loggers in this preview.
         * @param preview id of the preview
         * @return the {@link List} of loggers for a given preview
         * @throws NotFoundException if the preview is not found
         */
        List<String> getLoggers(ApplicationId preview) throws NotFoundException;

        /**
         * Get the data associated with the specified logger name of the preview.
         * @param preview id of the preview
         * @param loggerName the name of the logger for which data is to be returned
         * @return the {@link Map} of property name to property value associated with the given logger for a given preview
         * @throws NotFoundException if the preview is not found
         */
        Map<String, List<String>> getData(ApplicationId preview, String loggerName) throws NotFoundException;

        /**
         * Get the metrics associated with the preview.
         * @param preview the id of the preview
         * @return the {@link Collection} of metrics emitted during the preview run
         * @throws NotFoundException if the preview is not found
         */
        Collection<MetricTimeSeries> getMetrics(ApplicationId preview) throws NotFoundException;

        /**
         * Get the logs for the preview.
         * @param preview the id of the preview for which logs are to be fetched
         * @return the logs
         * @throws NotFoundException if the preview is not found
         */
        List<LogEntry> getLogs(ApplicationId preview) throws NotFoundException;
      }

      // An instance of PreviewStatus is returned by the getStatus call above. The details are as follows.
      /**
       * Represents the state of the preview.
       */
      public class PreviewStatus {
        public enum Status {
          RUNNING,
          COMPLETED,
          DEPLOY_FAILED,
          RUNTIME_FAILED
        }

        Status previewStatus;

        @Nullable
        String failureMessage;

        // Represents the request with which the preview was started.
        AppRequest request;
      }
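
The start and status parts of the PreviewManager contract can be sketched with a small in-memory stand-in. This is an illustration only and not the CDAP implementation: it uses plain `String` ids in place of `NamespaceId`, `ApplicationId`, and `AppRequest`, throws `NoSuchElementException` where the proposal uses `NotFoundException`, and the `finish` method is a hypothetical hook for the component that actually runs the program.

```java
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class InMemoryPreviewManager {

  /** Same states as the PreviewStatus.Status enum in the proposal. */
  enum Status { RUNNING, COMPLETED, DEPLOY_FAILED, RUNTIME_FAILED }

  static final class PreviewStatus {
    final Status status;
    final String failureMessage; // null unless the preview failed

    PreviewStatus(Status status, String failureMessage) {
      this.status = status;
      this.failureMessage = failureMessage;
    }
  }

  private final Map<String, PreviewStatus> previews = new ConcurrentHashMap<>();

  /** Registers a new preview run and returns its generated id. The arguments are ignored in this sketch. */
  String start(String namespaceId, String appRequestJson) {
    String previewId = UUID.randomUUID().toString();
    previews.put(previewId, new PreviewStatus(Status.RUNNING, null));
    return previewId;
  }

  /** Returns the status of a preview, or fails if the id is unknown. */
  PreviewStatus getStatus(String previewId) {
    PreviewStatus status = previews.get(previewId);
    if (status == null) {
      throw new NoSuchElementException("Preview not found: " + previewId);
    }
    return status;
  }

  /** Hypothetical hook: records the final state of a preview run. */
  void finish(String previewId, Status finalStatus, String failureMessage) {
    getStatus(previewId); // rejects unknown ids
    previews.put(previewId, new PreviewStatus(finalStatus, failureMessage));
  }

  public static void main(String[] args) {
    InMemoryPreviewManager manager = new InMemoryPreviewManager();
    String previewId = manager.start("default", "{}");
    System.out.println(manager.getStatus(previewId).status);
    manager.finish(previewId, Status.COMPLETED, null);
    System.out.println(manager.getStatus(previewId).status);
  }
}
```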
  2. REST API exposed by platform:
    1. Start a preview

      Code Block
      languagejava
      POST /v3/namespaces/{namespace-id}/previews
      where namespace-id is the name of the namespace
      Response will contain the CDAP generated unique preview-id which can be used further to get the preview data.
    2. Get the status of the preview

      Code Block
      languagejava
      GET /v3/namespaces/{namespace-id}/previews/{preview-id}/status
      where namespace-id is the name of the namespace
            preview-id is the id of the preview for which status is to be requested
    3. Stop the preview

      Code Block
      languagejava
      POST /v3/namespaces/{namespace-id}/previews/{preview-id}/stop
      where namespace-id is the name of the namespace
            preview-id is the id of the preview which is to be stopped
    4. Get the list of loggers in the preview

      Code Block
      languagejava
      GET /v3/namespaces/{namespace-id}/previews/{preview-id}/loggers
      where namespace-id is the name of the namespace
            preview-id is the id of the preview
    5. Get the data associated with the preview

      Code Block
      languagejava
      GET /v3/namespaces/{namespace-id}/previews/{preview-id}/loggers/{logger-id}
      where namespace-id is the name of the namespace
            preview-id is the id of the preview
            logger-id is the id of the logger for which data is to be fetched
    6. Get the logs generated for the preview

      Code Block
      languagejava
      GET /v3/namespaces/{namespace-id}/previews/{preview-id}/logs
      where namespace-id is the name of the namespace
            preview-id is the id of the preview
    7. Get the metrics associated with the preview

      Code Block
      languagejava
      GET /v3/namespaces/{namespace-id}/previews/{preview-id}/metrics
      where namespace-id is the name of the namespace
            preview-id is the id of the preview
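
As a rough illustration of how a client could address the endpoints listed above, the following sketch builds the corresponding requests with the JDK's `java.net.http.HttpRequest` (Java 11+). The class name `PreviewRequests`, the base URL `http://localhost:11015`, and the namespace `default` are placeholders chosen for the example. The requests are only constructed, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.net.http.HttpRequest.BodyPublishers;
import java.util.List;

public class PreviewRequests {
  private final String previewsUrl;

  PreviewRequests(String baseUrl, String namespaceId) {
    this.previewsUrl = baseUrl + "/v3/namespaces/" + namespaceId + "/previews";
  }

  /** POST .../previews with the preview request JSON as the body. */
  HttpRequest start(String appRequestJson) {
    return HttpRequest.newBuilder(URI.create(previewsUrl))
        .header("Content-Type", "application/json")
        .POST(BodyPublishers.ofString(appRequestJson))
        .build();
  }

  /** POST .../previews/{preview-id}/stop with an empty body. */
  HttpRequest stop(String previewId) {
    return HttpRequest.newBuilder(URI.create(previewsUrl + "/" + previewId + "/stop"))
        .POST(BodyPublishers.noBody())
        .build();
  }

  HttpRequest status(String previewId) {
    return get(previewId, "status");
  }

  HttpRequest loggers(String previewId) {
    return get(previewId, "loggers");
  }

  HttpRequest data(String previewId, String loggerId) {
    return get(previewId, "loggers/" + loggerId);
  }

  HttpRequest logs(String previewId) {
    return get(previewId, "logs");
  }

  HttpRequest metrics(String previewId) {
    return get(previewId, "metrics");
  }

  /** All read endpoints share the GET .../previews/{preview-id}/{suffix} shape. */
  private HttpRequest get(String previewId, String suffix) {
    return HttpRequest.newBuilder(URI.create(previewsUrl + "/" + previewId + "/" + suffix))
        .GET()
        .build();
  }

  public static void main(String[] args) {
    PreviewRequests requests = new PreviewRequests("http://localhost:11015", "default");
    String previewId = "my-preview-id"; // in practice, taken from the response of the start call
    List<HttpRequest> all = List.of(
        requests.start("{}"), requests.status(previewId), requests.stop(previewId),
        requests.loggers(previewId), requests.data(previewId, "MyCSVParser"),
        requests.logs(previewId), requests.metrics(previewId));
    for (HttpRequest request : all) {
      System.out.println(request.method() + " " + request.uri().getPath());
    }
  }
}
```

Sending any of these would be a matter of passing the request to `HttpClient.send` with a body handler; how the preview id is encoded in the start response is not specified here.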
  3. Preview specific configurations understood by CDAP: When a preview is started, CDAP needs to know which program is to be executed. The following is a sample request JSON -

    Code Block
    languagejava
    {
        "artifact":{
          "name":"cdap-data-pipeline",
          "version":"3.5.0-SNAPSHOT",
          "scope":"SYSTEM"
        },
        "name":"MyPipeline",  
        "config":{
        ..... application specific configurations
        },
        "preview": {
          "programId": "MyProgram",
          "programType": "workflow"
        }
    }

    In the above config JSON, CDAP will look for the "preview" key to figure out which program is to be executed by the preview.

Application level capabilities:

  1. Config changes which will be understood by the application. For Hydrator, the following is an example of the application level preview configurations -

    Code Block
    languagejava
    Consider a simple pipeline: FTP -> CSVParser -> Table 
    {
        "artifact":{
          "name":"cdap-data-pipeline",
          "version":"3.5.0-SNAPSHOT",
          "scope":"SYSTEM"
        },
        "name":"MyPipeline",  
        "config":{
            "connections":[
             {
                "from":"FTP",
                "to":"CSVParser"
             },
             {
                "from":"CSVParser",
                "to":"Table"
             }
            ],
            "stages":[
             {
                "name":"FTP",
                "plugin":{
                   "name":"FTP",
                   "type":"batchsource",
                   "label":"FTP",
                   "artifact":{
                      "name":"core-plugins",
                      "version":"1.4.0-SNAPSHOT",
                      "scope":"SYSTEM"
                   },
                   "properties":{
                      "referenceName":"myfile",
                      "path":"/tmp/myfile"
                   }
                },
                "outputSchema":"{\"fields\":[{\"name\":\"offset\",\"type\":\"long\"},{\"name\":\"body\",\"type\":\"string\"}]}"
             },
             {
                "name":"MyCSVParser",
                "plugin":{
                   "name":"CSVParser",
                   "type":"transform",
                   "label":"CSVParser",
                   "artifact":{
                      "name":"transform-plugins",
                      "version":"1.4.0-SNAPSHOT",
                      "scope":"SYSTEM"
                   },
                   "properties":{
                      "format":"DEFAULT",
                      "schema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
                      "field":"body"
                   }
                },
                "outputSchema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"
             },
             {
                "name":"MyTable",
                "plugin":{
                   "name":"Table",
                   "type":"batchsink",
                   "label":"Table",
                   "artifact":{
                      "name":"core-plugins",
                      "version":"1.4.0-SNAPSHOT",
                      "scope":"SYSTEM"
                   },
                   "properties":{
                      "schema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
                      "name":"mytable",
                      "schema.row.field":"id"
                   }
                },
                "outputSchema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
                "inputSchema":[
                   {
                      "name":"id",
                      "type":"int",
                      "nullable":false
                   },
                   {
                      "name":"name",
                      "type":"string",
                      "nullable":false
                   }
                ]
             }
          ],
           "appPreviewConfig": {
                   "startStages": ["MyCSVParser"],
                   "endStages": ["MyTable"],
                   "useSinks": ["MyTable"],
                   "outputs": {
                                    "FTP": {
                                            "data":   [
                                                {"offset": 1, "body": "100,bob"},
                                                {"offset": 2, "body": "200,rob"},
                                                {"offset": 3, "body": "300,tom"}
                                            ],
                                            "schema": {
                                                        "type" : "record",
                                                        "fields": [
                                                                    {"name":"offset","type":"long"},
                                                                    {"name":"body","type":"string"}
                                                                 ]
                                                      }
                                    }
                            }  
                }
    
        }
    }
     
    Note that the "appPreviewConfig" section above holds the application specific configurations, which will be handled by the application.
  2. Handling application level preview configurations: The preview configurations in the "appPreviewConfig" section above are application level and must be handled by the application.

End to End flow:

  1. A user sends a request to the preview endpoint with the appropriate configurations. The configurations include both the configs understood by CDAP and the configs understood by the app.

  2. CDAP generates a unique preview id for this request and returns it to the user. The user can then use this preview id to query the data generated during the preview run.
  3. The Hydrator app is configured based on the application configurations. For example, for a single-stage preview configuration, we can add a Worker in the app which will run the transform.
  4. The CDAP platform determines which program in the application needs to be executed, based on the preview configurations provided for CDAP.
  5. Based on the log level specified in the configurations, CDAP writes the preview data to the dataset.

Preview in Distributed Mode:

  1. The preview service will run in a separate container. The container will be started when the master is started and will keep running.
  2. Data generated by the preview system will be stored locally in the container. We can use the LevelDB database, similar to standalone mode.
  3. Instances of the container can be increased for scalability. However, since the preview data is local to the container, requests for preview data will then need to be routed to the container which handled the preview request. One approach to achieve this is to store the mappings in HBase.
  4. PreviewHttpHandler will be exposed through the preview container.
  5. Logging for preview - The preview container will use the local log appender, similar to the SDK. Do we need the ability to change the log levels for preview? For example, should we allow running the application in preview mode at trace log level while running it in normal mode at info level?
  6. MetricsContext for preview - Querying metrics at the namespace level may not yield the entire namespace level data if multiple instances of the preview service are running.
  7. Authorization: We store user privileges in Sentry. A user is allowed to execute a program if they have EXECUTE permission on it. This is currently managed by AuthorizationEnforcer. We can inject the same instance in the preview container so that reading from and writing to user datasets is controlled by the privileges in Sentry.
  8. Impersonation: We store impersonation configurations in the namespace meta store. NamespaceQueryAdmin is responsible for reading those configs. The preview container will need access to an instance of the query admin which queries the actual HBase table.
  9. Deletion of the preview data: We will need a service which cleans up the preview data periodically.

Open Questions:

Application responsibilities:

  1. Applications can use the API exposed by CDAP to get the logger and log the data.

  2. Application specific configurations can be specified in the config section of the JSON. For example, the following are the preview related configurations for the Hydrator app -

    Code Block
    languagejava
    {
        "artifact":{
          "name":"cdap-data-pipeline",
          "version":"3.5.0-SNAPSHOT",
          "scope":"SYSTEM"
        },
        "name":"MyPipeline",  
        "config":{
           "connections": {
              ...
           },
           "stages": {
              ...
           },
           "appPreviewConfig": {
              "startStages": ["MyCSVParser"], // stages from which pipeline execution is to be started
              "endStages": ["MyTable"], // stages till which pipeline need to be executed
              "useRealDatasets": ["FTP"], // list of datasets to be used from the real user space for READ only purpose
              "outputs": {
                 "FTP": {
                    "data": [
                        {"offset": 1, "body": "100,bob"},
                        {"offset": 2, "body": "200,rob"},
                        {"offset": 3, "body": "300,tom"}
                    ]
                }
              }  
           }
        },
        "preview": {
        ..... CDAP specific preview configurations    
        }
    }
  3. Handling application level preview configurations: The preview configurations in the "appPreviewConfig" section above are application level and must be handled by the application. More details TBD.
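
Since the handling of "appPreviewConfig" is still TBD, the following is only a sketch of one way an application might interpret "startStages", "endStages", and "outputs". It assumes a strictly linear pipeline whose stages are already in execution order (a real pipeline is a DAG and would need a graph traversal), and every class and method name in it is hypothetical.

```java
import java.util.List;
import java.util.Map;

public class AppPreviewConfigSketch {

  /** Hypothetical in-memory form of the "appPreviewConfig" section. */
  static final class AppPreviewConfig {
    final List<String> startStages;
    final List<String> endStages;
    final Map<String, List<Map<String, Object>>> outputs; // stage name -> mocked output records

    AppPreviewConfig(List<String> startStages, List<String> endStages,
                     Map<String, List<Map<String, Object>>> outputs) {
      this.startStages = startStages;
      this.endStages = endStages;
      this.outputs = outputs;
    }
  }

  /** Returns the stages from the first start stage through the last end stage. */
  static List<String> stagesToRun(List<String> orderedStages, AppPreviewConfig config) {
    int from = -1;
    int to = -1;
    for (int i = 0; i < orderedStages.size(); i++) {
      String stage = orderedStages.get(i);
      if (from < 0 && config.startStages.contains(stage)) {
        from = i;
      }
      if (config.endStages.contains(stage)) {
        to = i;
      }
    }
    if (from < 0) {
      from = 0; // no start stage given: run from the beginning
    }
    if (to < 0) {
      to = orderedStages.size() - 1; // no end stage given: run to the end
    }
    if (from > to) {
      throw new IllegalArgumentException("Start stage comes after end stage");
    }
    return orderedStages.subList(from, to + 1);
  }

  /** Records that stand in for the output of a stage which is not executed. */
  static List<Map<String, Object>> mockedOutput(String stage, AppPreviewConfig config) {
    return config.outputs.getOrDefault(stage, List.of());
  }

  static Map<String, Object> record(long offset, String body) {
    return Map.of("offset", offset, "body", body);
  }

  public static void main(String[] args) {
    AppPreviewConfig config = new AppPreviewConfig(
        List.of("MyCSVParser"),
        List.of("MyTable"),
        Map.of("FTP", List.of(record(1, "100,bob"), record(2, "200,rob"), record(3, "300,tom"))));
    List<String> pipeline = List.of("FTP", "MyCSVParser", "MyTable");
    System.out.println("Stages to run: " + stagesToRun(pipeline, config));
    System.out.println("Mocked FTP records: " + mockedOutput("FTP", config).size());
  }
}
```

With the sample configuration above, the FTP source is skipped and its mocked records would be fed to MyCSVParser as input.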