What CDAP platform provides:
API used by apps for logging the debug data
/**
 * Interface used by CDAP applications to log the data useful for debugging during runtime.
 */
public interface DebugLoggerFactory {

  /**
   * Returns {@code true} if the application is running in debug mode, otherwise {@code false}.
   */
  boolean isEnabled();

  /**
   * Get the {@link DebugLogger} used to log the debug data.
   * @param loggerName the name of the logger with which the log data is to be associated
   * @return the instance of the DebugLogger
   */
  DebugLogger getLogger(String loggerName);
}

/**
 * Interface used by the CDAP applications to log the debug data.
 */
public interface DebugLogger {

  /**
   * Logs the data at INFO level. Multiple values can be logged against the same property.
   * @param propertyName the name of the property
   * @param propertyValue the value associated with the property
   */
  void info(String propertyName, Object propertyValue);

  /**
   * Logs the data at DEBUG level. Multiple values can be logged against the same property.
   * @param propertyName the name of the property
   * @param propertyValue the value associated with the property
   */
  void debug(String propertyName, Object propertyValue);

  /**
   * Logs the data at ERROR level. Multiple values can be logged against the same property.
   * @param propertyName the name of the property
   * @param propertyValue the value associated with the property
   */
  void error(String propertyName, Object propertyValue);

  /**
   * Return the name of the logger instance.
   */
  String getName();

  /**
   * Return the log level associated with the logger.
   */
  DebugLogLevel getDebugLogLevel();
}

public enum DebugLogLevel {
  INFO,
  ERROR,
  WARN
}
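To illustrate how an application might use this API, here is a minimal in-memory sketch. The classes InMemoryDebugLogger and InMemoryDebugLoggerFactory, the getData() accessor, and the property name "input.records" are hypothetical and exist only for illustration; the interface is trimmed to a few methods, and nothing here is the actual CDAP implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical in-memory sketch of the logging API above; not the CDAP implementation. */
class DebugLoggerSketch {

  /** Trimmed copy of the DebugLogger interface from the design (subset of methods). */
  interface DebugLogger {
    void info(String propertyName, Object propertyValue);
    void error(String propertyName, Object propertyValue);
    String getName();
  }

  /** Assumed implementation: keeps every value logged against a property, in order. */
  static final class InMemoryDebugLogger implements DebugLogger {
    private final String name;
    private final Map<String, List<String>> data = new LinkedHashMap<>();

    InMemoryDebugLogger(String name) {
      this.name = name;
    }

    @Override
    public void info(String propertyName, Object propertyValue) {
      record(propertyName, propertyValue);
    }

    @Override
    public void error(String propertyName, Object propertyValue) {
      record(propertyName, propertyValue);
    }

    @Override
    public String getName() {
      return name;
    }

    /** Appends the value, so multiple values accumulate under one property name. */
    private void record(String propertyName, Object propertyValue) {
      data.computeIfAbsent(propertyName, k -> new ArrayList<>()).add(String.valueOf(propertyValue));
    }

    /** Hypothetical accessor used here to inspect what was logged. */
    Map<String, List<String>> getData() {
      return data;
    }
  }

  /** Assumed factory: hands out one logger per logger name. */
  static final class InMemoryDebugLoggerFactory {
    private final boolean enabled;
    private final Map<String, InMemoryDebugLogger> loggers = new LinkedHashMap<>();

    InMemoryDebugLoggerFactory(boolean enabled) {
      this.enabled = enabled;
    }

    boolean isEnabled() {
      return enabled;
    }

    InMemoryDebugLogger getLogger(String loggerName) {
      return loggers.computeIfAbsent(loggerName, InMemoryDebugLogger::new);
    }
  }

  public static void main(String[] args) {
    InMemoryDebugLoggerFactory factory = new InMemoryDebugLoggerFactory(true);
    // Guard the logging calls so that nothing is collected outside preview/debug mode.
    if (factory.isEnabled()) {
      InMemoryDebugLogger logger = factory.getLogger("MyCSVParser");
      logger.info("input.records", "100,bob");
      logger.info("input.records", "200,rob");
      System.out.println(logger.getName() + " -> " + logger.getData());
    }
  }
}
```

Checking isEnabled() before logging keeps the cost of building debug values out of normal runs, which is presumably why the factory exposes it.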
- REST endpoints
To start a preview
POST /v3/namespaces/{namespace-id}/preview
where namespace-id is the name of the namespace.
The response will contain the unique preview-id generated by CDAP, which can then be used to retrieve the preview data.
To get the status of the preview
GET /v3/namespaces/{namespace-id}/previews/{preview-id}/status
where
  namespace-id is the name of the namespace
  preview-id is the id of the preview whose status is requested
To get the data associated with the preview
GET /v3/namespaces/{namespace-id}/previews/{preview-id}/loggers/{logger-id}
where
  namespace-id is the name of the namespace
  preview-id is the id of the preview whose data is requested
  logger-id is the unique name used to identify the logger
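As a rough illustration of how a client could assemble these three endpoints, the following sketch only builds the URIs and sends no requests. The class PreviewEndpoints, its method names, and the base address used in main are assumptions made for this example; the ids are inserted without URL encoding.

```java
import java.net.URI;

/** Hypothetical helper that only assembles the preview endpoint URIs listed above. */
class PreviewEndpoints {
  private final URI base;          // address of the CDAP endpoint; assumed, not part of the design
  private final String namespace;  // the {namespace-id} path parameter

  PreviewEndpoints(String baseUrl, String namespace) {
    this.base = URI.create(baseUrl);
    this.namespace = namespace;
  }

  /** POST target used to start a preview. */
  URI start() {
    return base.resolve("/v3/namespaces/" + namespace + "/preview");
  }

  /** GET target for the status of a preview. */
  URI status(String previewId) {
    return base.resolve("/v3/namespaces/" + namespace + "/previews/" + previewId + "/status");
  }

  /** GET target for the data logged by one logger during a preview. */
  URI loggerData(String previewId, String loggerId) {
    return base.resolve("/v3/namespaces/" + namespace + "/previews/" + previewId + "/loggers/" + loggerId);
  }

  public static void main(String[] args) {
    // The host, port, and ids below are placeholders for illustration only.
    PreviewEndpoints endpoints = new PreviewEndpoints("http://localhost:11015", "default");
    System.out.println(endpoints.start());
    System.out.println(endpoints.status("my-preview-id"));
    System.out.println(endpoints.loggerData("my-preview-id", "MyCSVParser"));
  }
}
```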
Platform specific CDAP configurations:
The application configuration will contain preview-related configurations that are used by CDAP. Currently there are programId and programType configurations, which identify the program to be executed as part of the preview.
{
  "preview": {
    "programId": "MyProgram",
    "programType": "workflow",
    "logLevel": "info"
  }
}
API used by CDAP platform to interact with the preview system:
/**
 * Interface used to start preview and also retrieve the information associated with a preview.
 */
public interface PreviewManager {

  /**
   * Start the preview of an application config provided as an input.
   * @param namespaceId the id of the namespace
   * @param config the config for the preview
   * @return the unique {@link PreviewId} generated for the preview run
   * @throws Exception if there was any error during starting
   */
  PreviewId start(NamespaceId namespaceId, String config) throws Exception;

  /**
   * Get the status for the specified {@link PreviewId}.
   * @param previewId the id of the preview for which status is to be returned
   * @return the status associated with the preview
   * @throws NotFoundException if the previewId is not found
   */
  PreviewStatus getStatus(PreviewId previewId) throws NotFoundException;

  /**
   * Stop the preview identified by previewId.
   * @param previewId id of the preview
   * @throws Exception if the previewId is not found or if there was any error during stop
   */
  void stop(PreviewId previewId) throws Exception;

  /**
   * Get the data associated with the preview.
   * @param previewId the id associated with the preview
   * @return the {@link Map} of logger name to properties associated with the logger for a given preview
   * @throws NotFoundException if the previewId is not found
   */
  Map<String, Map<String, List<String>>> getData(PreviewId previewId) throws NotFoundException;

  /**
   * Get the data associated with the specified logger name of the preview.
   * @param previewId id of the preview
   * @param loggerName the name of the logger for which data is to be returned
   * @return the {@link Map} of property name to property value associated with the given logger for a given preview
   * @throws NotFoundException if the previewId is not found
   */
  Map<String, List<String>> getData(PreviewId previewId, String loggerName) throws NotFoundException;

  /**
   * Get the data associated with the specified logger name of the preview at the specified log level.
   * @param previewId id of the preview
   * @param loggerName the name of the logger for which data is to be returned
   * @param logLevel the log level for which data is to be retrieved
   * @return the {@link Map} of property name to property value associated with the given logger for a given preview
   * @throws NotFoundException if the previewId is not found
   */
  Map<String, List<String>> getData(PreviewId previewId, String loggerName, DebugLogLevel logLevel) throws NotFoundException;

  /**
   * Get metrics associated with the preview.
   * @param previewId the id of the preview
   * @return the {@link Collection} of metrics emitted during the preview run
   * @throws NotFoundException if the previewId is not found
   */
  Collection<MetricTimeSeries> getMetrics(PreviewId previewId) throws NotFoundException;

  /**
   * Get the logs for the preview.
   * @param previewId the id of the preview for which logs are to be fetched
   * @return the logs
   * @throws NotFoundException if the previewId is not found
   */
  List<LogEntry> getLogs(PreviewId previewId) throws NotFoundException;
}
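The nested return types of getData can be easier to follow with a small in-memory stand-in. The sketch below is only an assumption about how such a manager could behave: it uses plain String ids instead of PreviewId and NamespaceId, generates the preview id with a UUID, defines its own NotFoundException, and adds a hypothetical record method to put data into the store.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.UUID;

/** Hypothetical in-memory stand-in for part of PreviewManager; ids are plain Strings here. */
class PreviewManagerSketch {

  /** Local stand-in for the NotFoundException used by the interface above. */
  static class NotFoundException extends Exception {
    NotFoundException(String message) {
      super(message);
    }
  }

  /** previewId -> loggerName -> propertyName -> logged values. */
  private final Map<String, Map<String, Map<String, List<String>>>> store = new HashMap<>();

  /** Registers a new preview and returns its generated id (a UUID is an assumption). */
  String start(String namespaceId, String config) {
    String previewId = UUID.randomUUID().toString();
    store.put(previewId, new LinkedHashMap<>());
    return previewId;
  }

  /** Hypothetical write path: what a DebugLogger would call while the preview runs. */
  void record(String previewId, String loggerName, String propertyName, Object value)
      throws NotFoundException {
    lookup(previewId)
        .computeIfAbsent(loggerName, k -> new LinkedHashMap<>())
        .computeIfAbsent(propertyName, k -> new ArrayList<>())
        .add(String.valueOf(value));
  }

  /** All data of a preview: logger name -> property name -> values. */
  Map<String, Map<String, List<String>>> getData(String previewId) throws NotFoundException {
    return lookup(previewId);
  }

  /** Data of one logger: property name -> values (empty if the logger wrote nothing). */
  Map<String, List<String>> getData(String previewId, String loggerName) throws NotFoundException {
    return lookup(previewId).getOrDefault(loggerName, Collections.emptyMap());
  }

  private Map<String, Map<String, List<String>>> lookup(String previewId) throws NotFoundException {
    Map<String, Map<String, List<String>>> data = store.get(previewId);
    if (data == null) {
      throw new NotFoundException("Preview not found: " + previewId);
    }
    return data;
  }

  public static void main(String[] args) throws Exception {
    PreviewManagerSketch manager = new PreviewManagerSketch();
    String previewId = manager.start("default", "{}");
    manager.record(previewId, "MyCSVParser", "input.records", "100,bob");
    System.out.println(manager.getData(previewId));
    System.out.println(manager.getData(previewId, "MyCSVParser"));
  }
}
```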
Application level capabilities:
Config changes that will be understood by the application. For Hydrator, the following is an example of the application-level preview configuration:
Consider a simple pipeline: FTP -> CSVParser -> Table
{
  "artifact": {
    "name": "cdap-data-pipeline",
    "version": "3.5.0-SNAPSHOT",
    "scope": "SYSTEM"
  },
  "name": "MyPipeline",
  "config": {
    "connections": [
      { "from": "FTP", "to": "MyCSVParser" },
      { "from": "MyCSVParser", "to": "MyTable" }
    ],
    "stages": [
      {
        "name": "FTP",
        "plugin": {
          "name": "FTP",
          "type": "batchsource",
          "label": "FTP",
          "artifact": {
            "name": "core-plugins",
            "version": "1.4.0-SNAPSHOT",
            "scope": "SYSTEM"
          },
          "properties": {
            "referenceName": "myfile",
            "path": "/tmp/myfile"
          }
        },
        "outputSchema": "{\"fields\":[{\"name\":\"offset\",\"type\":\"long\"},{\"name\":\"body\",\"type\":\"string\"}]}"
      },
      {
        "name": "MyCSVParser",
        "plugin": {
          "name": "CSVParser",
          "type": "transform",
          "label": "CSVParser",
          "artifact": {
            "name": "transform-plugins",
            "version": "1.4.0-SNAPSHOT",
            "scope": "SYSTEM"
          },
          "properties": {
            "format": "DEFAULT",
            "schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
            "field": "body"
          }
        },
        "outputSchema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}"
      },
      {
        "name": "MyTable",
        "plugin": {
          "name": "Table",
          "type": "batchsink",
          "label": "Table",
          "artifact": {
            "name": "core-plugins",
            "version": "1.4.0-SNAPSHOT",
            "scope": "SYSTEM"
          },
          "properties": {
            "schema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
            "name": "mytable",
            "schema.row.field": "id"
          }
        },
        "outputSchema": "{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}",
        "inputSchema": [
          { "name": "id", "type": "int", "nullable": false },
          { "name": "name", "type": "string", "nullable": false }
        ]
      }
    ],
    "appPreviewConfig": {
      "startStages": ["MyCSVParser"],
      "endStages": ["MyTable"],
      "useSinks": ["MyTable"],
      "outputs": {
        "FTP": {
          "data": [
            { "offset": 1, "body": "100,bob" },
            { "offset": 2, "body": "200,rob" },
            { "offset": 3, "body": "300,tom" }
          ],
          "schema": {
            "type": "record",
            "fields": [
              { "name": "offset", "type": "long" },
              { "name": "body", "type": "string" }
            ]
          }
        }
      }
    }
  }
}
Note that the "appPreviewConfig" section above contains the application-specific configurations, which will be handled by the application.
Handling application level preview configurations:
The preview configurations in the "appPreviewConfig" section above are application level and must be handled by the application.
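One possible way for an application to interpret startStages and endStages is sketched below. The semantics are an assumption, since the design does not define them: a stage runs in preview if it is reachable from some start stage and can itself reach some end stage. The class PreviewStageSelector and its methods are hypothetical; the stage names come from the example pipeline.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: pick the stages to run from startStages/endStages (assumed semantics). */
class PreviewStageSelector {

  /** Returns stages reachable from a start stage that can also reach an end stage. */
  static Set<String> select(Map<String, List<String>> connections,
                            Set<String> startStages, Set<String> endStages) {
    Set<String> downstreamOfStart = reach(connections, startStages);
    Set<String> upstreamOfEnd = reach(reverse(connections), endStages);
    downstreamOfStart.retainAll(upstreamOfEnd);
    return downstreamOfStart;
  }

  /** Breadth-first traversal that collects the roots and everything reachable from them. */
  private static Set<String> reach(Map<String, List<String>> edges, Set<String> roots) {
    Set<String> seen = new LinkedHashSet<>(roots);
    Deque<String> queue = new ArrayDeque<>(roots);
    while (!queue.isEmpty()) {
      for (String next : edges.getOrDefault(queue.poll(), Collections.emptyList())) {
        if (seen.add(next)) {
          queue.add(next);
        }
      }
    }
    return seen;
  }

  /** Flips every "from -> to" connection into "to -> from". */
  private static Map<String, List<String>> reverse(Map<String, List<String>> edges) {
    Map<String, List<String>> reversed = new HashMap<>();
    edges.forEach((from, targets) ->
        targets.forEach(to -> reversed.computeIfAbsent(to, k -> new ArrayList<>()).add(from)));
    return reversed;
  }

  public static void main(String[] args) {
    Map<String, List<String>> connections = new HashMap<>();
    connections.put("FTP", Arrays.asList("MyCSVParser"));
    connections.put("MyCSVParser", Arrays.asList("MyTable"));

    Set<String> toRun = select(connections,
        Collections.singleton("MyCSVParser"), Collections.singleton("MyTable"));
    // FTP is left out; under this assumption its records would come from "outputs".
    System.out.println(toRun);
  }
}
```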
End to End flow:
- The user sends a request to the preview endpoint with the appropriate configurations. Note that these include both the configs understood by CDAP and the configs understood by the app.
- CDAP will generate a unique preview id for this request and return it to the user. The user can then use this preview id to query the data generated during the preview run.
- The Hydrator app will be configured based on the application configurations. For example, for a single-stage preview configuration, we can add a Worker to the app that runs the transform.
- The CDAP platform will determine which program in the application needs to be executed, based on the preview configurations provided for CDAP.
- Based on the log level specified in the configurations, CDAP will write the preview data to the dataset.
Preview in Distributed Mode:
- Preview service will run in a separate container. The container will be started when the master is started and will keep running.
- Data generated by the preview system will be stored locally in the container. We can use the LevelDB database, similar to the standalone mode.
- Instances of the container can be increased for scalability. However, since the preview data is local to a container, a request for preview data will then need to be routed to the container that handled the corresponding preview request. One approach to achieve this is to store the mappings in HBase.
- PreviewHttpHandler will be exposed through the preview container.
- Logging for preview - The preview container will use the local log appender, similar to the SDK. Do we need the ability to change the log levels for preview? For example, should we allow running the application in preview mode at trace log level while running it in normal mode at info level?
- MetricsContext for preview - Querying metrics at the namespace level may not yield the entire namespace-level data if multiple instances of the preview container are running.
- Authorization: We store user privileges in Sentry. A user is allowed to execute a program if they have EXECUTE permission on it. This is currently managed by AuthorizationEnforcer. We can inject the same instance into the preview container so that reading from and writing to user datasets is controlled by the privileges in Sentry.
- Impersonation: We store impersonation configurations in the Namespace meta store. NamespaceQueryAdmin is responsible for reading those configs. Preview container will need access to the instance of the query admin which will query the actual HBase table.
- Deletion of the preview data: We will need a service that cleans up the preview data periodically.
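The routing and cleanup points above can be sketched together. In the following hypothetical example, an in-memory map stands in for the HBase table that would hold the preview-id to container-instance mapping, and removeOlderThan shows what a periodic cleanup service might call. The class name, method names, and the use of creation timestamps for expiry are all assumptions.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch of preview-id routing with expiry; a map stands in for HBase. */
class PreviewRoutingSketch {

  /** One row of the assumed mapping table. */
  private static final class Entry {
    final int instanceId;      // container instance that ran the preview
    final long createdMillis;  // when the preview was started

    Entry(int instanceId, long createdMillis) {
      this.instanceId = instanceId;
      this.createdMillis = createdMillis;
    }
  }

  private final Map<String, Entry> table = new ConcurrentHashMap<>();

  /** Called by the container instance that accepts the preview request. */
  void register(String previewId, int instanceId, long nowMillis) {
    table.put(previewId, new Entry(instanceId, nowMillis));
  }

  /** Finds the instance holding the data for a preview, if the preview is still known. */
  Optional<Integer> route(String previewId) {
    Entry entry = table.get(previewId);
    return entry == null ? Optional.empty() : Optional.of(entry.instanceId);
  }

  /** Drops mappings created before the cutoff and returns how many were removed. */
  int removeOlderThan(long cutoffMillis) {
    int before = table.size();
    table.values().removeIf(entry -> entry.createdMillis < cutoffMillis);
    return before - table.size();
  }

  public static void main(String[] args) {
    PreviewRoutingSketch routing = new PreviewRoutingSketch();
    routing.register("preview-a", 0, 1_000L);
    routing.register("preview-b", 1, 5_000L);
    System.out.println(routing.route("preview-b"));
    System.out.println(routing.removeOlderThan(2_000L));
    System.out.println(routing.route("preview-a"));
  }
}
```

In a real deployment the cleanup would also have to delete the preview data stored locally in the owning container, not only the mapping.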
Open Questions: