...
To start the preview for an application:
Request Method and Endpoint

Code Block
POST /v3/namespaces/{namespace-id}/apps/{app-id}/preview

where
- namespace-id is the name of the namespace
- app-id is the name of the application for which the preview is to be run
The request body will contain the application configuration along with the following additional preview configuration:
Code Block "preview": { "numRecords" : "10", "startStage" : "stage_1", "endStage" : "stage_3", "inputData" : [ {"name": "rob", "address": "san jose"}, {"name": "bob", "address": "santa clara"}, {"name": "tom", "address": "palo alto"} ] "programType": "WORKFLOW", // programType and programName can be optional for now. However in future if we want to preview non-hydrator application, then programType and programName can be provided to let preview system know which program to be previewed "programName": "DataPipelineWorkflow" } Description: 1. numRecords: Number of records to preview 2. startStage: Pipeline stage from which preview need to be started. 3. endStage: Pipeline stage till which the preview need to be run. 4. inputData: Data which need to be run through preview process. Validations to be performed: 1. startStage and endStage are connected together. 2. Schema of the inputData should match with the input schema for stage_1. 3. If SOURCE plugin is specified as startStage, preview will ignore the inputData and read the records directly from the source as specified by the numRecords. 4. If startStage is other than the Source plugin then inputData is required. Preview will process the inputData ignoring numRecords.
Consider a pipeline with an FTP source, a CSVParser transform labeled MyCSVParser, and a Table sink labeled MyTable. The configuration with preview data will look like the following:
Code Block { "artifact":{ "name":"cdap-data-pipeline", "version":"3.5.0-SNAPSHOT", "scope":"SYSTEM" }, "name":"MyPipeline", "config":{ "connections":[ { "from":"FTP", "to":"CSVParser" }, { "from":"CSVParser", "to":"Table" } ], "stages":[ { "name":"FTP", "plugin":{ "name":"FTP", "type":"batchsource", "label":"FTP", "artifact":{ "name":"core-plugins", "version":"1.4.0-SNAPSHOT", "scope":"SYSTEM" }, "properties":{ "referenceName":"myfile", "path":"/tmp/myfile" } }, "outputSchema":"{\"fields\":[{\"name\":\"offset\",\"type\":\"long\"},{\"name\":\"body\",\"type\":\"string\"}]}" }, { "name":"MyCSVParser", "plugin":{ "name":"CSVParser", "type":"transform", "label":"CSVParser", "artifact":{ "name":"transform-plugins", "version":"1.4.0-SNAPSHOT", "scope":"SYSTEM" }, "properties":{ "format":"DEFAULT", "schema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}", "field":"body" } }, "outputSchema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}" }, { "name":"MyTable", "plugin":{ "name":"Table", "type":"batchsink", "label":"Table", "artifact":{ "name":"core-plugins", "version":"1.4.0-SNAPSHOT", "scope":"SYSTEM" }, "properties":{ "schema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}", "name":"mytable", "schema.row.field":"id" } }, "outputSchema":"{\"type\":\"record\",\"name\":\"etlSchemaBody\",\"fields\":[{\"name\":\"id\",\"type\":\"int\"},{\"name\":\"name\",\"type\":\"string\"}]}", "inputSchema":[ { "name":"id", "type":"int", "nullable":false }, { "name":"name", "type":"string", "nullable":false } ] } ], "preview": { "numRecords" : "10", "startStage" : "MyCSVParser", "endStage" : "MyTable", "inputData" : [ {"offset": 1, "body": "100,bob"}, {"offset": 2, "body": "200,rob"}, {"offset": 3, "body": "300,tom"} ] } } }
The above preview configuration will read the inputData from the preview section and write the data to MyTable. If the user does not want to write the data to the sink but only wants to preview the MyCSVParser stage, the preview configuration will look like the following:
Code Block "preview": { "numRecords" : "10", "startStage" : "MyCSVParser", "endStage" : "MyCSVParser", // In order to execute single stage start stage is same as end stage "inputData" : [ {"offset": 1, "body": "100,bob"}, {"offset": 2, "body": "200,rob"}, {"offset": 3, "body": "300,tom"} ] }
- Once the preview is started, a unique preview id will be generated for it. The preview id could be of the form namespace_id.app_id.preview. The runtime information (preview_id, STATUS) for the preview will be generated and stored (in-memory or on disk). If, while a preview is running, the user sends another preview request for the same application and the STATUS is RUNNING, the user will get a 403 status code with the error message "Preview already running for application." This ensures that only one preview is running at any instant for a given application in a given namespace (a minimal sketch of this bookkeeping follows this list).
- If the startStage in the preview configuration is not a SOURCE plugin, the preview system will insert a MOCK source into the pipeline, which reads the JSON records specified in the inputData field and converts them into StructuredRecords.
- Once the preview execution is complete, its runtime information will be updated with the status of the preview (COMPLETED or FAILED).
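A minimal sketch of the runtime-information bookkeeping described above, assuming an in-memory store keyed by preview id. All names here are hypothetical; the actual implementation may differ.

Code Block
# Hypothetical in-memory bookkeeping for preview runtime information.
# Keys are preview ids of the form namespace_id.app_id.preview.
preview_status: dict[str, str] = {}

def start_preview(namespace_id: str, app_id: str) -> tuple[int, str]:
    """Return (http_status, body). Rejects the request if a preview for
    this application is already RUNNING."""
    preview_id = f"{namespace_id}.{app_id}.preview"
    if preview_status.get(preview_id) == "RUNNING":
        return 403, "Preview already running for application."
    preview_status[preview_id] = "RUNNING"
    # ... launch the preview execution asynchronously ...
    return 200, preview_id

def finish_preview(preview_id: str, succeeded: bool) -> None:
    # Called when the preview execution completes.
    preview_status[preview_id] = "COMPLETED" if succeeded else "FAILED"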
To get the status of the preview:
Request Method and Endpoint

Code Block
GET /v3/namespaces/{namespace-id}/apps/{app-id}/preview/status

where
- namespace-id is the name of the namespace
- app-id is the name of the application for which the preview status is to be requested
The response body will contain the JSON-encoded preview status and an optional error message if the preview failed.
Code Block
1. If the preview is RUNNING:
{
  "status": "RUNNING"
}

2. If the preview is COMPLETED:
{
  "status": "COMPLETED"
}

3. If the preview FAILED:
{
  "status": "FAILED",
  "errorMessage": "Preview failure root cause message."
}
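A client might poll this endpoint until the preview finishes, for example as sketched below (the host, namespace, and application name are placeholders):

Code Block
import time
import requests

STATUS_URL = ("http://localhost:11015/v3/namespaces/default/"
              "apps/MyPipeline/preview/status")   # placeholder values

def wait_for_preview(poll_interval_seconds: float = 2.0) -> dict:
    """Poll the preview status endpoint until it reports COMPLETED or FAILED."""
    while True:
        status = requests.get(STATUS_URL).json()
        if status["status"] in ("COMPLETED", "FAILED"):
            return status
        time.sleep(poll_interval_seconds)

result = wait_for_preview()
if result["status"] == "FAILED":
    print("Preview failed:", result.get("errorMessage"))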
To get the preview data for a stage:
Request Method and Endpoint

Code Block
GET /v3/namespaces/{namespace-id}/apps/{app-id}/preview/stages/{stage-name}

where
- namespace-id is the name of the namespace
- app-id is the name of the application for which the preview data is to be requested
- stage-name is the unique name used to identify the stage
The response body will contain the JSON-encoded input and output data for the stage, as well as the input and output schemas.
Code Block { "inputData": [ {"first_name": "rob", "zipcode": 95131}, {"first_name": "bob", "zipcode": 95054}, {"first_name": "tom", "zipcode": 94306} ], "outputData":[ {"name": "rob", "zipcode": 95131, "age": 21}, {"name": "bob", "zipcode": 95054, "age": 22}, {"name": "tom", "zipcode": 94306, "age": 23} ], "inputSchema": { "type":"record", "name":"etlSchemaBody", "fields":[ {"name":"first_name", "type":"string"}, {"name":"zipcode", "type":"int"} ] }, "outputSchema": { "type":"record", "name":"etlSchemaBody", "fields":[ {"name":"name", "type":"string"}, {"name":"zipcode", "type":"int"}, {"name":"age", "type":"int"} ] } }
To get the logs/metrics for the preview:
Request Method and Endpoint

Code Block
GET /v3/namespaces/{namespace-id}/apps/{app-id}/preview/logs
GET /v3/namespaces/{namespace-id}/apps/{app-id}/preview/metric

where
- namespace-id is the name of the namespace
- app-id is the name of the application for which the preview logs or metrics are to be requested
The response would be similar to that of a regular application.
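A sketch of retrieving the logs and metrics for a preview, assuming the same placeholder host and application names as above; the response bodies mirror those of the regular application endpoints:

Code Block
import requests

BASE = ("http://localhost:11015/v3/namespaces/default/"
        "apps/MyPipeline/preview")   # placeholder values

logs = requests.get(f"{BASE}/logs")       # same format as the regular app logs endpoint
metrics = requests.get(f"{BASE}/metric")  # same format as the regular app metrics endpoint

print(logs.text)
print(metrics.text)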