...
Currently, when we deploy a pipeline, configurePipeline is called on each plugin. We perform a few validations in the configure stage, specifically for schema strings, script syntax, etc. In some plugins we also create a dataset if it doesn't already exist.
The dataset to write to can be macro-substituted, so we have to defer dataset creation to prepareRun rather than doing it in the configure stage.
Deferring dataset creation to prepareRun requires adding a new method to BatchSinkContext.
```java
@Beta
public interface BatchSinkContext extends BatchContext {
  // new method
  void createDataset(String datasetName, String typeName, DatasetProperties properties);

  // existing methods
  @Deprecated
  void addOutput(String datasetName);
  ...
}
```
Currently, if the stream given to a stream source or the table given to a table source doesn't exist, we create a new stream/table. We want to disallow this, so this addition will only be in BatchSinkContext and not in BatchContext.
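To illustrate the deferred-creation flow, here is a minimal, self-contained sketch of a sink creating its dataset in prepareRun, after macro substitution has produced the final dataset name. The nested types are stand-ins for the real CDAP interfaces, and the datasetExists() helper is a hypothetical addition used only to express "create if it doesn't already exist".

```java
import java.util.HashSet;
import java.util.Set;

public class DeferredCreationSketch {

  // Stand-in types for the sketch; the real BatchSinkContext is shown above.
  interface DatasetProperties {}

  interface BatchSinkContext {
    void createDataset(String datasetName, String typeName, DatasetProperties properties);
    boolean datasetExists(String datasetName); // hypothetical helper
  }

  // A sink that defers dataset creation to prepareRun instead of configure time.
  static class TableSink {
    private final String datasetName; // already macro-substituted by the platform

    TableSink(String datasetName) {
      this.datasetName = datasetName;
    }

    void prepareRun(BatchSinkContext context) {
      // create the dataset only if it does not already exist
      if (!context.datasetExists(datasetName)) {
        context.createDataset(datasetName, "table", null);
      }
    }
  }

  // In-memory context used to exercise the sketch.
  static class InMemoryContext implements BatchSinkContext {
    final Set<String> datasets = new HashSet<>();

    public void createDataset(String name, String type, DatasetProperties props) {
      datasets.add(name);
    }

    public boolean datasetExists(String name) {
      return datasets.contains(name);
    }
  }

  public static void main(String[] args) {
    InMemoryContext context = new InMemoryContext();
    new TableSink("purchases-2016").prepareRun(context);
    System.out.println(context.datasetExists("purchases-2016")); // prints "true"
  }
}
```

Because the sink only sees the fully substituted name in prepareRun, no dataset is created at configure time, which is the behavior this change is after.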
...
There are two ways of handling this.
1) At the platform level
2) At the DataPipeline app level
Platform Level Substitution:
Plugins can use an "@Macro" annotation to specify whether a plugin field can be a macro; the annotation also provides a configure-value to use at configure time to instantiate the plugin.
When a plugin instance is instantiated at configure time, macros cannot be substituted, because the values to substitute have not been specified yet.
- If we want to keep the field with the macro as-is, then the field always has to be a String. This is limiting for plugin developers, as they have to do the type casting and parsing themselves to use macros on fields of types other than String.
- A configure-value works around that, but the plugin developer has to know that this value will be used at configure time. This might seem unnecessary, as the configure-value isn't useful except to avoid failure at configure time.
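To make the runtime substitution step concrete, here is a minimal, self-contained sketch of macro substitution over a ${key} syntax. The syntax, class name, and error handling are illustrative assumptions, not the actual platform implementation.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class MacroSubstitution {
  // Assumed macro syntax for this sketch: ${key}
  private static final Pattern MACRO = Pattern.compile("\\$\\{([^}]+)\\}");

  // Replaces every ${key} in the value with its runtime substitution,
  // failing fast if a key has no value.
  public static String substitute(String value, Map<String, String> substitutions) {
    Matcher m = MACRO.matcher(value);
    StringBuffer sb = new StringBuffer();
    while (m.find()) {
      String key = m.group(1);
      String replacement = substitutions.get(key);
      if (replacement == null) {
        throw new IllegalArgumentException("No substitution for macro: " + key);
      }
      m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
    }
    m.appendTail(sb);
    return sb.toString();
  }

  public static void main(String[] args) {
    // e.g. a sink's dataset name field holding a macro
    String datasetName = substitute("${output-table}-2016", Map.of("output-table", "purchases"));
    System.out.println(datasetName); // prints "purchases-2016"
  }
}
```

At configure time no substitution map exists yet, which is exactly why the field must either stay a plain String or carry a configure-value as described above.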
...