...
Goals
Checklist
- User stories documented (Albert/Vinisha)
- User stories reviewed (Nitin)
- Design documented (Albert/Vinisha)
- Design reviewed (Terence/Andreas)
- Feature merged ()
- Examples and guides ()
- Integration tests ()
- Documentation for feature ()
- Blog post
Use Cases
- A pipeline developer wants to create a pipeline with several configuration settings that are not known at pipeline creation time, but are set at the start of each pipeline run. For example, the time partition(s) that should be read by the source, and the name of the dataset sink, need to be set on a per-run basis. The arguments can be set either through CDAP runtime arguments/preferences, or by the pipeline itself. For example, at the start of the run, the pipeline performs some action (e.g. queries a dataset or makes an HTTP call) to look up which time partitions should be read, and where data should be written to, for that pipeline run. Alternatively, a user can manually specify the time partitions through CDAP runtime arguments/preferences and then start the run.
User Stories
- As a pipeline developer, I want to be able to configure a plugin property to some value that will get substituted for each run based on the runtime arguments
- As a pipeline operator, I want to be able to set arguments for the entire pipeline that will be used for substitution
- As a pipeline operator, I want to be able to set arguments for a specific stage in the pipeline that will be used for substitution
- As a plugin developer, I want to be able to write code that is executed at the start of the pipeline and sets arguments for the rest of the run.
Design (WIP - don't review yet)
Specifying Macros
We can introduce macro syntax that can be used in plugin configs that the Hydrator app will substitute before any plugin code is run. For example:
Code Block |
---|
{
  "stages": [
    {
      "name": "customers",
      "plugin": {
        "name": "File",
        "type": "batchsource",
        "properties": {
          // ${customers_inputpath} will get replaced with the value of the 'customers_inputpath' runtime argument
          "path": "hdfs://host:port/${customers_inputpath}"
        }
      }
    },
    {
      "name": "items",
      "plugin": {
        "name": "File",
        "type": "batchsource",
        "properties": {
          // ${items_inputpath} will get replaced with the value of the 'items_inputpath' runtime argument
          "path": "hdfs://host:port/${items_inputpath}"
        }
      }
    }
  ]
}
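The substitution described above can be sketched as follows. This is a minimal illustration, not the Hydrator implementation: the class name `MacroSubstitutor` and its `substitute` method are hypothetical, and treating a missing argument as an error is an assumption.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Hydrator): replaces ${name} macros with runtime argument values. */
final class MacroSubstitutor {
  // Matches ${name}, where the name contains no '$', '{' or '}'.
  private static final Pattern MACRO = Pattern.compile("\\$\\{([^\\$\\{\\}]+)\\}");

  private MacroSubstitutor() { }

  static String substitute(String value, Map<String, String> runtimeArguments) {
    Matcher matcher = MACRO.matcher(value);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String name = matcher.group(1);
      String replacement = runtimeArguments.get(name);
      if (replacement == null) {
        // Assumption: a macro without a matching runtime argument is an error.
        throw new IllegalArgumentException("No runtime argument for macro '" + name + "'");
      }
      // quoteReplacement keeps any '$' or '\' in the argument value literal.
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }
}
```

With a runtime argument `customers_inputpath` set to `data/customers`, the `path` property of the `customers` stage would resolve to `hdfs://host:port/data/customers` before any plugin code runs.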
Setting Hydrator runtime arguments using CDAP runtime arguments/preferences
CDAP preferences and runtime arguments will be used directly as Hydrator arguments.
1.) For each stage, runtime arguments can be passed from the Hydrator UI. Because a Hydrator pipeline can have multiple phases, we can use preferences instead of CDAP runtime arguments to store the Hydrator runtime arguments. Preferences for the Hydrator app can be set using the following CDAP REST endpoint:
PUT <base-url>/namespaces/<namespace>/apps/<app-id>/preferences
Alternatively, the CDAP REST endpoints for runtime arguments can be used.
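As a rough sketch of the preferences call, the request could be built like this with the JDK 11+ HTTP client. The class name `PreferencesRequest` is hypothetical, and the JSON body shape (a flat object mapping argument names to values) is an assumption rather than something this page specifies.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical helper: builds the PUT request that stores Hydrator arguments as app preferences. */
final class PreferencesRequest {
  private PreferencesRequest() { }

  static HttpRequest build(String baseUrl, String namespace, String appId, String jsonBody) {
    // Mirrors PUT <base-url>/namespaces/<namespace>/apps/<app-id>/preferences
    URI uri = URI.create(baseUrl + "/namespaces/" + namespace + "/apps/" + appId + "/preferences");
    return HttpRequest.newBuilder(uri)
        .header("Content-Type", "application/json")
        // Assumption: the body is a flat JSON object of argument names to values.
        .PUT(HttpRequest.BodyPublishers.ofString(jsonBody))
        .build();
  }
}
```

The request is only built here, not sent. Sending it would be a call such as `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())` against a running CDAP instance.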
2.) The Hydrator app will substitute properties using macro substitution for each ETLStage. For the substitution we can use the Macro API, which already exists in Hydrator.
Code Block |
---|
import java.util.Map;
import javax.annotation.Nullable;

public interface Macro {
  /**
   * Get the value of the macro based on the context and arguments.
   *
   * @param arguments arguments to the macro
   * @param context the macro context, which gives access to things like the logical start time and runtime arguments.
   * @return the macro value
   */
  String getValue(@Nullable String arguments, MacroContext context) throws Exception;
}

public interface MacroContext {
  /**
   * @return runtime arguments of the pipeline run.
   */
  Map<String, String> getRuntimeArguments();
}
A macro can resolve either to a value that is substituted directly, or to a key into some key store, for example in the case of SFTP. Macro substitution should vary depending on the prefix of the argument. For a secure key, the macro can be '$secure.key'; for a value that is substituted directly, the macro can be '$inputpath', without any prefix.
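The prefix-based dispatch could look roughly like the sketch below. Everything here is hypothetical: the class name `PrefixedMacroResolver`, modelling the key store as a plain lookup function, and stripping the `secure.` prefix before the key store lookup are all assumptions made for illustration.

```java
import java.util.Map;
import java.util.function.Function;

/** Hypothetical resolver: picks the lookup source based on the macro name's prefix. */
final class PrefixedMacroResolver {
  private static final String SECURE_PREFIX = "secure.";

  private final Map<String, String> runtimeArguments;
  // Assumption: the key store is modelled as a function from key name to secret value.
  private final Function<String, String> keyStore;

  PrefixedMacroResolver(Map<String, String> runtimeArguments, Function<String, String> keyStore) {
    this.runtimeArguments = runtimeArguments;
    this.keyStore = keyStore;
  }

  /** Resolves a macro name such as "secure.key" or "inputpath" (without the leading '$'). */
  String resolve(String macroName) {
    if (macroName.startsWith(SECURE_PREFIX)) {
      // Assumption: the prefix is stripped and the remainder is the key store entry name.
      return keyStore.apply(macroName.substring(SECURE_PREFIX.length()));
    }
    String value = runtimeArguments.get(macroName);
    if (value == null) {
      throw new IllegalArgumentException("No runtime argument named '" + macroName + "'");
    }
    return value;
  }
}
```

Keeping the dispatch in one place means new prefixes can be added without touching the plugins that use the macros.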
...
- The runtime argument name should not contain '$'
- Runtime argument names should be unique throughout the pipeline
- There is no programmatic way to set RuntimeArguments for Hydrator, because stage.prepare() is called after a stage is instantiated, and instantiating a stage requires the stage properties.
Thoughts from Terence:
...