...
Custom Action Setting Config values:
One use case of the feature is to allow custom actions that run before a plugin to set macros. Custom actions can use workflow tokens to set values for field names.
```json
"plugin": {
  "name": "Database",
  "type": "batchsource",
  "properties": {
    "user": "${wf-token(username)}",
    "password": "${secure(sql-password)}",
    "jdbcPluginName": "jdbc",
    "importQuery": "select * from ${wf-token(table-name)};"
  }
}
```
If a pipeline builder wants to use a workflow token set by a preceding custom action as the value of a field, they use the wf-token macro in that field, as above.
The context has access to the workflow token, so workflow tokens should be usable for substitution in the same way as runtime arguments.
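As an illustration of the substitution described here, the following is a minimal sketch that assumes workflow tokens are available as a plain key/value map. The class `WorkflowTokenMacros` and its `substitute` method are hypothetical names for this example and are not part of the CDAP API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration; not part of the CDAP API.
class WorkflowTokenMacros {
  // Matches ${wf-token(key)} and captures the key.
  private static final Pattern WF_TOKEN =
      Pattern.compile("\\$\\{wf-token\\(([^)]+)\\)\\}");

  // Replaces every ${wf-token(key)} in value with the token stored under key.
  static String substitute(String value, Map<String, String> tokens) {
    Matcher matcher = WF_TOKEN.matcher(value);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String key = matcher.group(1);
      String replacement = tokens.get(key);
      if (replacement == null) {
        throw new IllegalArgumentException("No workflow token set for key: " + key);
      }
      // quoteReplacement keeps '$' and '\' in the token value literal.
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }

  public static void main(String[] args) {
    // Assumed to have been written by a preceding custom action.
    Map<String, String> tokens = new HashMap<>();
    tokens.put("username", "alice");
    tokens.put("table-name", "employees");
    // Prints: select * from employees;
    System.out.println(substitute("select * from ${wf-token(table-name)};", tokens));
  }
}
```

A missing token raises an exception in this sketch rather than leaving the macro literal in place; whether the real implementation should fail or fall back to a default is a design choice this example does not settle.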
Scoping:
Scoping is currently low priority and will not be implemented for now, but it can be done manually. In our example config from a JDBC source to a Table sink, there is a common macro "${table-name}". If the user wants to provide a different value for table-name in the Table Sink, they can do this manually:
Syntax | Macro | Evaluates To |
---|---|---|
${table-name} | table-name | employees |
${TableSink:table-name} | TableSink:table-name | employee_sql |
This amounts to the user creating unique argument keys rather than true scoping.
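The manual approach can be sketched as follows, assuming runtime arguments are a flat map and that the `TableSink:` prefix is simply part of the key. The class `UniqueKeyMacros` is a hypothetical name used only for this example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration: "TableSink:" is only part of the argument key,
// not a scope that the platform interprets.
class UniqueKeyMacros {
  // Matches ${key}, where key is any run of characters other than '}'.
  private static final Pattern MACRO = Pattern.compile("\\$\\{([^}]+)\\}");

  static String substitute(String value, Map<String, String> runtimeArguments) {
    Matcher matcher = MACRO.matcher(value);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String key = matcher.group(1);
      String replacement = runtimeArguments.get(key);
      if (replacement == null) {
        throw new IllegalArgumentException("No runtime argument for key: " + key);
      }
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }

  public static void main(String[] args) {
    Map<String, String> runtimeArguments = new HashMap<>();
    runtimeArguments.put("table-name", "employees");
    runtimeArguments.put("TableSink:table-name", "employee_sql");
    System.out.println(substitute("${table-name}", runtimeArguments));           // employees
    System.out.println(substitute("${TableSink:table-name}", runtimeArguments)); // employee_sql
  }
}
```

Because both keys live in the same map, nothing ties `TableSink:table-name` to the Table Sink stage; the user is responsible for referencing the prefixed key only in that stage's config.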
Documentation/Changes
Regardless of where the substitution occurs, the guidelines for creating Hydrator plugins would have to change. For existing plugins, any validation for properties that are macro-substitutable in configurePipeline must be moved to prepareRun (see reference section for specific plugins). We also must document the convention for nulling/defaulting macroable properties at configure time.
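To make the guideline concrete, here is a minimal sketch of a plugin whose validation has been moved from configure time to run time. `ExampleTableSink`, its simplified method signatures, and the naming rule in `VALID_NAME` are assumptions made for illustration; real Hydrator plugins receive configurer and context objects that are omitted here.

```java
import java.util.regex.Pattern;

// Hypothetical plugin skeleton, not a real CDAP class. The method names follow
// configurePipeline and prepareRun from the text; the signatures are simplified.
class ExampleTableSink {
  // Assumed naming rule, for illustration only.
  private static final Pattern VALID_NAME = Pattern.compile("[A-Za-z0-9_.-]+");

  // May still hold a macro such as ${table-name} at configure time.
  private final String name;

  ExampleTableSink(String name) {
    this.name = name;
  }

  static void validateName(String value) {
    if (value == null || !VALID_NAME.matcher(value).matches()) {
      throw new IllegalArgumentException("Invalid table name: " + value);
    }
  }

  // Configure time: macros are not substituted yet, so the macro-enabled
  // property is deliberately not validated here.
  void configurePipeline() {
    // Only validation of properties that cannot contain macros would remain.
  }

  // Run time: called with the value after macro substitution.
  void prepareRun(String resolvedName) {
    validateName(resolvedName);
  }

  public static void main(String[] args) {
    ExampleTableSink sink = new ExampleTableSink("${table-name}");
    sink.configurePipeline();     // does not reject the unresolved macro
    sink.prepareRun("employees"); // validates the substituted value
    System.out.println("validated");
  }
}
```

The trade-off is that an invalid value is now reported when the run starts rather than when the pipeline is deployed.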
Implementation Details
```java
interface MacroContext {
  /**
   * Given the macro key, return the substituted value.
   */
  String getValue(String macroKey);
}
```
Based on the macro type, one of the following MacroContext implementations will be used to get the value for the macro.

```java
import java.util.Map;

// Looks up the macro key in the runtime arguments.
class DefaultMacroContext implements MacroContext {
  Map<String, String> runtimeArguments;

  @Override
  public String getValue(String macroKey) {
    return runtimeArguments.get(macroKey);
  }
}

// Looks up the macro key in the secure store.
class SecureMacroContext implements MacroContext {
  SecureStore secureStore;

  @Override
  public String getValue(String macroKey) {
    return secureStore.get(macroKey);
  }
}

// Evaluates a runtime function macro by applying the function to its arguments.
class RuntimeFunctionMacro implements MacroContext {
  long logicalStartTime;
  Function<String, String> timezoneFunction;

  @Override
  public String getValue(String arguments) {
    return timezoneFunction.apply(arguments);
  }
}
```
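How a context might be chosen from the macro type can be sketched as below. This is an illustration under stated assumptions, not the actual implementation: `MacroDispatchSketch` and `evaluate` are hypothetical names, the interface is repeated so the example compiles on its own, and a plain map stands in for the secure store.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical dispatcher for illustration; not part of the CDAP API.
class MacroDispatchSketch {
  // Repeated here so the example is self-contained.
  interface MacroContext {
    String getValue(String macroKey);
  }

  // macroBody is the text between "${" and "}".
  // "secure(key)" is routed to the secure context; anything else is treated
  // as a runtime argument key and routed to the default context.
  static String evaluate(String macroBody, MacroContext defaultContext,
                         MacroContext secureContext) {
    String prefix = "secure(";
    if (macroBody.startsWith(prefix) && macroBody.endsWith(")")) {
      String key = macroBody.substring(prefix.length(), macroBody.length() - 1);
      return secureContext.getValue(key);
    }
    return defaultContext.getValue(macroBody);
  }

  public static void main(String[] args) {
    Map<String, String> runtimeArguments = new HashMap<>();
    runtimeArguments.put("table-name", "employees");

    // Stand-in for a secure store; a real one would not be an in-memory map.
    Map<String, String> secrets = new HashMap<>();
    secrets.put("sql-password", "s3cret");

    MacroContext defaultContext = runtimeArguments::get;
    MacroContext secureContext = secrets::get;

    System.out.println(evaluate("table-name", defaultContext, secureContext));           // employees
    System.out.println(evaluate("secure(sql-password)", defaultContext, secureContext)); // s3cret
  }
}
```

Keeping the dispatch outside the contexts means each context only needs to know how to look up a key, which matches the single-method interface above.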
Reference
Many plugins have properties that are used in constructing or validating a schema at configure time. These fields need to have macros disabled to allow this. The following plugins and fields would be affected:
Plugin | Fields | Use | Conflict |
---|---|---|---|
BatchCassandraSource | schema | Parsed for correctness to create the schema. | Parsing a macro or schema with a nested macro would fail. |
CopybookSource | copybookContents | Copybook contents are converted to an InputStream and used to get external records, which are in turn used to add fields to the schema. | Schema would add macro literal as a field. |
DedupAggregator | uniqueFields, filterOperation | Both fields are used to validate the input schema created. | Macro literals do not exist as fields in schema and will throw IllegalArgumentException. |
DistinctAggregator | fields | Specifies the fields used to construct the output schema. | Will add macro literals as schema fields.* |
GroupByAggregator | groupByFields, aggregates | Gets fields from input schema and adds aggregates to the output fields list. | Macro literals do not exist in the input schema and are not valid fields for an output schema. |
RowDenormalizerAggregator | keyField, nameField, valueField | Gets schemas by field names from the input schema. | Macro literals do not exist as fields in the input schema. |
KVTableSink | keyField, valueField | Validates the presence and type of these fields in the input schema. | Macro literals will not exist in the input schema. |
SnapshotFileBatchAvroSink | schema | Parses schema to add file properties. | Macro literals may prevent schema parsing or lead to incorrect schema creation. |
SnapshotFileBatchParquetSink | schema | Parses schema to add file properties. | Macro literals may prevent schema parsing or lead to incorrect schema creation. |
TableSink | schema, rowField | Validates output and input schemas if properties specified. | Macro literals will lead to failed validation of schema and row field. |
TimePartitionedFileSetDatasetAvroSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail. |
TimePartitionedFileSetDatasetParquetSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail. |
SnapshotFileBatchAvroSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation. |
SnapshotFileBatchParquetSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation. |
StreamBatchSource | schema, name, format | Stream is added and created through name and schema is parsed to set output schema. | Macro literals will lead to bad parsing of properties. |
TableSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
TimePartitionedFileSetDatasetAvroSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
TimePartitionedFileSetDatasetParquetSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
JavaScriptTransform | schema, script, lookup | Schema format is used to set the output schema. JavaScript and lookup properties are also parsed for correctness. | Macro literals can cause parsing to fail for schema creation, JavaScript compilation, or lookup parsing. |
LogParserTransform | inputName | Gets field from input schema through inputName property. | With a macro literal, the field will not exist in the input schema. |
ProjectionTransform | fieldsToKeep, fieldsToDrop, fieldsToConvert, fieldsToRename | Properties are used to create output schema. | Macro literals will lead to a failed or incorrect output schema being created. |
PythonEvaluator | schema | Schema parsed for correctness and set as output schema. | Macro literal can lead to failed or bad schema creation. |
ValidatorTransform | validators, validationScript | Validator property used to set validator plugins. Script property is also parsed for correctness. | Macro literals can lead to failed parsing or to the wrong plugins being set. Scripts cannot be validated without validators. |
ElasticsearchSource | schema | Schema parsed for correctness and set as output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HBaseSink | rowField, schema | Parsed to validate the output and input schemas and set the output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HBaseSource | schema | Parsed for correctness to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HiveBatchSource | schema | Parsed for correctness to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
MongoDBBatchSource | schema | Parsed for correctness and validated to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
NaiveBayesClassifier | predictionField | Used to configure and set fields of the output schema, and checked for existence in the input schema. | Output schema would be created incorrectly with the macro literal as the prediction field, and the behavior of the input schema check is undefined. |
Compressor | compressor, schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
CSVFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
CSVParser | field | Validated against input schema to check existence of field. | Macro literals may not exist as fields in the input schema. |
Decoder | decode, schema | Decode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
Decompressor | decompressor, schema | Decompressor property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
Encoder | encode, schema | Encode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
JSONFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
JSONParser | field, schema | Validates if field property is present in input schema. Parses schema property to set output schema. | Macro literal may not exist in input schema and may lead to failed parsing or creation of output schema. |
StreamFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
* May need verification
Other plugins have fields that are validated or processed at configure time but do not affect the schema. In these cases, the validation or processing can be moved to the prepareRun method. The following plugins and fields would be affected:
Plugin | Fields | Use | Justification |
---|---|---|---|
StreamBatchSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's creation. |
TimePartitionedFileSetSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's or dataset's creation. |
ReferenceBatchSink | referenceName | Verifies reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case. |
ReferenceBatchSource | referenceName | Verifies that reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case. |
FileBatchSource | timeTable | Creates dataset from time table property. | This is a primary use case for macros. |
TimePartitionedFileSetSource | name, basePath | Name and basePath are used to create the dataset. | This is a primary use case for macros. |
BatchWritableSink | name, type | Creates dataset from properties. | This is a primary use case for macros. |
SnapshotFileBatchSink | name | Creates dataset from name field. | This is a primary use case for macros. |
BatchReadableSource | name, type | Dataset is created from name and type properties. | This is a primary use case for macros. |
SnapshotFileBatchSource | all properties* | Creates dataset from properties. | This is a primary use case for macros. |
TimePartitionedFileSetSink | all properties* | Creates dataset from properties. | This is a primary use case for macros. |
DBSource | importQuery, boundingQuery, splitBy, numSplits | Used to validate connection settings and parsed for formatting. | The parsing/validation does not lead to the creation of any schema or dataset. |
HDFSSink | timeSuffix | Parsed to validate proper formatting of time suffix. | The parsing/validation does not lead to the creation of any schema or dataset. |
KafkaProducer | async | Parsed to check proper formatting of boolean. | The parsing/validation does not lead to the creation of any schema or dataset. |
NaiveBayesClassifier | fieldToClassify | Checked if input schema field is of type String. | The validation does not lead to the creation or alteration of any schema. |
NaiveBayesTrainer | fieldToClassify, predictionField | Checked if input schema fields are of type String and Double respectively. | The validation does not lead to the creation or alteration of any schema. |
CloneRecord | copies | Validated against being 0 or over the max number of copies. | The validation does not lead to the creation of any schema or dataset. |
CSVFormatter | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset. |
CSVParser | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset. |
Hasher | hash | Checked against valid hash formats. | The check does not lead to the validation or alteration of any schema. |
JSONParser | mapping | Mappings extracted and placed into a map with their expressions. | The extraction does not affect any schema creation or validation. |
StreamFormatter | format | Checked against valid stream formats. | The check does not lead to the validation or alteration of any schema. |
ValueMapper | mapping, defaults | Parsed after configuration is initialized and validated. | The check does not lead to the validation or alteration of any schema. |
* May need verification
...