...
- The hydrator app (SparkClientContext, MapReduceContext) will have access to secure store manager to substitute values from key store.
Reference:
Changes to Existing Plugins
Many plugins have fields (configurable properties) that are used in constructing or validating a schema at configure time. These fields need to have macros disabled to allow this. The following plugins and fields would be affected:
Plugin | Fields | Use | Conflict |
---|---|---|---|
BatchCassandraSource | schema | Parsed for correctness to create the schema. | Parsing a macro or schema with a nested macro would fail. |
CopybookSource | copybookContents | Copybook contents are converted to an InputStream and used to get external records, which are in turn used to add fields to the schema. | Schema would add macro literal as a field. |
DedupAggregator | uniqueFields, filterOperation | Both fields are used to validate the input schema created. | Macro literals do not exist as fields in schema and will throw IllegalArgumentException. |
DistinctAggregator | fields | Specifies the fields used to construct the output schema. | Will add macro literals as schema fields.* |
GroupByAggregator | groupByFields, aggregates, | Gets fields from input schema and adds aggregates to to output fields list. | Macro literals do not exist in input schema or are valid fields for an output schema. |
RowDenormalizerAggregator | keyField, nameField, valueField | Gets schemas by field names from the input schema. | Macro literals do not exist as fields in the input schema. |
KVTableSink | keyField, valueField | Validates that presence and type of these fields in the input schema. | Macro literals will not exist in the input schema. |
SnapshotFileBatchAvroSink | schema | Parses schema to add file properties. | Macro literals may disallow schema parsing or incorrect schema creation. |
SnapshotFileBatchParquetSink | schema | Parses schema to add file properties. | Macro literals may disallow schema parsing or incorrect schema creation. |
TableSink | schema, rowField | Validates output and input schemas if properties specified. | Macro literals will lead to failed validation of schema and row field. |
TimePartitionedFileSetDatasetAvroSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail. |
TimePartitionedFileSetDatasetParquetSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail. |
SnapshotFileBatchAvroSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation. |
SnapshotFileBatchParquetSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation. |
StreamBatchSource | schema, name, format | Stream is added and created through name and schema is parsed to set output schema. | Macro literals will lead to bad parsing of properties. |
TableSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
TimePartitionedFileSetDatasetAvroSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
TimePartitionedFileSetDatasetParquetSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation. |
JavaScriptTransform | schema, script, lookup | Schema format is used to set the output schema. JavaScript and lookup properties are also parsed for correctness. | Macro literals can cause parsing to fail for schema creation, JavaScript compilation, or lookup parsing. |
LogParserTransform | inputName | Gets field from input schema through inputName property. | With a macro literal, the field will not exist in the input schema. |
ProjectionTransform | fieldsToKeep, fieldsToDrop, fieldsToConvert, fieldsToRename | Properties are used to create output schema. | Macro literals will lied to a failed or wrong output schema being created. |
PythonEvaluator | schema | Schema parsed for correctness and set as output schema. | Macro literal can lead to failed or bad schema creation. |
ValidatorTransform | validators, validationScript, | Validator property used to set validator plugins. Script property is also parsed for correctness. | Macro literals can lead to failed parsing or plugins being set. Scripts can not be validated without validators. |
ElasticsearchSource | schema | Schema parsed for correctness and set as output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HBaseSink | rowField, schema | Parsed to valid the output and input schemas and set the ouput schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HBaseSource | schema | Parsed for correctness to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
HiveBatchSource | schema | Parsed for correctness to set ouput schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
MongoDBBatchSource | schema | Parsed for correctness and validated to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
NaiveBayesClassifier | predictionField | Configures and sets fields of output schema and checked for existence in input schema. | Output schema would be created wrongly with macro literal as prediction field and input schema check behavior is undefined. |
Compressor | compressor, schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
CSVFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
CSVParser | field | Validated against input schema to check existence of field. | Macro literals may not exist as fields in the input schema. |
Decoder | decode, schema | Decode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
Decompressor | decompressor, schema | Decompressor property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
Encoder | encode, schema | Encode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema. |
JSONFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
JSONParser | field, schema | Validates if field property is present in input schema. Parses schema property to set output schema. | Macro literal may not exist in input schema and may lead to failed parsing or creation of output schema. |
StreamFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation. |
* May need verification
Other plugins have fields that are validated/processed at configure time that do not affect the schema. In these cases, these can be moved to the prepare run method. The following plugins and fields would be affected:
Plugin | Fields | Use | Justification |
---|---|---|---|
StreamBatchSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's creation. |
TimePartitionedFileSetSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's or dataset's creation. |
ReferenceBatchSink | referenceName | Verifies reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case. |
ReferenceBatchSource | referenceName | Verifies that reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case. |
FileBatchSource | timeTable | Creates dataset from time table property. | This is a primary use case for macros. |
TimePartitionedFileSetSource | name, basePath | Name and basePath are used to create the dataset. | This is a primary use case for macros. |
BatchWritableSink | name, type | Creates dataset from properties. | This is a primary use case for macros. |
SnapshotFileBatchSink | name | Creates dataset from name field. | This is a primary use case for macros. |
BatchReadableSource | name, type | Dataset is created from name and type properties. | This is a primary use case for macros. |
SnapshotFileBatchSource | all properties* | Creates dataset from properties. | This is a primary use case for macros. |
TimePartitionedFileSetSink | all properties* | Creates dataset from properties. | This is a primary use case for macros. |
DBSource | importQuery, boundingQuery, splitBy, numSplits | Validate connection settings and parsed for formatting. | The parsing/validation does not lead to the creation of any schema or dataset. |
HDFSSink | timeSuffix | Parsed to validate proper formatting of time suffix. | The parsing/validation does not lead to the creation of any schema or dataset. |
KafkaProducer | async | Parsed to check proper formatting of boolean. | The parsing/validation does not lead to the creation of any schema or dataset. |
NaiveBayesClassifier | fieldToClassify | Checked if input schema field is of type String. | The validation does not lead to the creation or alteration of any schema. |
NaiveBayesTrainer | fieldToClassify, predictionField | Checked if input schema fields are of type String and Double respectively. | The validation does not lead to the creation or alteration of any schema. |
CloneRecord | copies | Validated against being 0 or over the max number of copies. | The validation does not lead to the creation of any schema or dataset. |
CSVFormatter | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset. |
CSVParser | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset. |
Hasher | hash | Checked against valid hash formats. | The check does not lead to the validation or alteration of any schema. |
JSONParser | mapping | Mappings extracted and placed into a map with their expressions. | The extraction does not affect any schema creation or validation. |
StreamFormatter | format | Checked against valid stream formats. | The check does not lead to the validation or alteration of any schema. |
ValueMapper | mapping, defaults | Parsed after configuration is initialized and validated. | The check does not lead to the validation or alteration of any schema. |
* May need verification