...

Code Block
/**
 * Contains information about a property used by a plugin.
 */
@Beta
public class PluginPropertyField {

  private final String name;
  private final String description;
  private final String type;
  private final boolean required;
  // true if this field can accept a macro
  private final boolean macroEnabled;
  ...
}

 

Notes

This will require a CDAP platform level change as it's a new annotation.
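As a rough illustration of how a platform-level annotation could populate the macroEnabled flag, the sketch below marks config fields with an annotation and reads it back through reflection. The annotation name MacroEnabled, the ExampleConfig class, and the MacroFieldScanner helper are hypothetical names chosen for this example; the actual annotation is not shown in this section.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical annotation: marks a plugin config field as able to accept a macro.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface MacroEnabled {
}

// Hypothetical plugin config used only for this illustration.
class ExampleConfig {
  @MacroEnabled
  String tableName;

  String schema;
}

class MacroFieldScanner {
  // Maps each declared field name to whether it carries the annotation.
  static Map<String, Boolean> scan(Class<?> configClass) {
    Map<String, Boolean> result = new LinkedHashMap<>();
    for (Field field : configClass.getDeclaredFields()) {
      if (field.isSynthetic()) {
        continue; // skip compiler-generated fields
      }
      result.put(field.getName(), field.isAnnotationPresent(MacroEnabled.class));
    }
    return result;
  }
}
```

The resulting map could then be used to fill the macroEnabled value of each PluginPropertyField, under the assumption that the platform inspects plugin config classes in a similar way.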

...

At configure time, in the configurePipeline method, a macro-enabled property should not be validated, because the macro has not yet been given a value to substitute.

Good Place to put isResolved method

We don't want to force deferring macros to runtime in the case that a field is macroable but actually has no macro provided in its configuration. To allow macro substitution of non-String properties, any property configured with a macro will have a placeholder default value at configure time. For primitive types, this would be Java's default value. For objects, this would be null. This is acceptable because we are exposing a method, through the pipelineConfigurer object, to check whether a property is safe to validate at configure time.
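The placeholder rule can be sketched as a small lookup. This is only an illustration of the convention described above; the ConfigurePlaceholders class and the placeholderFor method are hypothetical names, not part of the platform.

```java
import java.util.HashMap;
import java.util.Map;

class ConfigurePlaceholders {
  // Java's default values for primitive types.
  private static final Map<Class<?>, Object> PRIMITIVE_DEFAULTS = new HashMap<>();

  static {
    PRIMITIVE_DEFAULTS.put(boolean.class, false);
    PRIMITIVE_DEFAULTS.put(byte.class, (byte) 0);
    PRIMITIVE_DEFAULTS.put(short.class, (short) 0);
    PRIMITIVE_DEFAULTS.put(char.class, '\0');
    PRIMITIVE_DEFAULTS.put(int.class, 0);
    PRIMITIVE_DEFAULTS.put(long.class, 0L);
    PRIMITIVE_DEFAULTS.put(float.class, 0.0f);
    PRIMITIVE_DEFAULTS.put(double.class, 0.0d);
  }

  // Returns the configure-time placeholder for a property of the given type
  // whose configured value is a macro: the Java default for primitives, null for objects.
  static Object placeholderFor(Class<?> type) {
    return type.isPrimitive() ? PRIMITIVE_DEFAULTS.get(type) : null;
  }
}
```

Because a placeholder such as 0 or null is indistinguishable from a real value, plugin code should consult the macro check before validating the property rather than inspecting the value itself.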

The method to check property safety is also helpful in allowing plugin developers to determine whether or not a dataset should be created at runtime (that is, whether or not it was already created at configure time).

PluginConfigurer Changes:

PluginConfigurer will expose the following method:

 

Code Block
public interface PluginConfigurer extends DatasetConfigurer {
  // ...
 
  boolean isMacro(String fieldName);
}

The method will return whether or not the property with the provided fieldName contained a macro at configure time. We don't want to force deferring macros to runtime in the case that a field is macroable but actually has no macro provided in its configuration. This allows optional checking of properties at configure time:

 

Code Block
@Override
public void configurePipeline(PipelineConfigurer pipelineConfigurer) {
  if (!pipelineConfigurer.isMacro("name")) {
    // perform some operation using the "name" property
  }
  // ...
}


Because a dataset could already be created at configure time when none of its fields contain macros, plugin developers need a way to check whether the dataset already exists at runtime. We can do this by altering the runtime context object passed to the prepareRun method. As the object extends BatchContext, which extends DatasetContext, we can add a new method to BatchContext that checks for the existence of a dataset.

 

Code Block
public interface BatchContext extends DatasetContext, TransformContext {
	// ...
 
	boolean datasetExists(String datasetName);
}

This method will return whether or not the dataset with the provided datasetName already exists and can be used in prepareRun:

Code Block
@Override
public void prepareRun(BatchSinkContext context) {
  if (!context.datasetExists(config.getName())) {
    pipelineConfigurer.createDataset(config.getName(), ...);
  }
  // ...
}
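The create-if-absent behavior can be sketched with an in-memory stand-in for the dataset registry. The InMemoryDatasets class and its ensureDataset method are hypothetical and exist only for this illustration; the real check would go through the runtime context as described above.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical in-memory stand-in for the platform's dataset registry.
class InMemoryDatasets {
  private final Set<String> names = new HashSet<>();

  boolean datasetExists(String datasetName) {
    return names.contains(datasetName);
  }

  void createDataset(String datasetName) {
    names.add(datasetName);
  }

  // Creates the dataset only if it is missing.
  // Returns true if the dataset was created by this call.
  boolean ensureDataset(String datasetName) {
    if (datasetExists(datasetName)) {
      return false;
    }
    createDataset(datasetName);
    return true;
  }
}
```

A dataset whose name contained no macro would already exist from configure time, so ensureDataset would be a no-op at runtime; a dataset whose name was a macro would be created on the first run.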

 

During runtime, PipelineInstantiator would get the config fields and the values to substitute, and can use that information to substitute macros appropriately and return an instantiated plugin.

 

Custom Action Setting Config Values:

One use case of the feature is to allow custom actions that run before a plugin to set macros. Custom actions can use workflow tokens to set values for field names.  

Code Block
"plugin": {
	"name": "Database",
	"type": "batchsource",
	"properties": {
		"user": "${wf-token(username)}",
		"password": "${secure(sql-password)}",
		"jdbcPluginName": "jdbc",         
		"importQuery": "select * from ${wf-token(table-name)};"
	}
}

If a pipeline builder wants to use a workflow token set by a preceding custom action as the value of a field, they use the macro-type token in the field, as above.

Context has access to the workflow token and we should be able to use workflow tokens similar to runtime arguments for substitution.
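A minimal sketch of that substitution, treating workflow tokens as a plain key-to-value map in the same way as runtime arguments. The regular expression is an assumption about the macro grammar (a single-level ${wf-token(key)} form), and the WorkflowTokenSubstitution class is a hypothetical name.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WorkflowTokenSubstitution {
  // Matches macros of the form ${wf-token(key)}; the exact grammar is an assumption.
  private static final Pattern WF_TOKEN = Pattern.compile("\\$\\{wf-token\\(([^)]+)\\)\\}");

  // Replaces every ${wf-token(key)} in value with the token stored under key.
  static String substitute(String value, Map<String, String> tokens) {
    Matcher matcher = WF_TOKEN.matcher(value);
    StringBuffer out = new StringBuffer();
    while (matcher.find()) {
      String key = matcher.group(1);
      String replacement = tokens.get(key);
      if (replacement == null) {
        throw new IllegalArgumentException("No workflow token set for key: " + key);
      }
      // quoteReplacement keeps '$' and '\' in the token value literal.
      matcher.appendReplacement(out, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(out);
    return out.toString();
  }
}
```

With a token map containing table-name set to employees, the importQuery from the example config would become "select * from employees;". Macros of other types, such as ${secure(sql-password)}, are left untouched by this sketch.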

Scoping:

Scoping is currently low priority and can be done manually. In our example config from a JDBC source to a table sink, there is a common macro "${table-name}". If the user wants to provide a different value for table-name in the Table Sink, they can do this manually:

Syntax | Macro | Evaluates To
${table-name} | table-name | employees
${TableSink:table-name} | TableSink:table-name | employee_sql

This is more of the user creating unique argument keys as opposed to scoping.

Documentation/Changes

Regardless of where the substitution occurs, the guidelines for creating Hydrator plugins would have to change. For existing plugins, any validation for properties that are macro-substitutable in configurePipeline must be moved to prepareRun (see reference section for specific plugins). We also must document the convention for nulling/defaulting macroable properties at configure time.

Implementation Details

 

Code Block
Title: MacroContext
interface MacroContext {
	/**
	 * Given the macro key, return the substituted value.
	 */
	String getValue(String macroKey);
}

 

 

Code Block
Title: Macro Types
// Based on the macro type, one of the MacroContext implementations below
// will be used to get the value for the macro.

class DefaultMacroContext implements MacroContext {
	Map<String, String> runtimeArguments;

	@Override
	public String getValue(String macroKey) {
		return runtimeArguments.get(macroKey);
	}
}

class SecureMacroContext implements MacroContext {
	SecureStore secureStore;

	@Override
	public String getValue(String macroKey) {
		return secureStore.get(macroKey);
	}
}

class RuntimeFunctionMacro implements MacroContext {
	long logicalStartTime;
	Function<String, String> timezoneFunction;

	@Override
	public String getValue(String arguments) {
		return timezoneFunction.apply(arguments);
	}
}
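How a macro type might be mapped to one of these contexts can be sketched as follows. This is an illustration under assumptions: the MacroDispatcher class, its register and evaluate methods, and the "type(argument)" parsing rule are hypothetical, and the sketch redeclares MacroContext so that it stands alone. The type names secure and wf-token are taken from the example config earlier in this page.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

interface MacroContext {
  String getValue(String macroKey);
}

class MacroDispatcher {
  // Matches "type(argument)"; anything else is treated as a plain runtime-argument key.
  private static final Pattern TYPED = Pattern.compile("([A-Za-z-]+)\\((.*)\\)");

  private final MacroContext defaultContext;
  private final Map<String, MacroContext> typedContexts = new HashMap<>();

  MacroDispatcher(MacroContext defaultContext) {
    this.defaultContext = defaultContext;
  }

  void register(String type, MacroContext context) {
    typedContexts.put(type, context);
  }

  // Evaluates the body of a single macro, i.e. the text between "${" and "}".
  String evaluate(String macroBody) {
    Matcher matcher = TYPED.matcher(macroBody);
    if (matcher.matches()) {
      MacroContext context = typedContexts.get(matcher.group(1));
      if (context == null) {
        throw new IllegalArgumentException("Unknown macro type: " + matcher.group(1));
      }
      return context.getValue(matcher.group(2));
    }
    return defaultContext.getValue(macroBody);
  }
}
```

A key such as TableSink:table-name contains no parentheses, so it falls through to the default context and is looked up as an ordinary runtime argument, which matches the manual scoping approach described earlier.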


Reference

Many plugins have properties that are used in constructing or validating a schema at configure time. These fields need to have macros disabled to allow this. The following plugins and fields would be affected:

 

Plugin | Fields | Use | Conflict
BatchCassandraSource | schema | Parsed for correctness to create the schema. | Parsing a macro or schema with a nested macro would fail.
CopybookSource | copybookContents | Copybook contents are converted to an InputStream and used to get external records, which are in turn used to add fields to the schema. | Schema would add macro literal as a field.
DedupAggregator | uniqueFields, filterOperation | Both fields are used to validate the input schema created. | Macro literals do not exist as fields in schema and will throw IllegalArgumentException.
DistinctAggregator | fields | Specifies the fields used to construct the output schema. | Will add macro literals as schema fields.*
GroupByAggregator | groupByFields, aggregates | Gets fields from input schema and adds aggregates to the output fields list. | Macro literals do not exist in the input schema and are not valid fields for an output schema.
RowDenormalizerAggregator | keyField, nameField, valueField | Gets schemas by field names from the input schema. | Macro literals do not exist as fields in the input schema.
KVTableSink | keyField, valueField | Validates the presence and type of these fields in the input schema. | Macro literals will not exist in the input schema.
SnapshotFileBatchAvroSink | schema | Parses schema to add file properties. | Macro literals may prevent schema parsing or cause incorrect schema creation.
SnapshotFileBatchParquetSink | schema | Parses schema to add file properties. | Macro literals may prevent schema parsing or cause incorrect schema creation.
TableSink | schema, rowField | Validates output and input schemas if properties specified. | Macro literals will lead to failed validation of schema and row field.
TimePartitionedFileSetDatasetAvroSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail.
TimePartitionedFileSetDatasetParquetSink | schema | Parses schema to add file properties. | Parsing macro literals in schema would fail.
SnapshotFileBatchAvroSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation.
SnapshotFileBatchParquetSource | schema | Parses schema property to set output schema. | Macro literals can lead to invalid schema parsing or creation.
StreamBatchSource | schema, name, format | Stream is added and created through name, and schema is parsed to set output schema. | Macro literals will lead to bad parsing of properties.
TableSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation.
TimePartitionedFileSetDatasetAvroSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation.
TimePartitionedFileSetDatasetParquetSource | schema | Schema parsed to set output schema. | Macro literals will lead to failed or incorrect schema creation.
JavaScriptTransform | schema, script, lookup | Schema format is used to set the output schema. JavaScript and lookup properties are also parsed for correctness. | Macro literals can cause parsing to fail for schema creation, JavaScript compilation, or lookup parsing.
LogParserTransform | inputName | Gets field from input schema through inputName property. | With a macro literal, the field will not exist in the input schema.
ProjectionTransform | fieldsToKeep, fieldsToDrop, fieldsToConvert, fieldsToRename | Properties are used to create output schema. | Macro literals will lead to a failed or wrong output schema being created.
PythonEvaluator | schema | Schema parsed for correctness and set as output schema. | Macro literal can lead to failed or bad schema creation.
ValidatorTransform | validators, validationScript | Validator property used to set validator plugins. Script property is also parsed for correctness. | Macro literals can lead to failed parsing or plugins being set. Scripts cannot be validated without validators.
ElasticsearchSource | schema | Schema parsed for correctness and set as output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
HBaseSink | rowField, schema | Parsed to validate the output and input schemas and set the output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
HBaseSource | schema | Parsed for correctness to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
HiveBatchSource | schema | Parsed for correctness to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
MongoDBBatchSource | schema | Parsed for correctness and validated to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
NaiveBayesClassifier | predictionField | Configures and sets fields of output schema and checked for existence in input schema. | Output schema would be created wrongly with macro literal as prediction field and input schema check behavior is undefined.
Compressor | compressor, schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
CSVFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
CSVParser | field | Validated against input schema to check existence of field. | Macro literals may not exist as fields in the input schema.
Decoder | decode, schema | Decode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema.
Decompressor | decompressor, schema | Decompressor property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema.
Encoder | encode, schema | Encode property is parsed and validated then used to validate the input schema. Schema parsed to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation or incorrect validation of input schema.
JSONFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.
JSONParser | field, schema | Validates if field property is present in input schema. Parses schema property to set output schema. | Macro literal may not exist in input schema and may lead to failed parsing or creation of output schema.
StreamFormatter | schema | Parsed for correctness and used to set output schema. | Macro literals can lead to failed or incorrect schema parsing/creation.

 

* May need verification
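For fields like those in the table above, one possible safeguard is to reject macros at configure time in any property that has macros disabled. The sketch below is an assumption about how such a check could look, not a description of existing behavior; the MacroDisabledCheck class is hypothetical, and detecting a macro by the "${" prefix is a simplification.

```java
import java.util.Map;
import java.util.Set;

class MacroDisabledCheck {
  // Assumption: a value is treated as containing a macro if it includes the "${" prefix.
  static boolean containsMacro(String value) {
    return value != null && value.contains("${");
  }

  // Throws if any macro-disabled field has a macro in its configured value.
  static void validate(Map<String, String> properties, Set<String> macroDisabledFields) {
    for (String field : macroDisabledFields) {
      if (containsMacro(properties.get(field))) {
        throw new IllegalArgumentException("Property '" + field + "' does not support macros");
      }
    }
  }
}
```

Failing early in this way would surface the conflict when the pipeline is configured instead of producing a schema that contains a macro literal.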

 


 

Other plugins have fields that are validated or processed at configure time but do not affect the schema. In these cases, the validation can be moved to the prepareRun method. The following plugins and fields would be affected:

 

Plugin | Fields | Use | Justification
StreamBatchSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's creation.
TimePartitionedFileSetSource | duration, delay | Parsed and validated for proper formatting. | The parsing/validation is not related to the schema's or dataset's creation.
ReferenceBatchSink | referenceName | Verifies reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case.
ReferenceBatchSource | referenceName | Verifies that reference name meets dataset ID constraints. | As dataset names can be macros, this supports the primary use case.
FileBatchSource | timeTable | Creates dataset from time table property. | This is a primary use case for macros.
TimePartitionedFileSetSource | name, basePath | Name and basePath are used to create the dataset. | This is a primary use case for macros.
BatchWritableSink | name, type | Creates dataset from properties. | This is a primary use case for macros.
SnapshotFileBatchSink | name | Creates dataset from name field. | This is a primary use case for macros.
BatchReadableSource | name, type | Dataset is created from name and type properties. | This is a primary use case for macros.
SnapshotFileBatchSource | all properties* | Creates dataset from properties. | This is a primary use case for macros.
TimePartitionedFileSetSink | all properties* | Creates dataset from properties. | This is a primary use case for macros.
DBSource | importQuery, boundingQuery, splitBy, numSplits | Validate connection settings and parsed for formatting. | The parsing/validation does not lead to the creation of any schema or dataset.
HDFSSink | timeSuffix | Parsed to validate proper formatting of time suffix. | The parsing/validation does not lead to the creation of any schema or dataset.
KafkaProducer | async | Parsed to check proper formatting of boolean. | The parsing/validation does not lead to the creation of any schema or dataset.
NaiveBayesClassifier | fieldToClassify | Checked if input schema field is of type String. | The validation does not lead to the creation or alteration of any schema.
NaiveBayesTrainer | fieldToClassify, predictionField | Checked if input schema fields are of type String and Double respectively. | The validation does not lead to the creation or alteration of any schema.
CloneRecord | copies | Validated against being 0 or over the max number of copies. | The validation does not lead to the creation of any schema or dataset.
CSVFormatter | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset.
CSVParser | format | Validated for proper formatting. | The validation does not lead to the creation of any schema or dataset.
Hasher | hash | Checked against valid hash formats. | The check does not lead to the validation or alteration of any schema.
JSONParser | mapping | Mappings extracted and placed into a map with their expressions. | The extraction does not affect any schema creation or validation.
StreamFormatter | format | Checked against valid stream formats. | The check does not lead to the validation or alteration of any schema.
ValueMapper | mapping, defaults | Parsed after configuration is initialized and validated. | The check does not lead to the validation or alteration of any schema.

 

* May need verification


...