...

  • The Hydrator app (SparkClientContext, MapReduceContext) will have access to the secure store manager so that it can substitute values from the key store.

 

Previous Details/Design Notes:

App Level Substitution:

One possibility is to implement substitution at the app level. This would be ideal if we want to keep the concept of macros Hydrator-specific. With app-level substitution, the user dictates which fields are macro-substitutable through the plugin configuration UI. To allow non-string properties to be substitutable, the user must provide a default value along with the macro through the UI. For example, if a user enters the "port" property as ${port}, the UI will provide a way for the user to enter a default port value. Creating a DB batch source would yield the following configuration JSON:

Code Block
Plugin Config
"plugin": {
	"name": "Database",
	"type": "batchsource",
	"properties": {
		"user": "${username}",
		"password": "${secure(sql-password)}",
		"jdbcPluginName": "jdbc",
		...
		"importQuery": "select * from ${table-name};"
		...
		"macroDefaults": "{
			\"user\": \"admin\",
			\"password\": \"pw1234\",
			\"importQuery\": \"select * from test;\"
		}"
	}
}


In this case, the app learns from the configuration which fields are macros and which default values to use for those fields at configure time.


This would require a new method in PluginContext to accept key and value pairs to substitute for plugin properties.

 

Code Block
@Beta
public interface PluginContext {
 
// existing methods
PluginProperties getPluginProperties(String pluginId);
<T> Class<T> loadPluginClass(String pluginId);
<T> T newPluginInstance(String pluginId) throws InstantiationException;
 
/**
 * Creates a new instance of a plugin. The instance returned will have the {@link PluginConfig} set up with
 * the {@link PluginProperties} provided at the time when
 * {@link PluginConfigurer#usePlugin(String, String, String, PluginProperties)} was called during
 * program configuration time. In addition, the parameter pluginProperties can be used to override the existing
 * plugin properties in the config, so that the plugin instance is created with the substituted plugin properties.
 *
 * @param pluginId the unique identifier provided when declaring plugin usage in the program.
 * @param <T> the class type of the plugin
 * @param pluginProperties the properties to override existing plugin properties before instance creation.
 * @return A new instance of the plugin being specified by the arguments
 *
 * @throws InstantiationException if failed to create a new instance
 * @throws IllegalArgumentException if pluginId is not found
 * @throws UnsupportedOperationException if the program does not support plugins
 */
<T> T newPluginInstance(String pluginId, Map<String, String> pluginProperties) throws InstantiationException;
}
 

 

Configure time:

The app can call this new method with the macroDefaults values, so that plugin instance creation uses the macro default values for those config fields.
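The configure-time step could look roughly like the sketch below. It assumes the "macroDefaults" JSON string has already been parsed into a map; the class MacroDefaults and the method overridesFor are hypothetical names used only for illustration and are not part of CDAP.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper (not a CDAP class): selects the configure-time overrides.
final class MacroDefaults {
  private MacroDefaults() { }

  /**
   * Returns the default value for every property whose configured value contains a macro ("${...}").
   * Properties without a macro, or without a default, are left out so they are not overridden.
   */
  static Map<String, String> overridesFor(Map<String, String> properties,
                                          Map<String, String> macroDefaults) {
    Map<String, String> overrides = new LinkedHashMap<>();
    for (Map.Entry<String, String> entry : properties.entrySet()) {
      String value = entry.getValue();
      String fallback = macroDefaults.get(entry.getKey());
      if (value != null && value.contains("${") && fallback != null) {
        overrides.put(entry.getKey(), fallback);
      }
    }
    return overrides;
  }
}
```

The resulting map would then be passed as the pluginProperties argument of the proposed newPluginInstance(pluginId, pluginProperties) method.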

Run time:

The app performs substitution for the properties with macros using the value from runtime arguments (or workflow token) and calls the method with the field names and substitution values. 
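A minimal sketch of the run-time substitution, assuming macros have the plain form ${name} and that runtime arguments (or workflow token values) are available as a map. MacroSubstitutor is a hypothetical name, and failing on a missing value is an assumption rather than a decided behavior. This sketch does not handle the ${secure(...)} form.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a CDAP class): replaces ${name} macros in a property value.
final class MacroSubstitutor {
  // Matches ${name}; group 1 is the macro name (any run of characters other than '}').
  private static final Pattern MACRO = Pattern.compile("\\$\\{([^}]+)\\}");

  private MacroSubstitutor() { }

  static String substitute(String value, Map<String, String> arguments) {
    Matcher matcher = MACRO.matcher(value);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String name = matcher.group(1);
      String replacement = arguments.get(name);
      if (replacement == null) {
        // Assumption: a macro without a runtime value is treated as an error.
        throw new IllegalArgumentException("No value provided for macro: " + name);
      }
      // quoteReplacement keeps '$' and '\' in the value from being read as group references.
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }
}
```

The app would apply this to each macro-enabled property and pass the substituted values to newPluginInstance(pluginId, pluginProperties).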

 

Scoping:

 

If macro substitution is performed at the DataPipeline app level, macros can be scoped at the stage-name level if the user wants that.

 

Our example config of a JDBC source to a Table sink has a common macro, "${table-name}". If the user wants to provide a different value for table-name in the Table sink, they can use scoping.

 

Code Block
Example for Scoping:

Provided runtime arguments:
 
Key : table-name, value : employees 
Key : TableSink:table-name, value : employee_sql 
 
table-name is the macro name that is used in both the DBSource stage and the TableSink stage.
If the user wants to provide a special value for the macro "table-name" to be used in TableSink, they prefix the macro name with the stage name, separated by the delimiter (colon).
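The lookup order implied by this example can be sketched as follows: try the stage-scoped key first, then fall back to the unscoped key. ScopedArguments and resolve are hypothetical names, not existing CDAP API.

```java
import java.util.Map;

// Hypothetical helper (not a CDAP class): resolves a macro value for a given stage.
final class ScopedArguments {
  private static final String DELIMITER = ":";

  private ScopedArguments() { }

  /**
   * Returns the value stored under "stageName:macroName" if present,
   * otherwise the value stored under "macroName" (which may be null).
   */
  static String resolve(String stageName, String macroName, Map<String, String> runtimeArguments) {
    String scoped = runtimeArguments.get(stageName + DELIMITER + macroName);
    return scoped != null ? scoped : runtimeArguments.get(macroName);
  }
}
```

With the runtime arguments from the example, resolving "table-name" for TableSink yields employee_sql, while any other stage, such as DBSource, gets employees.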



Thoughts from Terence:

 

Below are the thoughts I have so far.

 

1. Preferences/runtime arguments substitution for configuration values
  - Can start with simple $var substitution
  - The DataPipeline app performs the substitution
  - The preferences can be scoped
    - Properties prefixed with the plugin name (stage name?) will have the prefix stripped
    - A property in a more specific scope will override the less specific one
      - e.g. With both "password" => "a" and "plugin1.password" => "b" in the preferences, Plugin "plugin1" will see "password" => "b"
  - For managing passphrases, so that the plugin config contains only the key name, not the actual key
  - Plugins that need sensitive information need to be adjusted to use the key management
  - Potentially can have the DataPipeline app do the substitution as well
    - But we cannot use "$", since it's used above. Maybe can be "#".
      - E.g. for plugin config {"password" => "#dbpassword"}, then at runtime the actual password with name "dbpassword" will be fetched from the secure store.
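The prefix-stripping rule for scoped preferences can be sketched as below. ScopedPreferences and forPlugin are hypothetical names. In this sketch, keys scoped to other plugins are passed through unchanged, which is an assumption and not something the notes above specify.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper (not a CDAP class): computes the preferences visible to one plugin.
final class ScopedPreferences {
  private ScopedPreferences() { }

  static Map<String, String> forPlugin(String pluginName, Map<String, String> preferences) {
    String prefix = pluginName + ".";
    Map<String, String> resolved = new HashMap<>();
    // First pass: copy every key that is not scoped to this plugin.
    for (Map.Entry<String, String> entry : preferences.entrySet()) {
      if (!entry.getKey().startsWith(prefix)) {
        resolved.put(entry.getKey(), entry.getValue());
      }
    }
    // Second pass: strip the prefix from this plugin's keys; they override the less specific ones.
    for (Map.Entry<String, String> entry : preferences.entrySet()) {
      if (entry.getKey().startsWith(prefix)) {
        resolved.put(entry.getKey().substring(prefix.length()), entry.getValue());
      }
    }
    return resolved;
  }
}
```

For the preferences "password" => "a" and "plugin1.password" => "b", forPlugin("plugin1", ...) maps "password" to "b", while any other plugin still sees "a".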


---------------------- 

Setting Hydrator runtime arguments using CDAP runtime arguments/preferences

 

CDAP preferences and runtime arguments will be used directly as Hydrator arguments. 

 

1.) Runtime arguments can be passed to a Hydrator pipeline in two ways:

 

  1. Using Prepipeline-CustomActions:
    Prepipeline custom actions can set runtime arguments. For example, before running the pipeline, a custom action can copy local files to HDFS and set the runtime argument for the input path of the batchsource. To support that, we can expose setPreferences() and getPreferences() as a programmatic API for setting runtime arguments. These arguments can be passed to the Hydrator app using the workflow token.
  2. Using Hydrator UI:
    For each stage, runtime arguments can be passed from the Hydrator UI using the CDAP REST endpoints of the preferences/runtime arguments framework.

 

2.) The Hydrator app will substitute properties for each ETLStage using macro substitution. Plugins such as SFTP that need secure substitution through key management can use the 'secure' prefix in the macro. Macro substitution varies with the prefix of the argument: for a secure key, the macro is '${secure(key)}'; for a value that is substituted directly, the macro is '${inputpath}', with no prefix.
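A minimal sketch of prefix-dependent substitution under these rules. The secure store is represented by a plain Function from key name to secret, which is a stand-in and not the actual secure store API; PrefixedMacroSubstitutor is likewise a hypothetical name.

```java
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not a CDAP class): dispatches on the macro prefix.
final class PrefixedMacroSubstitutor {
  // Group 1 is set for ${secure(key)}; group 2 is set for a plain ${name}.
  private static final Pattern MACRO =
      Pattern.compile("\\$\\{(?:secure\\(([^)}]+)\\)|([^}]+))\\}");

  private PrefixedMacroSubstitutor() { }

  static String substitute(String value,
                           Map<String, String> arguments,
                           Function<String, String> secureStore) {
    Matcher matcher = MACRO.matcher(value);
    StringBuffer result = new StringBuffer();
    while (matcher.find()) {
      String secureKey = matcher.group(1);
      // Secure macros are looked up in the secure store stand-in, plain macros in the arguments.
      String replacement = secureKey != null
          ? secureStore.apply(secureKey)
          : arguments.get(matcher.group(2));
      if (replacement == null) {
        // Assumption: an unresolved macro is treated as an error.
        throw new IllegalArgumentException("Cannot resolve macro: " + matcher.group());
      }
      matcher.appendReplacement(result, Matcher.quoteReplacement(replacement));
    }
    matcher.appendTail(result);
    return result.toString();
  }
}
```

Keeping the secure lookup behind a separate function means the plugin config only ever holds the key name, while the secret is fetched at run time.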

 

----------------------------



Reference:

Changes to Existing Plugins

...