...
- ETL Plugins need to have the ability to create streams/datasets.
- Pros: Does not depend on programs being able to add streams/datasets.
- Cons:
- Not the intuitive place to include it. For example, in the ETLRealtime and ETLBatch applications, these features are used in ETLWorker/ETLMapReduce, so it would be more intuitive to do this in programs instead of in Applications.
- Plugin names will have to be passed to the programs through Program properties, so that the programs can instantiate and use the plugins.
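The last con above can be pictured with a minimal sketch of the round trip through Program properties. Everything here is hypothetical and not part of CDAP: the class name, the `plugin.ids` key, and the comma-separated encoding are assumptions made for illustration, and the sketch assumes plugin ids contain no commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch (not CDAP code): passing plugin ids through program properties. */
public class PluginPropertySketch {
  // Hypothetical property key; any agreed-upon key would do.
  public static final String PLUGIN_IDS_KEY = "plugin.ids";

  /** Application side: encode the registered plugin ids into string properties. */
  public static Map<String, String> toProgramProperties(List<String> pluginIds) {
    Map<String, String> properties = new HashMap<>();
    properties.put(PLUGIN_IDS_KEY, String.join(",", pluginIds));
    return properties;
  }

  /** Program side: decode the ids so the program can instantiate the plugins. */
  public static List<String> pluginIdsFrom(Map<String, String> properties) {
    String joined = properties.get(PLUGIN_IDS_KEY);
    if (joined == null || joined.isEmpty()) {
      return new ArrayList<>();
    }
    return new ArrayList<>(Arrays.asList(joined.split(",")));
  }
}
```

The extra encode/decode step is the overhead this con refers to: the application and the program must agree on a key and a format that neither of them otherwise needs.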
ii) Introduce the ability to create streams/datasets/register plugins in CDAP Programs:
- Through the Program Configurers, users can create streams/datasets, register plugins, etc. in CDAP Programs (think of them more like local variables: create one when you need it).
- Pros: Simplifies some application code, since streams/datasets are created and used only where they are needed. Simplifies the ETLRealtime and ETLBatch applications.
- Cons:
- Streams/datasets created this way are NOT local variables, since they are accessible to all applications/programs in that namespace.
- Needs logic to handle the same stream/dataset being created in different places (possibly with different properties). Should we disallow this, or allow it as long as the properties are the same?
- Some options are ambiguous in programs like Flows, Services and Workflows. Should we allow adding streams/datasets in them, or only in Flowlets/ServiceHandlers? (For example, stream connections are made only in Flows and not in Flowlets, even though Flows are just a collection of Flowlets.)
- WorkflowAction cannot support these changes since it uses the builder pattern (there is a JIRA filed for this).
- In some programs, certain features might not be useful. For example, creating a stream in a Service is useless, since there is no way to use it programmatically in a Service/ServiceHandler.
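One of the open questions above is whether the same dataset may be declared in several places as long as the properties match. A minimal sketch of that "allow if identical" policy follows. It is an illustration only: `DatasetDeclarations` and its methods are hypothetical names, not CDAP classes, and properties are modeled as a plain string map.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch (not CDAP code): collects dataset declarations made by several programs. */
public class DatasetDeclarations {
  // Dataset name -> properties from the first declaration seen for that name.
  private final Map<String, Map<String, String>> declared = new HashMap<>();

  /**
   * Records a declaration. A repeated declaration is accepted only when its
   * properties equal the ones already recorded under the same name.
   */
  public void declare(String name, Map<String, String> properties) {
    Map<String, String> copy = new HashMap<>(properties);
    Map<String, String> existing = declared.putIfAbsent(name, copy);
    if (existing != null && !existing.equals(copy)) {
      throw new IllegalArgumentException(
          "Dataset '" + name + "' is already declared with different properties");
    }
  }

  /** Number of distinct dataset names declared so far. */
  public int size() {
    return declared.size();
  }
}
```

Under this policy two programs can both declare a dataset with identical properties, while a conflicting second declaration fails at configure time. Disallowing duplicates entirely would amount to throwing whenever `existing` is non-null.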
Assuming we go with option ii), we propose the following API changes:
We will look at how the configure method of two program types will change:
- Worker:
public class SimpleWorker extends AbstractWorker {
  @Override
  public void configure() {
    createDataset(datasetName, KeyValueTable.class);
    addStream(new Stream("hello"));
    addDatasetModule("abcModule", ABCModule.class);
  }
}