...

  • ETL Plugins need to have the ability to create streams/datasets. 
  • Pros: Does not depend on programs having the ability to add streams/datasets.
  • Cons: 
    • Not the intuitive place for it. For example, the ETLRealtime and ETLBatch applications use these features in ETLWorker/ETLMapReduce, so it would be more intuitive to do this in programs instead of applications.
    • Plugin names will have to be passed to the programs through program properties so that the programs can instantiate and use the plugins.
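
To make the last drawback concrete, here is a minimal sketch of what passing plugin names through program properties could look like. It uses plain Java maps only; the class name, the property key "etl.plugin.names", and both helper methods are hypothetical and are not part of any CDAP API.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are CDAP APIs.
public class PluginPropertySketch {
  // Hypothetical property key under which the application records plugin names.
  public static final String PLUGIN_NAMES_KEY = "etl.plugin.names";

  // Application side: encode the registered plugin names as a single property.
  public static Map<String, String> toProgramProperties(List<String> pluginNames) {
    Map<String, String> properties = new HashMap<>();
    properties.put(PLUGIN_NAMES_KEY, String.join(",", pluginNames));
    return properties;
  }

  // Program side: decode the plugin names so the program can instantiate the plugins.
  public static List<String> fromProgramProperties(Map<String, String> properties) {
    String joined = properties.get(PLUGIN_NAMES_KEY);
    if (joined == null || joined.isEmpty()) {
      return Collections.emptyList();
    }
    return Arrays.asList(joined.split(","));
  }

  public static void main(String[] args) {
    Map<String, String> props = toProgramProperties(Arrays.asList("source", "sink"));
    System.out.println(fromProgramProperties(props));
  }
}
```

The sketch assumes plugin names never contain a comma. It also shows the cost of this option: the application and the program must agree on an encoding that exists only to carry the names across.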

ii)  Introduce the ability to create streams/datasets/register plugins in CDAP Programs:

  • Through the Program Configurers, users can create streams/datasets, register plugins, etc. in CDAP Programs (think of them more like local variables: create them when you need them).
  • Pros: Simplifies some application code, since streams/datasets are created and used only where they are needed. Simplifies the ETLRealtime and ETLBatch applications.
  • Cons: 
    • Streams/datasets created this way are NOT truly local variables, since they are accessible to all applications/programs in that namespace.
    • Needs logic to handle the same stream/dataset being created in different places (possibly with different properties) -> should we disallow it, or allow it as long as the properties are the same?
    • Some ambiguous options in programs like Flows, Services, Workflows -> should we allow addition of streams/datasets in them or only in Flowlets/ServiceHandlers? (For example, stream connections are made only in Flows and not in Flowlets, even though Flows are just a collection of Flowlets)
    • WorkflowAction cannot support these changes since it uses the builder pattern (there is a JIRA filed for this).
    • In some programs, certain features might not be useful. For example, creating a stream in a Service is useless since there is no way to use it programmatically in a Service/ServiceHandler
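
The "allow it as long as the properties are the same" policy from the list above could be sketched roughly as follows. This is an illustration under stated assumptions, not CDAP code: DatasetSpecRegistry and its register method are hypothetical names, and a dataset is reduced to a name plus a property map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of one possible conflict policy; not a CDAP API.
public class DatasetSpecRegistry {
  // Properties recorded for each dataset name seen so far.
  private final Map<String, Map<String, String>> specs = new HashMap<>();

  /**
   * Registers a dataset creation request.
   * Returns true if the name is new, false if an identical request already exists,
   * and throws if the name exists with different properties.
   */
  public boolean register(String name, Map<String, String> properties) {
    Map<String, String> existing = specs.get(name);
    if (existing == null) {
      specs.put(name, new HashMap<>(properties));
      return true;
    }
    if (existing.equals(properties)) {
      return false; // Same name, same properties: treat as a no-op.
    }
    throw new IllegalStateException(
        "Dataset '" + name + "' is already declared with different properties");
  }

  public static void main(String[] args) {
    DatasetSpecRegistry registry = new DatasetSpecRegistry();
    Map<String, String> props = new HashMap<>();
    props.put("ttl", "3600");
    System.out.println(registry.register("events", props)); // first declaration
    System.out.println(registry.register("events", props)); // identical redeclaration
  }
}
```

Under this policy two programs may declare the same dataset independently, and only a mismatch in properties is an error. The stricter alternative, disallowing any redeclaration, would throw in the second branch as well.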

 Assuming we go with option ii), we propose the following API changes:

We will look at how the configure method of two program types will change:

 

  1. Worker:

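    public class SimpleWorker extends AbstractWorker {
      @Override
      public void configure() {
        // Create a dataset from within the program (datasetName is assumed to be defined elsewhere).
        createDataset(datasetName, KeyValueTable.class);
        // Add a stream named "hello".
        addStream(new Stream("hello"));
        // Register a dataset module under the name "abcModule".
        addDatasetModule("abcModule", ABCModule.class);
      }
    }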