
Goal: Allow CDAP Programs to create streams and datasets and to register the plugins they use.

Status Quo: Currently, streams and datasets can be created only at the CDAP Application level, and plugins can be registered only in Adapters (through AdapterConfigurer). If we want to remove the concepts of Application Templates and Adapters, we have a couple of options:

i) Introduce the ability to register plugins in the Application's configure method.

  • ETL Plugins need to have the ability to create streams/datasets. 
  • Pros: Does not depend on programs being able to add streams/datasets.
  • Cons: 
    • Not an intuitive place for it. For example, the ETLRealtime and ETLBatch applications use these features in ETLWorker/ETLMapReduce, so it would be more intuitive to do this in programs rather than in Applications.
    • Program properties would have to be used to pass the plugin names to the programs, so that the programs can instantiate and use the plugins.
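The second drawback of option i) can be made concrete with a small sketch. It assumes the Application registers the plugins and then hands their ids to the program as plain string properties, which the program must look up again at runtime. The class `PluginProperties`, its property keys, and its method names are hypothetical and are not part of the CDAP API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not a CDAP class: shows the indirection that option i) requires.
final class PluginProperties {
  // Assumed property keys; any agreed-upon strings would do.
  static final String SOURCE_PLUGIN_ID = "source.plugin.id";
  static final String SINK_PLUGIN_ID = "sink.plugin.id";

  private PluginProperties() { }

  // Application side: after registering the plugins, pack their ids into
  // the properties that will be handed to the program.
  static Map<String, String> forProgram(String sourceId, String sinkId) {
    Map<String, String> props = new HashMap<>();
    props.put(SOURCE_PLUGIN_ID, sourceId);
    props.put(SINK_PLUGIN_ID, sinkId);
    return Collections.unmodifiableMap(props);
  }

  // Program side: recover a plugin id before instantiating the plugin.
  // Fails fast if the Application forgot to pass the id.
  static String pluginId(Map<String, String> programProperties, String key) {
    String id = programProperties.get(key);
    if (id == null) {
      throw new IllegalStateException("Missing program property: " + key);
    }
    return id;
  }
}
```

The Application and the program have to agree on the property keys, which is the extra coupling this option introduces.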

ii) Introduce the ability to create streams and datasets and to register plugins in CDAP Programs:

  • Through the Program Configurers, users can create streams and datasets, register plugins, etc. in CDAP Programs (think of them as local variables: create them when you need them).
  • Pros: Simplifies some application code, since resources are created and used only where they are needed. Simplifies the ETLRealtime and ETLBatch applications. 
  • Cons: 
    • Streams and datasets created this way are NOT really local variables, since they are accessible to all applications/programs in that namespace.
    • Logic is needed to handle creation of the same stream/dataset (possibly with different properties) in different places. Should we disallow it, or allow it as long as the properties are the same?
    • Some options are ambiguous in programs like Flows, Services, and Workflows. Should we allow streams/datasets to be added in them, or only in Flowlets/ServiceHandlers? (For example, stream connections are made only in Flows and not in Flowlets, even though Flows are just a collection of Flowlets.)
    • WorkflowAction cannot support these changes since it uses the builder pattern (there is a JIRA filed for this).
    • In some programs, certain features might not be useful. For example, creating a stream in a Service is useless, since there is no way to use it programmatically in a Service/ServiceHandler.
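One possible answer to the duplicate-creation question above is to accept a repeated declaration only when it is identical to the first one. The following is a minimal sketch of that policy, not a decision made by this proposal. `DatasetDeclaration` and `DeclarationRegistry` are hypothetical names and do not exist in CDAP.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical value type: the dataset type name plus its properties.
final class DatasetDeclaration {
  final String typeName;
  final Map<String, String> properties;

  DatasetDeclaration(String typeName, Map<String, String> properties) {
    this.typeName = typeName;
    this.properties = new HashMap<>(properties); // defensive copy
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof DatasetDeclaration)) {
      return false;
    }
    DatasetDeclaration other = (DatasetDeclaration) o;
    return typeName.equals(other.typeName) && properties.equals(other.properties);
  }

  @Override
  public int hashCode() {
    return Objects.hash(typeName, properties);
  }
}

// Hypothetical registry collecting declarations from all programs of an application.
final class DeclarationRegistry {
  private final Map<String, DatasetDeclaration> declared = new HashMap<>();

  // Returns true if the name is new, false if an identical declaration already exists.
  // Throws if the same name was declared earlier with a different type or properties.
  boolean declare(String name, String typeName, Map<String, String> properties) {
    DatasetDeclaration incoming = new DatasetDeclaration(typeName, properties);
    DatasetDeclaration existing = declared.putIfAbsent(name, incoming);
    if (existing == null) {
      return true;
    }
    if (existing.equals(incoming)) {
      return false;
    }
    throw new IllegalArgumentException(
        "Dataset '" + name + "' is already declared with a different type or properties");
  }
}
```

The stricter alternative, disallowing any repeated declaration, would amount to throwing in the identical case as well.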

Assuming we go with option ii), we propose the following API changes:

We will look at how the configure method of two program types will change:

  1. Worker:

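    public class SimpleWorker extends AbstractWorker {
      @Override
      public void configure() {
        // Proposed: create a dataset from the program's configure method.
        // datasetName is assumed to be a String defined elsewhere.
        createDataset(datasetName, KeyValueTable.class);
        // Proposed: add a stream from the program.
        addStream(new Stream("hello"));
        // Proposed: register a custom dataset module (ABCModule is a placeholder).
        addDatasetModule("abcModule", ABCModule.class);
      }
    }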
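The calls in `configure()` above could be backed by a program-level configurer that records what the program declares, so that the platform can create the resources at deployment. The sketch below illustrates that delegation under stated assumptions. `ProgramConfigurer`, `RecordingConfigurer`, `SketchWorker`, and `SimpleSketchWorker` are simplified, hypothetical stand-ins that use plain strings, not the actual CDAP interfaces.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified configurer interface (strings instead of CDAP types).
interface ProgramConfigurer {
  void addStream(String name);
  void createDataset(String name, String typeName);
}

// Records declarations so they can be inspected or merged into the application later.
final class RecordingConfigurer implements ProgramConfigurer {
  private final List<String> streams = new ArrayList<>();
  private final List<String> datasets = new ArrayList<>();

  @Override
  public void addStream(String name) {
    streams.add(name);
  }

  @Override
  public void createDataset(String name, String typeName) {
    datasets.add(name + ":" + typeName);
  }

  List<String> streams() {
    return Collections.unmodifiableList(streams);
  }

  List<String> datasets() {
    return Collections.unmodifiableList(datasets);
  }
}

// Hypothetical abstract program: forwards the helper calls to the configurer.
abstract class SketchWorker {
  private ProgramConfigurer configurer;

  // Called by the platform; stores the configurer, then runs the user's configure().
  final void configure(ProgramConfigurer configurer) {
    this.configurer = configurer;
    configure();
  }

  protected abstract void configure();

  protected final void addStream(String name) {
    configurer.addStream(name);
  }

  protected final void createDataset(String name, String typeName) {
    configurer.createDataset(name, typeName);
  }
}

// User code: mirrors the shape of the SimpleWorker example.
final class SimpleSketchWorker extends SketchWorker {
  @Override
  protected void configure() {
    createDataset("kv", "KeyValueTable");
    addStream("hello");
  }
}
```

After `new SimpleSketchWorker().configure(recorder)` runs, the recorder holds the program's declarations, which a deployer could then check for conflicts and create.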