  1. Public interface to emit the preview data:

    public interface PreviewEmitter<V>  {
    	
    	/**
     	 * Emit the Map of properties corresponding to the given key.
     	 * @param key the key under which properties are stored
    	 * @param propertyValues the map of property values to be stored under the given key
     	 */
    	 void emit(String key, Map<String, List<V>> propertyValues);
     
    	/**
     	 * Emit the property specified by name and value for the given key.
     	 * @param key the key under which properties are stored
         * @param propertyName the name of the property
    	 * @param propertyValue the value associated with the property
     	 */
    	 void emit(String key, String propertyName, V propertyValue);
    }
  2. How will the application get access to the PreviewEmitter? Similar to Metrics and @UseDataSet, an instance of the PreviewEmitter will be injected by CDAP into the application.

    public class MyMapReduce extends AbstractMapReduce {
    	private PreviewEmitter<String> emitter;
     
        @Override
        public void initialize() throws Exception {
    		// Both start times are long values, so convert them with String.valueOf.
    		emitter.emit("MyMapReduce.initialize", "logical.start.time", String.valueOf(getContext().getLogicalStartTime()));
    		emitter.emit("MyMapReduce.initialize", "actual.start.time", String.valueOf(System.currentTimeMillis()));
        }
     
        public class MyMapper extends Mapper<byte[], Text, Text, Text> {
    		@Override
    		public void map(byte[] key, Text value, Context context) throws IOException {
    			if (value.toString().startsWith("Product")) {
    				emitter.emit("MapReduce.map", "map.product", value.toString());	
    			}
    		}
    	}
    } 
  3. How will CDAP get data from the preview?

    /**
     * Represents the state of the preview.
     */
    public class PreviewState {
      public enum Status {
    	RUNNING,
    	COMPLETED,
    	DEPLOY_FAILED,
    	RUNTIME_FAILED	
      };
     
      Status previewStatus;
      @Nullable	
      String failureMessage;			
    }
     
    // This is an internal interface that will be used by REST handlers
    // to retrieve the preview information.
    public interface PreviewInfo {
     
        /**
    	 * Get the state of the preview represented by previewId.
         */
    	PreviewState getStatus(PreviewId previewId);
     
    	/**
    	 * Get the data associated with the preview represented by previewId.
    	 */
        <V> Map<String, Map<String, List<V>>> getData(PreviewId previewId);
     
    	/**
    	 * Get all metrics associated with the preview represented by previewId.
    	 */
    	Collection<MetricTimeSeries> getMetrics(PreviewId previewId);
      
     	/**
    	 * Get all logs associated with the preview represented by previewId.
    	 */
    	String getLogs(PreviewId previewId);
    }
     
    class PreviewId {
    	NamespaceId namespace;
        String preview;
    }
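
To make the relationship between `PreviewEmitter.emit` and the map returned by `PreviewInfo.getData` concrete, here is a minimal in-memory sketch. `InMemoryPreviewEmitter` and its `getData()` method are hypothetical names used only for illustration, and the sketch assumes that repeated emits for the same key and property append to the list rather than overwrite it; the design above does not state this.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Repeated from the interface above so that the sketch compiles on its own.
interface PreviewEmitter<V> {
  void emit(String key, Map<String, List<V>> propertyValues);
  void emit(String key, String propertyName, V propertyValue);
}

// Hypothetical in-memory implementation; not part of the design above.
class InMemoryPreviewEmitter<V> implements PreviewEmitter<V> {
  // key -> property name -> values emitted so far, in emission order
  private final Map<String, Map<String, List<V>>> data = new LinkedHashMap<>();

  @Override
  public synchronized void emit(String key, Map<String, List<V>> propertyValues) {
    for (Map.Entry<String, List<V>> entry : propertyValues.entrySet()) {
      valuesFor(key, entry.getKey()).addAll(entry.getValue());
    }
  }

  @Override
  public synchronized void emit(String key, String propertyName, V propertyValue) {
    // Assumption: a repeated emit appends to the existing list.
    valuesFor(key, propertyName).add(propertyValue);
  }

  // Returns a copy with the same shape as the result of PreviewInfo.getData.
  public synchronized Map<String, Map<String, List<V>>> getData() {
    Map<String, Map<String, List<V>>> copy = new LinkedHashMap<>();
    for (Map.Entry<String, Map<String, List<V>>> byKey : data.entrySet()) {
      Map<String, List<V>> properties = new LinkedHashMap<>();
      for (Map.Entry<String, List<V>> byName : byKey.getValue().entrySet()) {
        properties.put(byName.getKey(), new ArrayList<>(byName.getValue()));
      }
      copy.put(byKey.getKey(), properties);
    }
    return copy;
  }

  // Looks up the value list for (key, propertyName), creating it if missing.
  private List<V> valuesFor(String key, String propertyName) {
    return data.computeIfAbsent(key, k -> new LinkedHashMap<>())
               .computeIfAbsent(propertyName, n -> new ArrayList<>());
  }
}
```

Under these assumptions, two calls to `emit("MapReduce.map", "map.product", ...)` leave both values in a single list under `data["MapReduce.map"]["map.product"]`. A real implementation would also need to scope the data by `PreviewId`.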

SDK:

Preview Execution Isolation:

Requirements:

  1. We want the program runs executed during preview, the datasets created for preview purposes, and the logs and metrics emitted during preview to be isolated from the regular Standalone execution, which is used to publish and run the pipeline.
  2. In Preview, a pipeline could have a transform that looks up datasets residing in Standalone, so we need a way for Preview to share datasets with Standalone.
  3. In Preview, we want to skip writing metadata and lineage information, as they are unnecessary.
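
One way requirement 3 could be met is to wire a no-op writer into the preview runtime in place of the real one. The sketch below only illustrates that pattern: `LineageRecorder`, `StoringLineageRecorder`, `NoOpLineageRecorder`, and `LineageRecorders.forMode` are hypothetical names, not the actual CDAP metadata or lineage interfaces, which this page does not show.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface standing in for whatever records lineage in CDAP.
interface LineageRecorder {
  void record(String programRun, String dataset);
}

// Standalone behaviour: keep every dataset access (a list stands in for the store).
class StoringLineageRecorder implements LineageRecorder {
  final List<String> records = new ArrayList<>();

  @Override
  public void record(String programRun, String dataset) {
    records.add(programRun + " -> " + dataset);
  }
}

// Preview behaviour: accept the call and drop it.
class NoOpLineageRecorder implements LineageRecorder {
  @Override
  public void record(String programRun, String dataset) {
    // Intentionally empty: preview runs do not write lineage.
  }
}

// Hypothetical factory; in practice this choice would live in the injector bindings.
class LineageRecorders {
  static LineageRecorder forMode(boolean preview) {
    return preview ? new NoOpLineageRecorder() : new StoringLineageRecorder();
  }
}
```

Callers keep invoking `record` unconditionally, so program code does not need to know whether it is running in Preview or Standalone.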

Preview Injector vs Standalone Injector:

| Service                                            | Standalone (Yes/No) | Preview (Yes/No)                 |
|----------------------------------------------------|---------------------|----------------------------------|
| userInterfaceService                               | Yes                 | No                               |
| trackerAppCreationService                          | Yes                 | No                               |
| router                                             | Yes                 | No                               |
| streamService                                      | Yes                 | Yes                              |
| exploreExecutorService                             | Yes                 | No                               |
| exploreClient                                      | Yes                 | No                               |
| metadataService                                    | Yes                 | No                               |
| serviceStore (set/get service instances)           | Yes                 | No                               |
| appFabricServer                                    | Yes                 | No                               |
| previewServer                                      | No                  | Yes                              |
| datasetService                                     | Yes                 | Yes                              |
| metricsQueryService                                | Yes                 | No (can call MetricStore query)  |
| txService                                          | Yes                 | Yes                              |
| externalAuthenticationServer (if security enabled) | Yes                 | Yes                              |
| logAppenderInitializer                             | Yes                 | Yes                              |
| kafkaClient (if audit enabled)                     | Yes                 | No                               |
| zkClient (if audit enabled)                        | Yes                 | No                               |
| authorizerInstantiator (started by default)        | Yes                 | Yes?                             |

 

AppFabricServer vs PreviewServer :

The services listed here are a subset of those started in the app-fabric server.

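| Service                     | AppFabricServer | PreviewServer                        |
|-----------------------------|-----------------|--------------------------------------|
| notificationService         | Yes             | No                                   |
| schedulerService            | Yes             | No                                   |
| applicationLifecycleService | Yes             | Yes                                  |
| systemArtifactLoader        | Yes             | Yes                                  |
| programRuntimeService       | Yes             | Yes                                  |
| streamCoordinatorClient     | Yes             | Yes                                  |
| programLifecycleService     | Yes             | Yes                                  |
| pluginService               | Yes             | Yes                                  |
| httpService                 | Yes             | Yes (but only with preview handler)  |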

 

 

PreviewDatasetFramework

Requirements:

1) A pipeline wants to read from a dataset source, write to a dataset sink, or use a lookup table in a transform. These datasets are in the CDAP Standalone space.

2) Pipeline run records, pipeline run metrics, program status, etc. are stored in system datasets in the Preview space.

3) Error dataset: It's not clear whether using an error dataset should create an error dataset in the CDAP Standalone space. I feel it might not need to be created in the Standalone space. In that case, if it is a dataset, it is the only user-level dataset that has to be created in the Preview space. Alternatively, we could use an in-memory implementation for maintaining error records.

 

Assumptions :

1) All datasets in the System namespace will use the "LocalDatasetFramework".

2) All datasets in users' namespaces will use the "RemoteDatasetFramework".

 

PreviewDatasetFramework
... snippet
    @Nullable
    @Override
    public <T extends Dataset> T getDataset(Id.DatasetInstance datasetInstanceId,
                                            @Nullable Map<String, String> arguments,
                                            @Nullable ClassLoader classLoader)
      throws DatasetManagementException, IOException {
      if (datasetInstanceId.getNamespace().equals(Id.Namespace.SYSTEM)) {
        return localDatasetFramework.getDataset(datasetInstanceId, arguments, classLoader);
      } else {
        return remoteDatasetFramework.getDataset(datasetInstanceId, arguments, classLoader);
      }
    }
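
The routing in the snippet above can be exercised in isolation with a reduced stand-in for the dataset framework. Everything in this sketch is hypothetical: `SimpleDatasetFramework` replaces the much larger CDAP `DatasetFramework` interface, `MapBackedFramework` replaces both the local and remote implementations, and the namespace name `"system"` is assumed to correspond to `Id.Namespace.SYSTEM`.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, heavily reduced stand-in for the CDAP DatasetFramework interface.
interface SimpleDatasetFramework {
  Object getDataset(String namespace, String name);
}

// Hypothetical backing store used for both the "local" and the "remote" side.
class MapBackedFramework implements SimpleDatasetFramework {
  private final Map<String, Object> datasets = new HashMap<>();

  void put(String namespace, String name, Object dataset) {
    datasets.put(namespace + ":" + name, dataset);
  }

  @Override
  public Object getDataset(String namespace, String name) {
    return datasets.get(namespace + ":" + name);
  }
}

// Same delegation rule as the snippet above: system namespace goes to the
// local framework, every other namespace goes to the remote framework.
class RoutingDatasetFramework implements SimpleDatasetFramework {
  // Assumption: the system namespace is named "system".
  static final String SYSTEM_NAMESPACE = "system";

  private final SimpleDatasetFramework local;
  private final SimpleDatasetFramework remote;

  RoutingDatasetFramework(SimpleDatasetFramework local, SimpleDatasetFramework remote) {
    this.local = local;
    this.remote = remote;
  }

  @Override
  public Object getDataset(String namespace, String name) {
    return SYSTEM_NAMESPACE.equals(namespace)
        ? local.getDataset(namespace, name)
        : remote.getDataset(namespace, name);
  }
}
```

With this rule, a system dataset that exists on both sides resolves to the preview-local copy, while a lookup dataset in a user namespace resolves to the Standalone copy, which is the sharing behaviour the requirements ask for.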

 

 

Adapting to Cluster 

Having a LocalDatasetFramework for the system namespace would also be useful when adapting Preview to a cluster: the container's local directory would store the system datasets, and the RemoteDatasetFramework of the CDAP master would serve datasets in other namespaces.

 

 

 
