What the CDAP platform provides:

...

Preview in Distributed Mode:

  1. The preview service will run in a separate container. The container is started when the master starts and keeps running.
  2. Data generated by the preview system will be stored locally in the container. We can use LevelDB, as in standalone mode.
  3. The number of container instances can be increased for scalability. However, because preview data is local to each container, a request for preview data must be routed to the container that handled the original preview request. One approach is to store the preview-to-container mappings in HBase.
  4. PreviewHttpHandler will be exposed through the preview container.
  5. Logging for preview: The preview container will use the local log appender, similar to the SDK. Do we need the ability to change log levels for preview? For example, should we allow running the application at TRACE level in preview mode and at INFO level in normal mode?
  6. MetricsContext for preview: Querying metrics at the namespace level may not return all of the namespace's data if multiple preview container instances are running.
  7. Authorization: We store user privileges in Sentry. A user is allowed to execute a program if they have the EXECUTE privilege on it. This is currently enforced by AuthorizationEnforcer. We can inject the same instance into the preview container so that reads and writes to user datasets are controlled by the privileges in Sentry.
  8. Impersonation: We store impersonation configurations in the namespace meta store. NamespaceQueryAdmin is responsible for reading those configurations. The preview container will need access to an instance of the query admin that queries the actual HBase table.
  9. Deletion of preview data: We will need a service that cleans up the preview data periodically.
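
The routing mapping from item 3 and the periodic cleanup from item 9 could be sketched together as below. This is only an illustration under stated assumptions: `PreviewRouteRegistry`, `register`, `lookup`, and `expire` are hypothetical names, not existing CDAP classes or methods, and an in-memory map stands in for the HBase table that the design proposes.

```java
import java.util.Map;
import java.util.OptionalInt;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: none of these names are CDAP APIs.
// An in-memory map stands in for the HBase table proposed in item 3.
class PreviewRouteRegistry {

  // One routing record: which container instance ran the preview, and when.
  private static final class Route {
    final int instanceId;
    final long createdMillis;

    Route(int instanceId, long createdMillis) {
      this.instanceId = instanceId;
      this.createdMillis = createdMillis;
    }
  }

  private final Map<String, Route> routes = new ConcurrentHashMap<>();

  // Called by the container instance that accepts a preview request.
  void register(String previewId, int instanceId, long nowMillis) {
    routes.put(previewId, new Route(instanceId, nowMillis));
  }

  // Called by the router to find the instance that holds the preview data.
  // Returns an empty value when the preview is unknown or already expired.
  OptionalInt lookup(String previewId) {
    Route route = routes.get(previewId);
    return route == null ? OptionalInt.empty() : OptionalInt.of(route.instanceId);
  }

  // Called periodically by a cleanup service (item 9). Drops mappings older
  // than ttlMillis and returns how many were removed. The count is exact only
  // when no other thread modifies the registry during the call. Deleting the
  // container-local preview data itself is outside the scope of this sketch.
  int expire(long nowMillis, long ttlMillis) {
    int before = routes.size();
    routes.values().removeIf(route -> nowMillis - route.createdMillis > ttlMillis);
    return before - routes.size();
  }
}
```

In this sketch the caller passes the current time explicitly, which keeps the expiry logic easy to test. A real service would presumably take the TTL from configuration and invoke `expire` from a scheduled task.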

Open Questions:

  1. Since we have separate instances of the system tables for preview, how would we know when a new namespace is created by the user? Should we share the NamespaceQueryAdmin?
  2. Tracker with the actual datasets used in preview