...

In the current dataset framework (< 3.5.0), Authorization, Lineage and Usage (ALU) are only enforced/tracked at the dataset instance level, which is an all-or-nothing approach. This restricts the capabilities of Datasets in secure environments, and CDAP is not able to capture complete lineage/usage information. Starting from 3.5.0, we would like to support ALU per data operation.

We will introduce new annotations in cdap-api for custom datasets to annotate constructors/methods:

...

A new internal class, DatasetRuntimeContext, will be introduced for recording the dataset call stack in order to perform ALU operations. That class needs to be in the cdap-api module, since it needs to be callable from any dataset, including custom datasets. The class looks like this:

...

The DatasetFramework will set the context before calling the DatasetDefinition.getDataset method. The Dataset constructor is expected to get hold of the DatasetRuntimeContext instance, store it in a field, and use it in each method. This is the pattern for using the context:

...

Since calls to onMethodEntry and onMethodExit are required for every method on a dataset class, it is unrealistic to require dataset developers to add them by hand (error-prone and untrusted). Since every custom dataset is loaded through a custom classloader (ProgramClassLoader), we can rewrite the bytecode during classloading to insert calls to those two methods.
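To make the effect of the rewriting concrete, the sketch below shows, at source level, roughly how a dataset method would behave once the two calls have been inserted. It is only an illustration: ContextSketch, RecordingContext and RewrittenDatasetSketch are hypothetical names, and the method parameters are assumptions, since the actual DatasetRuntimeContext definition is elided above. Placing the entry call before the try block is also an assumption, chosen so that onMethodExit only runs for entries that succeeded.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the context; the signatures here are assumptions. */
interface ContextSketch {
  void onMethodEntry(String methodName);
  void onMethodExit();
}

/** Records the calls it receives so the bracketing can be observed. */
final class RecordingContext implements ContextSketch {
  final List<String> events = new ArrayList<>();

  @Override
  public void onMethodEntry(String methodName) {
    events.add("enter " + methodName);
  }

  @Override
  public void onMethodExit() {
    events.add("exit");
  }
}

/** A dataset-like class written as if the entry/exit calls had been inserted. */
final class RewrittenDatasetSketch {
  private final ContextSketch context;

  RewrittenDatasetSketch(ContextSketch context) {
    this.context = context;
  }

  String read(String key) {
    // Inserted call: runs before the original method body.
    context.onMethodEntry("read");
    try {
      // Original method body, unchanged.
      if (key == null) {
        throw new IllegalArgumentException("key is null");
      }
      return "value-of-" + key;
    } finally {
      // Inserted call: runs on both normal return and exception.
      context.onMethodExit();
    }
  }
}
```

With this shape, the exit call is made even when the original body throws, so the recorded call stack stays balanced.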

Call Stack

A method can have multiple different entry points:

  1. Non-private constructor called from DatasetDefinition.
  2. Non-private method called from program.
  3. Non-private method called from another Dataset (embedded Dataset case).
  4. Constructor/method called from another constructor/method from the same (sub)class.

...

  1. The method is the first entry point for a dataset operation (points 1 and 2 above).
    • ALU operations are straightforward: they are based on the annotation. E.g. if annotated with @ReadOnly, then consult the AuthorizationEnforcer for the READ action on the current dataset from the current Principal.
      • If a constructor is not annotated, default to @NoAccess
      • If a method is not annotated, default to @ReadWrite

  2. The method is not the first entry point and there is already an operation scope defined (points 3 and 4 above).
    • Through the DatasetRuntimeContext, the call stack is tracked
    • If the current method annotation is the same as, or is a proper subset of, that of the immediate parent in the call stack, no ALU operations are needed.
      • @NoAccess is a proper subset of all others
      • @ReadOnly is a proper subset of @ReadWrite
      • @WriteOnly is a proper subset of @ReadWrite
      • @ReadWrite is not a proper subset of any.
    • If the current method annotation is not a subset of that of the immediate parent, the onMethodEntry call will fail with an exception.
    • An unannotated constructor/method defaults to the same annotation as the immediate parent.
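The rules above can be sketched as a small call-stack tracker. This is a minimal illustration under stated assumptions, not the actual DatasetRuntimeContext: the class AccessScopeSketch, the Access enum, and the use of a null argument to mean "unannotated" are all hypothetical, and the counter merely marks the place where authorization enforcement and lineage/usage recording would happen.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of the scoping rules; names are illustrative, not CDAP API. */
final class AccessScopeSketch {

  /** Mirrors the four annotations named in the text as read/write capabilities. */
  enum Access {
    NO_ACCESS(false, false),
    READ_ONLY(true, false),
    WRITE_ONLY(false, true),
    READ_WRITE(true, true);

    final boolean read;
    final boolean write;

    Access(boolean read, boolean write) {
      this.read = read;
      this.write = write;
    }

    /** True if this access is the same as, or a proper subset of, the parent. */
    boolean isWithin(Access parent) {
      return (!read || parent.read) && (!write || parent.write);
    }
  }

  private final Deque<Access> callStack = new ArrayDeque<>();
  private int firstEntryChecks; // stand-in for enforcement and lineage/usage recording

  /** A null annotation means the constructor/method is unannotated. */
  void onMethodEntry(boolean constructor, Access annotation) {
    Access parent = callStack.peek(); // null when this is the first entry point
    Access effective;
    if (parent == null) {
      // Case 1: first entry point. Apply the defaults, then perform ALU operations.
      effective = annotation != null
          ? annotation
          : (constructor ? Access.NO_ACCESS : Access.READ_WRITE);
      firstEntryChecks++;
    } else {
      // Case 2: nested call. Inherit when unannotated, otherwise require a subset.
      effective = annotation != null ? annotation : parent;
      if (!effective.isWithin(parent)) {
        throw new IllegalStateException(effective + " is not allowed within " + parent);
      }
    }
    callStack.push(effective);
  }

  void onMethodExit() {
    callStack.pop();
  }

  /** Effective access of the innermost call, or null outside any dataset call. */
  Access current() {
    return callStack.peek();
  }

  int firstEntryChecks() {
    return firstEntryChecks;
  }
}
```

In this sketch a failed entry does not push anything onto the stack, so the caller should only invoke onMethodExit for entries that succeeded.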

...