
...

In the current dataset framework (< 3.5.0), Authorization, Lineage and Usage (ALU) are enforced and tracked only at the dataset instance level, which is an all-or-nothing approach. This restricts the capabilities of Datasets in a secure environment, and CDAP is not able to capture complete lineage and usage information. Starting from 3.5.0, we would like to support ALU per data operation.

We will introduce new annotations in cdap-api for custom datasets to annotate constructors/methods:

...

A new internal class, DatasetRuntimeContext, will be introduced to record the dataset call stack in order to perform ALU operations. That class needs to be in the cdap-api module, since it must be callable from any dataset, including custom datasets. The class looks like this:

...

The DatasetFramework will set the context before calling the DatasetDefinition.getDataset method. The Dataset constructor is expected to get hold of the DatasetRuntimeContext instance, store it in a field, and use it in each method. This is the pattern for using the context:

...

Since the calls to onMethodEntry and onMethodExit are required for every method of a dataset class, it is unrealistic to require dataset developers to insert them by hand (error-prone and untrusted). Since every custom dataset is loaded through a custom classloader (ProgramClassLoader), we can rewrite the bytecode during classloading to insert calls to those two methods.
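As a rough illustration of what the rewriting would achieve, the sketch below shows the source-level equivalent of an instrumented dataset: each method body is wrapped so that the entry call runs first and the exit call runs in a finally block. RecordingContext and CounterDataset are hypothetical stand-ins invented for this example. The real DatasetRuntimeContext signatures are not reproduced here, and the annotation is passed as a plain string only to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DatasetRuntimeContext (assumption: the real
// class takes different arguments; this one only records the calls).
final class RecordingContext {
  final List<String> events = new ArrayList<>();

  void onMethodEntry(String annotation) {
    events.add("enter:" + annotation);
  }

  void onMethodExit() {
    events.add("exit");
  }
}

// Hypothetical custom dataset, written the way the rewritten bytecode
// would behave. The developer would only write the inner method bodies.
final class CounterDataset {
  private final RecordingContext context;
  private long value;

  CounterDataset(RecordingContext context) {
    this.context = context;
  }

  // Developer-written body: "return value;" (annotated @ReadOnly).
  long get() {
    context.onMethodEntry("ReadOnly");
    try {
      return value;
    } finally {
      // Runs on normal return and on exception alike.
      context.onMethodExit();
    }
  }

  // Developer-written body: "value++; return value;" (annotated @ReadWrite).
  long increment() {
    context.onMethodEntry("ReadWrite");
    try {
      value++;
      return value;
    } finally {
      context.onMethodExit();
    }
  }
}
```

The try/finally shape matters because every entry must be paired with an exit even when the dataset method throws; otherwise the recorded call stack would drift out of sync with the real one.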

Call Stack

A method can have multiple different entry points:

1. Non-private constructor called from DatasetDefinition.
2. Non-private method called from program.
3. Non-private method called from another Dataset (embedded Dataset case).
4. Constructor/method called from another constructor/method from the same (sub)class.

...

The method is the first entry point for a dataset operation (points 1 and 2 above).
- ALU operations are pretty straightforward: they are based on the annotation. E.g. if the method is annotated with @ReadOnly, consult the AuthorizationEnforcer for the READ action on the current dataset for the current Principal.
  - If a constructor is not annotated, default to @NoAccess
  - If a method is not annotated, default to @ReadWrite
The method is not the first entry point and there is already an operation scope defined (points 3 and 4 above).
- Through the DatasetRuntimeContext, the call stack is tracked
- If the current method annotation is the same as, or a proper subset of, that of the immediate parent in the call stack, no ALU operations are needed.
  - @NoAccess is a proper subset of all others
  - @ReadOnly is a proper subset of @ReadWrite
  - @WriteOnly is a proper subset of @ReadWrite
  - @ReadWrite is not a proper subset of any.
- If the current method annotation is not a subset of the immediate parent's, the onMethodEntry call will fail with an exception
- For unannotated constructor/method, it will default to the same annotation as the immediate parent.
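The rules above can be sketched as a small stack-based check. This is only an illustration under stated assumptions: the Access enum stands in for the four annotations, CallStackTracker is a hypothetical helper rather than the real DatasetRuntimeContext, and the actual authorization, lineage and usage calls are reduced to a comment.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical stand-in for @NoAccess, @ReadOnly, @WriteOnly and @ReadWrite.
enum Access {
  NO_ACCESS(false, false),
  READ_ONLY(true, false),
  WRITE_ONLY(false, true),
  READ_WRITE(true, true);

  final boolean read;
  final boolean write;

  Access(boolean read, boolean write) {
    this.read = read;
    this.write = write;
  }

  // True if this access is the same as, or a proper subset of, the parent.
  boolean isAllowedWithin(Access parent) {
    return (!read || parent.read) && (!write || parent.write);
  }
}

// Hypothetical tracker for the per-dataset call stack.
final class CallStackTracker {
  private final Deque<Access> stack = new ArrayDeque<>();

  // 'declared' is null when the constructor/method carries no annotation.
  Access onMethodEntry(boolean constructor, Access declared) {
    Access effective;
    if (stack.isEmpty()) {
      // First entry point: apply the defaults, then perform ALU operations.
      effective = declared != null
          ? declared
          : (constructor ? Access.NO_ACCESS : Access.READ_WRITE);
      // A real implementation would enforce authorization and record
      // lineage/usage for 'effective' at this point.
    } else {
      // Nested call: inherit from, or be checked against, the immediate parent.
      Access parent = stack.peek();
      effective = declared != null ? declared : parent;
      if (!effective.isAllowedWithin(parent)) {
        throw new IllegalStateException(
            effective + " is not permitted inside " + parent);
      }
    }
    stack.push(effective);
    return effective;
  }

  void onMethodExit() {
    stack.pop();
  }

  int depth() {
    return stack.size();
  }
}
```

For example, an unannotated method entered first resolves to READ_WRITE, a nested READ_ONLY method is accepted, an unannotated method nested inside that inherits READ_ONLY, and a WRITE_ONLY method nested below it is rejected with an exception.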

...
