...
In the current dataset framework (< 3.5.0), Authorization, Lineage and Usage (ALU) are only enforced/tracked at the dataset instance level, which is an all-or-nothing approach. This restricts the capabilities of Datasets in a secure environment. Also, CDAP is not able to capture complete lineage/usage information. Starting from 3.5.0, we would like to support per-data-operation ALU.
We will introduce new annotations in cdap-api for custom datasets to annotate constructors/methods:
...
A new internal class, DatasetRuntimeContext, will be introduced for recording the dataset call stack in order to perform ALU operations. That class needs to be in the cdap-api module, since it must be callable from any dataset, including custom datasets. The class looks like this:
...
The DatasetFramework will set the context before calling the DatasetDefinition.getDataset method, so that the Dataset can get it. The Dataset constructor is expected to get hold of the DatasetRuntimeContext instance, store it in a field, and use it in each method. This is the pattern for using the context:
...
Since calls to onMethodEntry and onMethodExit are required for every method of a dataset class, it is unrealistic to require dataset developers to add them by hand (error-prone and untrusted). Since every custom dataset is loaded through a custom classloader (ProgramClassLoader), we can rewrite the bytecode during classloading to insert calls to those two methods.
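To make the effect of the rewriting concrete, the following is a minimal source-level sketch of how a dataset method might behave once the two calls have been inserted. It does not perform any bytecode rewriting. RuntimeContextSketch, RecordingContext and CounterDatasetSketch are hypothetical names, the String parameter of onMethodEntry is an assumption made for illustration, and wrapping the body in try/finally is likewise an assumption (so that the exit call runs even when the method throws).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DatasetRuntimeContext: only the two hooks named in the text.
// The String parameter is an assumption for illustration, not the cdap-api signature.
interface RuntimeContextSketch {
  void onMethodEntry(String method);
  void onMethodExit();
}

// Records every hook invocation so the ordering can be inspected.
class RecordingContext implements RuntimeContextSketch {
  final List<String> events = new ArrayList<>();

  @Override
  public void onMethodEntry(String method) {
    events.add("enter " + method);
  }

  @Override
  public void onMethodExit() {
    events.add("exit");
  }
}

// Source-level equivalent of a dataset method after the hooks have been inserted.
// The developer would only write the statements inside the try block.
class CounterDatasetSketch {
  private final RuntimeContextSketch context;
  private long value;

  CounterDatasetSketch(RuntimeContextSketch context) {
    this.context = context;
  }

  long increment(long delta) {
    context.onMethodEntry("increment");
    try {
      if (delta < 0) {
        throw new IllegalArgumentException("delta must be non-negative");
      }
      value += delta;
      return value;
    } finally {
      // Assumed: the exit hook runs on both normal and exceptional return.
      context.onMethodExit();
    }
  }
}

class RewriteSketchDemo {
  public static void main(String[] args) {
    RecordingContext ctx = new RecordingContext();
    CounterDatasetSketch dataset = new CounterDatasetSketch(ctx);
    dataset.increment(2);
    System.out.println(ctx.events);
  }
}
```

Because the hooks are inserted by the classloader in the design above, the dataset developer never writes the entry/exit calls and cannot omit them by mistake.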
Call Stack
A method can have multiple different entry points:
1. Non-private constructor called from DatasetDefinition.
2. Non-private method called from program.
3. Non-private method called from another Dataset (embedded Dataset case).
4. Constructor/method called from another constructor/method from the same (sub)class.
...
- The method is the first entry point for dataset operation (point 1 and 2 above).
  - ALU operations are pretty straightforward: they are based on the annotation. E.g. if annotated with @ReadOnly, then consult the AuthorizationEnforcer for the READ action on the current dataset from the current Principal.
    - If a constructor is not annotated, default to @NoAccess
    - If a method is not annotated, default to @ReadWrite
- The method is not the first entry point and there is already an operation scope defined (point 3 and 4 above).
  - Through the DatasetRuntimeContext, the call stack is tracked.
  - If the current method annotation is the same as, or is a proper subset of, the immediate parent in the call stack, no ALU operations are needed.
    - @NoAccess is a proper subset of all others
    - @ReadOnly is a proper subset of @ReadWrite
    - @WriteOnly is a proper subset of @ReadWrite
    - @ReadWrite is not a proper subset of any.
  - If the current method annotation is not a subset of the immediate parent, the onMethodEntry call will fail with an exception.
  - For an unannotated constructor/method, it will default to the same annotation as the immediate parent.
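The rules above can be sketched as a small stack-based check. This is only an illustration under stated assumptions: Access is a hypothetical enum standing in for the four annotations, CallStackSketch is a hypothetical stand-in for DatasetRuntimeContext, the (boolean, Access) parameters of onMethodEntry are assumed, and the aluBasis list is merely a placeholder for the real authorization, lineage and usage calls.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical stand-in for the four access annotations; not the cdap-api types.
enum Access {
  NO_ACCESS(false, false),
  READ_ONLY(true, false),
  WRITE_ONLY(false, true),
  READ_WRITE(true, true);

  final boolean read;
  final boolean write;

  Access(boolean read, boolean write) {
    this.read = read;
    this.write = write;
  }

  // True when this level is the same as, or a proper subset of, the other level.
  boolean isSubsetOf(Access other) {
    return (!read || other.read) && (!write || other.write);
  }
}

// Hypothetical sketch of the call-stack tracking described in the list above.
class CallStackSketch {
  private final Deque<Access> stack = new ArrayDeque<>();
  // Placeholder: the access level each ALU operation would be based on.
  final List<Access> aluBasis = new ArrayList<>();

  // 'annotation' is null for an unannotated constructor/method.
  void onMethodEntry(boolean constructor, Access annotation) {
    Access parent = stack.peek();
    Access effective;
    if (parent == null) {
      // First entry point: apply the defaults, then base the ALU operation on the result.
      effective = annotation != null
          ? annotation
          : (constructor ? Access.NO_ACCESS : Access.READ_WRITE);
      aluBasis.add(effective);
    } else {
      // Nested call: inherit from the parent when unannotated, otherwise require a subset.
      effective = annotation != null ? annotation : parent;
      if (!effective.isSubsetOf(parent)) {
        throw new IllegalStateException(effective + " is not a subset of " + parent);
      }
    }
    stack.push(effective);
  }

  void onMethodExit() {
    stack.pop();
  }

  int depth() {
    return stack.size();
  }
}

class CallStackSketchDemo {
  public static void main(String[] args) {
    CallStackSketch ctx = new CallStackSketch();
    ctx.onMethodEntry(false, Access.READ_ONLY);    // first entry point, annotated read-only
    try {
      ctx.onMethodEntry(false, Access.READ_WRITE); // nested call asking for more than the parent
    } catch (IllegalStateException e) {
      System.out.println("rejected: " + e.getMessage());
    }
    ctx.onMethodExit();
  }
}
```

Modelling each level as a pair of read/write flags keeps the subset test to a single expression and reproduces the four relations listed above, including that the read-only and write-only levels are not subsets of each other.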
...