...

In this organization, CDAP is used to store data belonging to various business units. These business units may be completely disparate and do not share information, and some of their data or applications may be extremely sensitive. As an additional security measure, Derek would also like to enforce authorization on operations that list CDAP entities, so that users can only see the entities they are authorized to read or write.
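Listing with authorization means filtering results by the caller's privileges. Below is a minimal sketch, assuming privileges are available as an in-memory map from principal to granted (entity, action) pairs. `Action`, `Entity`, and `filter_visible` are hypothetical names for illustration, not CDAP APIs.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    READ = "read"
    WRITE = "write"
    ADMIN = "admin"


@dataclass(frozen=True)
class Entity:
    namespace: str
    name: str


# Hypothetical view of granted privileges: principal -> {(entity, action)}.
Privileges = dict[str, set[tuple[Entity, Action]]]

# Per the requirement, an entity is listed only if the user can read or write it.
VISIBLE_ACTIONS = {Action.READ, Action.WRITE}


def filter_visible(principal: str, entities: list[Entity],
                   privileges: Privileges) -> list[Entity]:
    """Return only the entities on which `principal` holds READ or WRITE."""
    granted = privileges.get(principal, set())
    return [e for e in entities
            if any((e, action) in granted for action in VISIBLE_ACTIONS)]


ledger = Entity("finance", "ledger")
salaries = Entity("hr", "salaries")
privs = {"alice": {(ledger, Action.READ)}}
print(filter_visible("alice", [ledger, salaries], privs))
# [Entity(namespace='finance', name='ledger')]
```

Filtering after the listing call keeps the underlying list operation unchanged; users who belong to other business units simply see an empty result rather than an error.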

Design

Authorizing Dataset and Stream operations

The most critical requirement to address in 3.5 is authorizing dataset and stream operations. These operations fall into two categories: data access (read, write) and administration (create, update, delete). Administrative operations can be presumed to occur less often than data access operations and are not in the data path, so although their performance matters, it is less critical than that of data access operations. For data access operations, it is not practical to contact an external authorization system such as Apache Sentry on every operation, since doing so would severely degrade performance. Authorization policies therefore need to be cached in CDAP, potentially for all operations but especially for data access operations.
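The split between data access and administrative operations can be expressed as a small dispatch rule. This is a sketch under stated assumptions: `Enforcer`, the `cached` and `external` callables, and the `admin_uses_cache` flag are hypothetical; data access checks always go to the local cache, while administrative checks may optionally consult the external system (for example, Apache Sentry) directly.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    READ = "read"
    WRITE = "write"
    CREATE = "create"
    UPDATE = "update"
    DELETE = "delete"


# Data access actions are on the data path and must never trigger a remote call.
DATA_ACCESS = {Action.READ, Action.WRITE}

# Hypothetical check signature: (principal, entity, action) -> allowed?
Check = Callable[[str, str, Action], bool]


class Enforcer:
    """Routes authorization checks to the local cache or the external system."""

    def __init__(self, cached: Check, external: Check,
                 admin_uses_cache: bool = True) -> None:
        self._cached = cached
        self._external = external
        self._admin_uses_cache = admin_uses_cache

    def is_authorized(self, principal: str, entity: str, action: Action) -> bool:
        if action in DATA_ACCESS or self._admin_uses_cache:
            return self._cached(principal, entity, action)
        # Admin operations are rare and off the data path, so a remote call is affordable.
        return self._external(principal, entity, action)
```

Setting `admin_uses_cache=False` trades some latency on infrequent administrative operations for decisions that reflect policy changes immediately, while reads and writes never pay for a round trip.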

...

For such an authorization policy cache, the major design goals are:

  1. Minimal refresh time
    1. The refresh operation should be fast.
    2. It should make as few RPC calls as possible.
    3. It should transfer only the necessary data.
  2. Configurable refresh interval
    1. The refresh operation should happen at a configurable interval, so that users can tune it to their requirements.
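The goals above can be illustrated with a small cache. This is a minimal sketch, assuming that a single bulk call (the hypothetical `fetch_all`) returns all relevant grants as (principal, entity, action) tuples; `PolicyCache` and its parameters are illustrative names, not the CDAP data structure.

```python
import time
from typing import Callable, Iterable

# Hypothetical grant representation: (principal, entity, action).
Grant = tuple[str, str, str]


class PolicyCache:
    """Privilege cache refreshed at a configurable interval.

    Each refresh makes exactly one call to `fetch_all`, standing in for a
    single bulk RPC to the external authorization system."""

    def __init__(self, fetch_all: Callable[[], Iterable[Grant]],
                 refresh_interval_s: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch_all = fetch_all
        self._interval = refresh_interval_s
        self._clock = clock  # injectable so tests can control time
        self._grants: frozenset[Grant] = frozenset()
        self._last_refresh = float("-inf")  # forces a refresh on first use

    def _refresh_if_stale(self) -> None:
        now = self._clock()
        if now - self._last_refresh >= self._interval:
            # Build the new snapshot fully, then replace the old one in one assignment.
            self._grants = frozenset(self._fetch_all())
            self._last_refresh = now

    def is_authorized(self, principal: str, entity: str, action: str) -> bool:
        self._refresh_if_stale()
        return (principal, entity, action) in self._grants
```

This sketch refreshes lazily on the first check after the interval expires, so one caller per interval pays the refresh cost. A background thread that refreshes on a timer would keep that cost off the data path entirely, at the price of extra threading logic.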

To satisfy these goals, the data structure that should be cached can be defined as follows:

Code Block
 

 

Dependencies

Ability to distinguish between read and write operations on datasets
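Enforcing READ and WRITE separately requires knowing which category each dataset method belongs to. A minimal sketch, assuming each method is tagged with a hypothetical `requires` decorator and the dataset holds a hypothetical `authorizer` callable; CDAP's actual mechanism for classifying dataset operations may differ.

```python
import functools
from typing import Callable, Optional

# Hypothetical hook: (principal, dataset name, action) -> allowed?
Authorizer = Callable[[str, str, str], bool]


class UnauthorizedError(Exception):
    pass


def requires(action: str):
    """Tag a dataset method as a read or write operation and enforce it."""
    def decorate(method):
        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            if not self.authorizer(self.principal, self.name, action):
                raise UnauthorizedError(
                    f"{self.principal} lacks {action} on {self.name}")
            return method(self, *args, **kwargs)
        wrapper.required_action = action  # lets tooling inspect the category
        return wrapper
    return decorate


class KeyValueTable:
    """Toy dataset whose methods declare whether they read or write."""

    def __init__(self, name: str, principal: str, authorizer: Authorizer) -> None:
        self.name = name
        self.principal = principal
        self.authorizer = authorizer
        self._data: dict[str, str] = {}

    @requires("read")
    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    @requires("write")
    def put(self, key: str, value: str) -> None:
        self._data[key] = value
```

Declaring the category on the method itself means the check cannot drift out of sync with what the method does, and a user with only READ can call `get` but not `put`.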

Entities, Operations and Privileges

...

NOTE: Cells marked in green were completed in 3.4. Cells marked in yellow are in scope for 3.5.

Testing

Installation

...

Out-of-scope User Stories (4.0 and beyond)

  1. As a CDAP admin, I should be able to authorize metadata changes to CDAP entities
  2. As a CDAP system, I should be able to push down ACLs to storage providers
  3. As a CDAP admin, I should be able to see an audit log of all authorization-related changes in CDAP
  4. As a CDAP admin, I should be able to authorize all Thrift-based traffic, so that transaction management is also authorized.

...