...
In the said organization, CDAP is used to store data belonging to various business units. These business units are potentially completely disparate, and do not share information. Some of their data or applications may be extremely sensitive. As a security measure, Derek would also like to enforce authorization for operations that list CDAP entities, so that a user can only see the entities that he is authorized to read or write.
Design
The most critical requirement to address in 3.5 is to authorize dataset and stream operations. These operations can be categorized into data access (read/write) and management (create, update, delete). Management operations can be presumed to occur less often than data access operations, and are not in the data path. As a result, even though performance is important, it is less critical for management operations compared to data access operations. For data access operations, it is not practical to communicate with an external authorization system like Apache Sentry for every single operation, since that would lead to major performance degradation. As a result, authorization policies need to be cached in CDAP potentially for all operations, but especially for data access operations.
One of the major concerns about caching is freshness or invalidation. It is especially important in a security/authorization context, because it could result in security breaches. For example, suppose we've cached all authorization policies. An update, especially a rollback of privileges in the external authorization system should result in an immediate refresh of the cache, otherwise there could be security breaches by the time refresh takes place.
Entities, Operations and Privileges
...
NOTE: Cells marked green were done in 3.4. Cells marked in yellow are in scope for 3.5.
...
Testing
Installation
Questions
...