Goals
- Performance improvements (caching authorization policies)
- Authorization of dataset and stream access
- Authorization for listing and viewing entities
Checklist
- User stories documented (Bhooshan)
- User stories reviewed (Nitin)
- Design documented (Bhooshan)
- Design reviewed (Andreas/Terence)
- Feature merged (Bhooshan)
- Blog post
User Stories
- As a CDAP security admin, I want all operations on datasets/streams to be governed by my configured authorization system.
- As a CDAP security admin, I want list operations for all CDAP entities to only return entities that the logged-in user is authorized to view.
- As a CDAP security admin, I want view operations for a CDAP entity to only succeed if the logged-in user is authorized to view that entity.
Scenarios
Scenario #1
Derek is an IT Operations Extraordinaire at a corporation that uses CDAP to manage datasets with varying degrees of sensitivity. He would like to implement authorization policies for all data stored in CDAP across datasets and streams, so that only authorized users can access that data. He would like to control both read and write access.
Scenario #2
Derek would like to use external authorization systems such as Apache Sentry to manage authorization policies. Because Apache Sentry may be installed in a different environment from CDAP, he would like to minimize the performance overhead of authorization checks on data access. Derek also expects that these performance improvements do not compromise security. For example, if authorization policies are cached in CDAP, Derek expects the cache to be refreshed at a configurable interval.
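Scenario #2 asks for cached policies that are refreshed at a configurable interval. A minimal sketch of that idea, assuming a time-to-live (TTL) per cached entry, is below. `CachingPolicyStore`, the `fetch` callback (a stand-in for a remote call to an external authorizer such as Apache Sentry), and the string-based principal/entity keys are hypothetical; the sketch is single-threaded and omits concurrency control.

```python
import time
from typing import Callable, Dict, FrozenSet, Tuple

# Hypothetical key: (principal, entity id), e.g. ("derek", "dataset:ns1.purchases").
PolicyKey = Tuple[str, str]
Privileges = FrozenSet[str]


class CachingPolicyStore:
    """Caches privileges fetched from an external authorizer and re-fetches
    an entry once it is older than ttl_seconds (the configurable interval)."""

    def __init__(self, fetch: Callable[[str, str], Privileges], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch        # hypothetical call to the external authorization system
        self._ttl = ttl_seconds    # refresh interval
        self._clock = clock        # injectable so tests can control time
        self._entries: Dict[PolicyKey, Tuple[float, Privileges]] = {}

    def privileges(self, principal: str, entity: str) -> Privileges:
        key = (principal, entity)
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]       # still fresh: no remote call
        # Missing or stale: ask the authorizer again and remember when we did.
        privs = self._fetch(principal, entity)
        self._entries[key] = (now, privs)
        return privs

    def invalidate(self, principal: str, entity: str) -> None:
        """Drop one entry so the next lookup goes to the authorizer."""
        self._entries.pop((principal, entity), None)
```

The TTL bounds how long a revoked privilege can still be honored from the cache, so it is the setting that trades lookup latency against how quickly policy changes take effect.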
Scenario #3
In this organization, CDAP stores data belonging to various business units. These business units may be completely separate and do not share information, and some of their data or applications may be extremely sensitive. As a security measure, Derek would also like to enforce authorization on operations that list CDAP entities, so that a user can only see the entities that he is authorized to read or write.
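Scenario #3 implies a two-step list operation: the caller first passes the namespace-level READ check listed for "list" in the privileges table, and the result is then filtered down to the entities the caller may read or write. The sketch below shows only the filtering step. `EntityId`, `filter_visible`, `VISIBLE_WITH`, and the `privileges_of` lookup are hypothetical names, and treating READ or WRITE as the visibility criterion follows the scenario's wording; whether ADMIN or EXECUTE alone should also make an entity visible is left open.

```python
from dataclasses import dataclass
from typing import Callable, FrozenSet, Iterable, List


@dataclass(frozen=True)
class EntityId:
    kind: str       # illustrative values: "dataset", "stream", "application"
    namespace: str
    name: str


# Per Scenario #3, holding READ or WRITE makes an entity visible (assumption).
VISIBLE_WITH: FrozenSet[str] = frozenset({"READ", "WRITE"})


def filter_visible(principal: str,
                   entities: Iterable[EntityId],
                   privileges_of: Callable[[str, EntityId], FrozenSet[str]]) -> List[EntityId]:
    """Keep only entities on which the principal holds a visibility-granting privilege,
    preserving the original order."""
    return [e for e in entities if privileges_of(principal, e) & VISIBLE_WITH]
```

This issues one authorization lookup per listed entity, so in practice it would benefit from caching policies as Scenario #2 describes.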
Entities, Operations and Privileges
| Entity | Operation | Required Privileges | Resultant Privileges |
|---|---|---|---|
| Namespace | create | ADMIN (Instance) | ADMIN (Namespace) |
| | update | ADMIN (Namespace) | |
| | list | READ (Instance) | |
| | get | READ (Namespace) | |
| | delete | ADMIN (Namespace) | |
| | set preference | WRITE (Namespace) | |
| | get preference | READ (Namespace) | |
| | search | READ (Namespace) | |
| Artifact | add | WRITE (Namespace) | ADMIN (Artifact) |
| | delete | ADMIN (Artifact) | |
| | get | READ (Artifact) | |
| | list | READ (Namespace) | |
| | write property | ADMIN (Artifact) | |
| | delete property | ADMIN (Artifact) | |
| | get property | READ (Artifact) | |
| | refresh | WRITE (Instance) | |
| | write metadata | ADMIN (Artifact) | |
| | read metadata | READ (Artifact) | |
| Application | deploy | WRITE (Namespace) | ADMIN (Application) |
| | get | READ (Application) | |
| | list | READ (Namespace) | |
| | update | ADMIN (Application) | |
| | delete | ADMIN (Application) | |
| | set preference | WRITE (Application) | |
| | get preference | READ (Application) | |
| | add metadata | ADMIN (Application) | |
| | get metadata | READ (Application) | |
| Program | start/stop/debug | EXECUTE (Program) | |
| | set instances | ADMIN (Program) | |
| | list | READ (Namespace) | |
| | set runtime args | EXECUTE (Program) | |
| | get runtime args | READ (Program) | |
| | get instances | READ (Program) | |
| | set preference | ADMIN (Program) | |
| | get preference | READ (Program) | |
| | get status | READ (Program) | |
| | get history | READ (Program) | |
| | add metadata | ADMIN (Program) | |
| | get metadata | READ (Program) | |
| | emit logs | WRITE (Program) | |
| | view logs | READ (Program) | |
| | emit metrics | WRITE (Program) | |
| | view metrics | READ (Program) | |
| Stream | create | WRITE (Namespace) | ADMIN (Stream) |
| | update properties | ADMIN (Stream) | |
| | delete | ADMIN (Stream) | |
| | truncate | ADMIN (Stream) | |
| | enqueue / asyncEnqueue / batch | WRITE (Stream) | |
| | get | READ (Stream) | |
| | list | READ (Namespace) | |
| | read events | READ (Stream) | |
| | set preferences | ADMIN (Stream) | |
| | get preferences | READ (Stream) | |
| | add metadata | ADMIN (Stream) | |
| | get metadata | READ (Stream) | |
| | view lineage | READ (Stream) | |
| | emit metrics | WRITE (Stream) | |
| | view metrics | READ (Stream) | |
| Dataset | list | READ (Namespace) | |
| | get | READ (Dataset) | |
| | create | WRITE (Namespace) | ADMIN (Dataset) |
| | update | ADMIN (Dataset) | |
| | drop | ADMIN (Dataset) | |
| | executeAdmin (exists/truncate/upgrade) | ADMIN (Dataset) | |
| | add metadata | ADMIN (Dataset) | |
| | get metadata | READ (Dataset) | |
| | view lineage | READ (Dataset) | |
| | emit metrics | WRITE (Dataset) | |
| | view metrics | READ (Dataset) | |
NOTE: Cells marked green were completed in CDAP 3.4.
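One way to read the table is as a lookup from (entity, operation) to the privilege that must be held, and whether it must be held on the entity itself or on its enclosing namespace. The sketch below encodes a handful of dataset and stream rows that way and also applies the Resultant Privileges column by granting the caller ADMIN on an entity it creates. `Privilege`, `RULES`, `Enforcer`, `UnauthorizedError`, the `type:namespace.name` id format, and the in-memory grant store are hypothetical stand-ins for the configured authorization system, not CDAP's API. The sketch takes the table literally: holding ADMIN does not imply READ, since the table does not define a privilege hierarchy.

```python
from enum import Enum
from typing import Dict, Optional, Set, Tuple


class Privilege(Enum):
    READ = "READ"
    WRITE = "WRITE"
    EXECUTE = "EXECUTE"
    ADMIN = "ADMIN"


# A few rows of the table, keyed by (entity type, operation). Value:
# (required privilege, True if checked on the enclosing namespace,
#  resultant privilege granted to the caller on the entity, or None).
RULES: Dict[Tuple[str, str], Tuple[Privilege, bool, Optional[Privilege]]] = {
    ("dataset", "create"): (Privilege.WRITE, True, Privilege.ADMIN),
    ("dataset", "get"): (Privilege.READ, False, None),
    ("dataset", "drop"): (Privilege.ADMIN, False, None),
    ("stream", "create"): (Privilege.WRITE, True, Privilege.ADMIN),
    ("stream", "enqueue"): (Privilege.WRITE, False, None),
    ("stream", "read events"): (Privilege.READ, False, None),
}


class UnauthorizedError(Exception):
    """Raised when the caller lacks the privilege an operation requires."""


class Enforcer:
    """In-memory stand-in for the configured authorization system (hypothetical)."""

    def __init__(self) -> None:
        self._grants: Dict[Tuple[str, str], Set[Privilege]] = {}

    def grant(self, principal: str, entity: str, privilege: Privilege) -> None:
        self._grants.setdefault((principal, entity), set()).add(privilege)

    def has(self, principal: str, entity: str, privilege: Privilege) -> bool:
        return privilege in self._grants.get((principal, entity), set())

    def enforce(self, principal: str, entity_type: str, operation: str,
                namespace: str, name: str) -> None:
        required, on_namespace, resultant = RULES[(entity_type, operation)]
        entity = f"{entity_type}:{namespace}.{name}"
        target = f"namespace:{namespace}" if on_namespace else entity
        if not self.has(principal, target, required):
            raise UnauthorizedError(f"{principal} lacks {required.value} on {target}")
        if resultant is not None:
            # Resultant privilege, e.g. the creator of a dataset becomes its ADMIN.
            self.grant(principal, entity, resultant)
```

For example, a user holding only WRITE on namespace `ns1` can create a dataset there and then drop it (through the resultant ADMIN), but cannot read it without a separate READ grant. Dropping an entity in this sketch does not revoke its grants; a fuller implementation would likely need to clean up privileges when an entity is deleted.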
Design
Testing
Installation
Questions
Out-of-scope User Stories (4.0 and beyond)
- As a CDAP admin, I should be able to authorize metadata changes to CDAP entities
- As a CDAP system, I should be able to push down ACLs to storage providers
- As a CDAP admin, I should be able to see an audit log of all authorization-related changes in CDAP
- As a CDAP admin, I should be able to authorize all Thrift-based traffic, so that transaction management is also authorized.