Goals
- Key management
- Secure impersonation
- Authorization of dataset and stream access
- Authorization for listing and viewing entities
- Ability to map a namespace to user-provided storage provider namespaces
- Cross-namespace dataset access
- Support long-running programs in secure (Kerberos) mode
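The Python sketch below illustrates the namespace-mapping goal. It decides which HBase namespace and Hive database back a CDAP namespace. A user-provided value is used when present; otherwise a default is derived. `NamespaceConfig`, `resolve_storage_namespaces`, and the `cdap_<namespace>` default are hypothetical names chosen for this sketch. They are not CDAP's actual configuration keys or naming scheme.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NamespaceConfig:
    """Hypothetical per-namespace settings; field names are illustrative only."""
    name: str
    hbase_namespace: Optional[str] = None  # user-provided HBase namespace, if any
    hive_database: Optional[str] = None    # user-provided Hive database, if any


def resolve_storage_namespaces(cfg: NamespaceConfig) -> dict:
    """Return the storage-provider namespaces to use for one CDAP namespace.

    User-provided values take precedence; otherwise fall back to a
    hypothetical default of "cdap_<namespace>".
    """
    default = f"cdap_{cfg.name}"
    return {
        "hbase": cfg.hbase_namespace or default,
        "hive": cfg.hive_database or default,
    }


print(resolve_storage_namespaces(NamespaceConfig("sales", hive_database="finance_db")))
# prints {'hbase': 'cdap_sales', 'hive': 'finance_db'}
```

With this shape, a namespace created without overrides keeps the default layout. An admin can still point a namespace at an existing database in the underlying store.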
Checklist
- User stories documented (Rohit/Ali/Bhooshan)
- User stories reviewed (Nitin)
- Design documented (Rohit/Ali/Bhooshan)
- Design reviewed (Andreas)
- Feature merged (Rohit/Ali/Bhooshan)
- Examples and guides (Rohit)
- Integration tests (Ali)
- Documentation for feature (Bhooshan)
- Blog post
User Stories
- As a CDAP security admin, I want CDAP programs to run as the user who launches them, and not as the headless "cdap" user.
- As a CDAP/Hydrator security admin, I want sensitive information, such as passwords, not to be stored in plaintext.
- As a CDAP security admin, I want all operations on datasets/streams to be governed by my configured authorization system.
- As a CDAP security admin, I want list operations for all CDAP entities to only return entities that the logged-in user is authorized to view.
- As a CDAP security admin, I want view operations for a CDAP entity to succeed only if the logged-in user is authorized to view that entity.
- As a CDAP user, I would like to specify the namespace in an underlying storage provider (e.g. HBase namespace, Hive database) to use for a particular CDAP namespace.
- As a CDAP admin, I want to allow users to access a dataset from a program in a different namespace, as long as that user is authorized to access the dataset.
- As a CDAP user, I want to be able to run long-running MapReduce, Spark, or Hive programs on a secure (Kerberos-enabled) cluster.
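The list and view stories can be modeled as a filter and a guard that share one predicate. This Python sketch assumes that "authorized to view" means holding READ on the entity, which matches the `get` rows of the privileges table. It uses a hypothetical in-memory `InMemoryAuthorizer` with plain string entity ids. It is an illustration, not CDAP's authorizer API.

```python
from enum import Enum


class Action(Enum):
    READ = "READ"
    WRITE = "WRITE"
    EXECUTE = "EXECUTE"
    ADMIN = "ADMIN"


class AuthorizationError(Exception):
    """Raised when a principal may not view an entity."""


class InMemoryAuthorizer:
    """Hypothetical authorizer holding (principal, entity, action) grants in memory."""

    def __init__(self):
        self._grants = set()

    def grant(self, principal, entity, action):
        self._grants.add((principal, entity, action))

    def can_view(self, principal, entity):
        # Assumption: viewing an entity requires READ on that entity.
        return (principal, entity, Action.READ) in self._grants


def list_visible(authorizer, principal, entities):
    """List operations return only the entities the principal may view."""
    return [e for e in entities if authorizer.can_view(principal, e)]


def get_entity(authorizer, principal, entity):
    """View operations fail outright when the principal may not view the entity."""
    if not authorizer.can_view(principal, entity):
        raise AuthorizationError(f"{principal} is not authorized to view {entity}")
    return entity


auth = InMemoryAuthorizer()
auth.grant("alice", "dataset:ns1.purchases", Action.READ)
print(list_visible(auth, "alice", ["dataset:ns1.purchases", "dataset:ns1.payroll"]))
# prints ['dataset:ns1.purchases']
```

The check runs against the dataset's own id, which includes its namespace (`ns1`). A program running in a different namespace therefore succeeds or fails on exactly the same grant, which is the behavior the cross-namespace story asks for.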
Scenarios
Scenario #1
Scenario #2
Scenario #3
Entities, Operations and Privileges
Entity | Operation | Required Privileges | Resultant Privileges |
---|---|---|---|
Namespace | create | ADMIN (Instance) | ADMIN (Namespace) |
Namespace | update | ADMIN (Namespace) | |
Namespace | list | READ (Instance) | |
Namespace | get | READ (Namespace) | |
Namespace | delete | ADMIN (Namespace) | |
Namespace | set preference | WRITE (Namespace) | |
Namespace | get preference | READ (Namespace) | |
Namespace | search | READ (Namespace) | |
Artifact | add | WRITE (Namespace) | ADMIN (Artifact) |
Artifact | delete | ADMIN (Artifact) | |
Artifact | get | READ (Artifact) | |
Artifact | list | READ (Namespace) | |
Artifact | write property | ADMIN (Artifact) | |
Artifact | delete property | ADMIN (Artifact) | |
Artifact | get property | READ (Artifact) | |
Artifact | refresh | WRITE (Instance) | |
Artifact | write metadata | ADMIN (Artifact) | |
Artifact | read metadata | READ (Artifact) | |
Application | deploy | WRITE (Namespace) | ADMIN (Application) |
Application | get | READ (Application) | |
Application | list | READ (Namespace) | |
Application | update | ADMIN (Application) | |
Application | delete | ADMIN (Application) | |
Application | set preference | WRITE (Application) | |
Application | get preference | READ (Application) | |
Application | add metadata | ADMIN (Application) | |
Application | get metadata | READ (Application) | |
Program | start/stop/debug | EXECUTE (Program) | |
Program | set instances | ADMIN (Program) | |
Program | list | READ (Namespace) | |
Program | set runtime args | EXECUTE (Program) | |
Program | get runtime args | READ (Program) | |
Program | get instances | READ (Program) | |
Program | set preference | ADMIN (Program) | |
Program | get preference | READ (Program) | |
Program | get status | READ (Program) | |
Program | get history | READ (Program) | |
Program | add metadata | ADMIN (Program) | |
Program | get metadata | READ (Program) | |
Program | emit logs | WRITE (Program) | |
Program | view logs | READ (Program) | |
Program | emit metrics | WRITE (Program) | |
Program | view metrics | READ (Program) | |
Stream | create | WRITE (Namespace) | ADMIN (Stream) |
Stream | update properties | ADMIN (Stream) | |
Stream | delete | ADMIN (Stream) | |
Stream | truncate | ADMIN (Stream) | |
Stream | enqueue/asyncEnqueue/batch | WRITE (Stream) | |
Stream | get | READ (Stream) | |
Stream | list | READ (Namespace) | |
Stream | read events | READ (Stream) | |
Stream | set preferences | ADMIN (Stream) | |
Stream | get preferences | READ (Stream) | |
Stream | add metadata | ADMIN (Stream) | |
Stream | get metadata | READ (Stream) | |
Stream | view lineage | READ (Stream) | |
Stream | emit metrics | WRITE (Stream) | |
Stream | view metrics | READ (Stream) | |
Dataset | list | READ (Namespace) | |
Dataset | get | READ (Dataset) | |
Dataset | create | WRITE (Namespace) | ADMIN (Dataset) |
Dataset | update | ADMIN (Dataset) | |
Dataset | drop | ADMIN (Dataset) | |
Dataset | executeAdmin (exists/truncate/upgrade) | ADMIN (Dataset) | |
Dataset | add metadata | ADMIN (Dataset) | |
Dataset | get metadata | READ (Dataset) | |
Dataset | view lineage | READ (Dataset) | |
Dataset | emit metrics | WRITE (Dataset) | |
Dataset | view metrics | READ (Dataset) | |
NOTE: Cells marked green were done in 3.4
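The following Python sketch shows how the two privilege columns interact. It encodes a handful of rows from the table as data and enforces them. The required privilege is checked either on the target entity or on its parent. If the check passes, the caller receives the resultant privilege, if the row defines one. `POLICY`, `authorize`, the `"self"`/`"parent"` scope markers, and the string entity ids are assumptions made for illustration, not part of this design.

```python
from enum import Enum


class Action(Enum):
    READ = "READ"
    WRITE = "WRITE"
    EXECUTE = "EXECUTE"
    ADMIN = "ADMIN"


# Selected rows from the table: (entity type, operation) ->
# (required action, where it is checked, resultant action or None).
# "parent" = the enclosing namespace (or the instance, when creating a namespace);
# "self"   = the entity the operation targets.
POLICY = {
    ("namespace", "create"): (Action.ADMIN, "parent", Action.ADMIN),
    ("namespace", "get"): (Action.READ, "self", None),
    ("dataset", "create"): (Action.WRITE, "parent", Action.ADMIN),
    ("dataset", "drop"): (Action.ADMIN, "self", None),
    ("stream", "enqueue"): (Action.WRITE, "self", None),
    ("program", "start"): (Action.EXECUTE, "self", None),
}


class AuthorizationError(Exception):
    """Raised when the caller lacks the required privilege."""


def authorize(grants, principal, entity_type, operation, entity, parent):
    """Enforce one table row; `grants` is a mutable set of (principal, entity, Action)."""
    required, scope, resultant = POLICY[(entity_type, operation)]
    target = parent if scope == "parent" else entity
    if (principal, target, required) not in grants:
        raise AuthorizationError(
            f"{principal} needs {required.value} on {target} to {operation} {entity}"
        )
    # The "Resultant Privileges" column: e.g. creating a dataset makes the creator its admin.
    # Note: this sketch does not revoke privileges when an entity is dropped.
    if resultant is not None:
        grants.add((principal, entity, resultant))


grants = {("alice", "namespace:ns1", Action.WRITE)}
authorize(grants, "alice", "dataset", "create", "dataset:ns1.purchases", "namespace:ns1")
# alice now holds ADMIN on the new dataset, so she may also drop it.
authorize(grants, "alice", "dataset", "drop", "dataset:ns1.purchases", "namespace:ns1")
```

Keeping the table as data, instead of scattering checks through individual handlers, makes it easier to compare the enforcement code against this document as the table evolves.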
Design
Testing
Installation
Questions
Out-of-scope User Stories (3.5 and beyond)
- As a CDAP admin, I should be able to authorize reads/writes to datasets
- As a CDAP admin, I should be able to authorize metadata changes to CDAP entities
- As a CDAP system, I should be able to push down ACLs to storage providers
- As a CDAP admin, I should be able to authorize reads/writes to custom datasets
- As a CDAP system, I should be able to measure, document, and reduce the performance impact of authorization
- As a CDAP authorization system, I should be able to interact with an external authentication system
- As a CDAP admin, I should be able to use external UIs like Hue for ACL Management
- As a CDAP admin, I should be able to see an audit log of all authorization-related changes in CDAP
- As a CDAP admin, I should be able to authorize all Thrift-based traffic, so that transaction management is also authorized.