Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Goals
- Make CDAP authorization policy consistent across all entity types.
- Allow admins to set granular privileges on entities.
- Ranger integration for CDAP authorization
- Improve Sentry data model to fix existing issues seen on customer environments
- Allow admins to use existing roles/groups for authorization
User Stories
- TBD
Design
CDAP Authorization Policy
Existing CDAP Authorization Policy
The existing CDAP authorization policy has the following limitations:
Granular privileges
- Cannot grant a user a privilege to read only one dataset or one stream in a namespace.
- Cannot grant a user a privilege to deploy/create an application, artifact, dataset, or stream without granting WRITE on the namespace.
- Cannot grant a user a privilege to start/stop a program without granting READ on the namespace.
Visibility
- A user who has a privilege on a program cannot see the program in the UI or CLI if the user does not have any privilege on the namespace.
Inconsistency
- To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.
- To retrieve dataset properties, READ on the dataset is required, whereas to retrieve stream properties any privilege (READ/WRITE/EXECUTE/ADMIN) is sufficient.
- ADMIN on an entity allows the user to delete the entity, but does not allow the user to create it.
- Reading a dataset requires namespace READ, but writing to a dataset does not require namespace WRITE.
Redundancy
- Dataset READ and stream READ are redundant: they are only useful together with namespace READ, and once a user has namespace READ, the user can read all datasets and streams in the namespace.
- List and View operations are equivalent but are listed separately in the documentation.
Overview of the Proposed Authorization Policy
The proposed CDAP authorization policy can be defined by the following three principles:
Access
- Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity.
- Access flows top-down, i.e., if a user has READ on a namespace, the user has READ on all entities inside the namespace.
Visibility
- Visibility defines whether an entity is visible to a user or not.
- If a user has any privilege on an entity, it is visible to the user.
- Visibility flows bottom-up, i.e., if a user has any privilege on a program, the user can see the application that contains the program and the namespace that contains the application.
Grant
- Grant is the action of giving a user a privilege on an entity.
- To grant privileges on an entity, ADMIN on the entity is required.
- Grant flows top-down, i.e., if a user has ADMIN on a namespace, the user can grant privileges on all entities inside the namespace.
Note: CDAP Instance is not part of the privilege hierarchy.
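The three principles can be illustrated with a small in-memory model. This is a minimal sketch, not CDAP's implementation. `PolicySketch`, `Action`, and the `EntityId` record are hypothetical names (the record is unrelated to any CDAP class). An entity is modeled only by its name and parent (namespace, then application, then program), and the code requires Java 17.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of the proposed access, visibility, and grant rules.
class PolicySketch {
    // user -> entity -> actions granted directly on that entity
    private final Map<String, Map<EntityId, Set<Action>>> grants = new HashMap<>();

    void grant(String user, EntityId entity, Action action) {
        grants.computeIfAbsent(user, u -> new HashMap<>())
              .computeIfAbsent(entity, e -> EnumSet.noneOf(Action.class))
              .add(action);
    }

    // Access flows top-down: an action granted on the entity or any ancestor applies to it.
    boolean hasAccess(String user, EntityId entity, Action action) {
        Map<EntityId, Set<Action>> userGrants = grants.getOrDefault(user, Map.of());
        for (EntityId e : entity.selfAndAncestors()) {
            if (userGrants.getOrDefault(e, Set.of()).contains(action)) {
                return true;
            }
        }
        return false;
    }

    // Visible if the user holds any privilege on the entity, on an ancestor
    // (access flows down), or on a descendant (visibility flows bottom-up).
    boolean isVisible(String user, EntityId entity) {
        for (Map.Entry<EntityId, Set<Action>> entry : grants.getOrDefault(user, Map.of()).entrySet()) {
            if (entry.getValue().isEmpty()) {
                continue;
            }
            EntityId granted = entry.getKey();
            if (entity.selfAndAncestors().contains(granted) || granted.selfAndAncestors().contains(entity)) {
                return true;
            }
        }
        return false;
    }

    // Grant flows top-down: ADMIN on the entity or any ancestor allows granting on it.
    boolean canGrant(String user, EntityId entity) {
        return hasAccess(user, entity, Action.ADMIN);
    }

    public static void main(String[] args) {
        EntityId ns = new EntityId("ns1", null);
        EntityId app = new EntityId("app1", ns);
        EntityId program = new EntityId("program1", app);
        EntityId otherApp = new EntityId("app2", ns);

        PolicySketch policy = new PolicySketch();
        policy.grant("alice", program, Action.EXECUTE);

        System.out.println(policy.hasAccess("alice", program, Action.EXECUTE)); // true
        System.out.println(policy.hasAccess("alice", app, Action.EXECUTE));     // false: access does not flow up
        System.out.println(policy.isVisible("alice", ns));                       // true: visibility flows up
        System.out.println(policy.isVisible("alice", otherApp));                 // false
        System.out.println(policy.canGrant("alice", program));                   // false: no ADMIN held
    }
}

enum Action { READ, WRITE, EXECUTE, ADMIN }

// Hypothetical entity identified by name and parent; the instance is not part of this hierarchy.
record EntityId(String name, EntityId parent) {
    // This entity followed by all of its ancestors, nearest first.
    List<EntityId> selfAndAncestors() {
        List<EntityId> chain = new ArrayList<>();
        for (EntityId e = this; e != null; e = e.parent()) {
            chain.add(e);
        }
        return chain;
    }
}
```

Holding EXECUTE on a program does not grant EXECUTE on its application. It does, however, make the application and namespace visible, which addresses the visibility limitation listed above.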
Decouple entity existence from privilege
In addition, CDAP will support creating privileges for entities that do not yet exist. This allows admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed, which allows the user to deploy only that specific application without having any other access to the namespace.
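Decoupling a privilege from the entity's existence means grants are keyed by entity name rather than by a reference to a stored entity. The sketch below illustrates this under stated assumptions. `PrivilegeStore`, `DeploySketch`, and the "namespace/app" path encoding are all hypothetical. Per the proposed Applications row, the check accepts ADMIN on the application being deployed or ADMIN on its namespace.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical deploy check that relies only on privileges, not on the entity already existing.
class DeploySketch {
    private final PrivilegeStore store;
    private final Set<String> deployed = new HashSet<>();

    DeploySketch(PrivilegeStore store) {
        this.store = store;
    }

    // Deploying requires ADMIN on the application being deployed (or on its namespace).
    boolean deploy(String user, String namespace, String app) {
        String path = namespace + "/" + app;
        if (!store.hasAdmin(user, path)) {
            return false;
        }
        deployed.add(path);
        return true;
    }

    boolean isDeployed(String namespace, String app) {
        return deployed.contains(namespace + "/" + app);
    }

    public static void main(String[] args) {
        PrivilegeStore store = new PrivilegeStore();
        store.grantAdmin("bob", "ns1/purchaseApp"); // purchaseApp has not been deployed yet
        DeploySketch deployer = new DeploySketch(store);
        System.out.println(deployer.deploy("bob", "ns1", "purchaseApp")); // true
        System.out.println(deployer.deploy("bob", "ns1", "otherApp"));    // false
    }
}

// Privileges keyed by entity path; the "namespace/app" path format is hypothetical.
class PrivilegeStore {
    private final Map<String, Set<String>> adminGrants = new HashMap<>(); // user -> entity paths

    // No existence check: a privilege can be recorded for an entity that does not exist yet.
    void grantAdmin(String user, String entityPath) {
        adminGrants.computeIfAbsent(user, u -> new HashSet<>()).add(entityPath);
    }

    // ADMIN on the path itself or on any ancestor path (top-down flow).
    boolean hasAdmin(String user, String entityPath) {
        Set<String> paths = adminGrants.getOrDefault(user, Set.of());
        for (String p = entityPath; p != null; p = parentOf(p)) {
            if (paths.contains(p)) {
                return true;
            }
        }
        return false;
    }

    private static String parentOf(String path) {
        int slash = path.lastIndexOf('/');
        return slash < 0 ? null : path.substring(0, slash);
    }
}
```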
Changes to the Authorization Matrix
Based on the above, we propose the following changes to the authorization matrix.
Instance
ADMIN on the instance allows a user to create namespaces in the instance. No other operations are currently defined. The instance is not part of the privilege hierarchy.
Namespaces
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the CDAP instance) | WRITE (on the CDAP instance) or ADMIN (on the namespace) |
Update | ADMIN (on the namespace) | |
Delete | ADMIN (on the namespace) | |
List | Only returns those namespaces on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | At least one of READ, WRITE, EXECUTE, or ADMIN on the namespace or any of its descendants |
Grant | ADMIN (on the namespace) | |
Artifacts
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the artifact being deployed) |
Add a property | ADMIN (on the namespace) | ADMIN (on the artifact) |
Remove a property | ADMIN (on the namespace) | ADMIN (on the artifact) |
Delete | ADMIN (on the namespace) | ADMIN (on the artifact) |
List | Only returns those artifacts on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on the artifact) |
Applications
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) and READ (on the artifact, if deployed from an artifact) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the application) |
Delete | ADMIN | |
List | Only returns those applications on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on the application) |
Programs
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Start, Stop, or Debug | (EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace)) and READ (on the namespace) | EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace) |
Set instances | ADMIN | |
Set runtime arguments | ADMIN | |
Retrieve runtime arguments | READ | |
Retrieve status | At least one of READ, WRITE, EXECUTE, or ADMIN | |
List | Only returns those programs on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
Datasets
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the dataset being created) or ADMIN (on the namespace) |
Read | (READ (on the dataset) and READ (on the namespace)) or READ (on the namespace) | READ (on the namespace) or READ (on the dataset) |
Retrieving properties | Not documented | At least one of READ, WRITE, ADMIN, or EXECUTE |
Write | WRITE (on the dataset) or WRITE (on the namespace) | WRITE (on the namespace) or WRITE (on the dataset) |
Update | (ADMIN (on the dataset) and READ (on the namespace)) or (ADMIN (on the namespace) and READ (on the namespace)) | ADMIN (on the dataset) or ADMIN (on the namespace) |
Upgrade | ADMIN | |
Truncate | ADMIN | |
Drop | ADMIN | |
List | Only returns those datasets on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
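Each cell in these tables is a disjunction of alternatives, where each alternative may itself require several privileges together. The sketch below encodes the Dataset Read row in that form so the existing and proposed rules can be compared. It is illustrative only. `DatasetRuleSketch`, `Privilege`, and `Scope` are hypothetical names, and the encoding is not how CDAP stores its rules.

```java
import java.util.List;
import java.util.Set;

// Hypothetical encoding of a matrix cell: a list of alternatives ("or"),
// where each alternative is a set of privileges that must all be held ("and").
class DatasetRuleSketch {
    enum Action { READ, WRITE, EXECUTE, ADMIN }
    enum Scope { NAMESPACE, DATASET }
    record Privilege(Action action, Scope scope) {}

    static final Privilege NS_READ = new Privilege(Action.READ, Scope.NAMESPACE);
    static final Privilege DS_READ = new Privilege(Action.READ, Scope.DATASET);

    // Dataset Read, existing: (READ on dataset and READ on namespace) or READ on namespace
    static final List<Set<Privilege>> READ_EXISTING = List.of(Set.of(DS_READ, NS_READ), Set.of(NS_READ));
    // Dataset Read, proposed: READ on namespace or READ on dataset
    static final List<Set<Privilege>> READ_PROPOSED = List.of(Set.of(NS_READ), Set.of(DS_READ));

    // A rule is satisfied if the user holds every privilege of at least one alternative.
    static boolean satisfies(List<Set<Privilege>> rule, Set<Privilege> held) {
        return rule.stream().anyMatch(held::containsAll);
    }

    public static void main(String[] args) {
        Set<Privilege> datasetOnly = Set.of(DS_READ);
        System.out.println(satisfies(READ_EXISTING, datasetOnly)); // false: namespace READ also needed
        System.out.println(satisfies(READ_PROPOSED, datasetOnly)); // true: dataset READ is enough
    }
}
```

The output shows the first limitation in the overview: under the existing rule, READ on a single dataset is useless without namespace READ.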
Dataset Modules
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Deploy | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the module being deployed) or ADMIN (on the namespace) |
Delete | ADMIN (on the dataset module) or ADMIN (on the namespace) | |
Delete-all in the namespace | ADMIN (on the namespace) | |
List | Only returns those dataset modules on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
Dataset Types
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
List | Only returns those dataset types on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
Secure Keys
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the key being created) or ADMIN (on the namespace) |
Delete | ADMIN | |
List | Only returns those secure keys on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
Read | READ (on the namespace) | READ (on the key) |
Streams
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the stream being created) or ADMIN (on the namespace) |
Retrieving events | READ (on the stream) and READ (on the namespace) | READ (on the stream) or READ (on the namespace) |
Retrieving properties | At least one of READ, WRITE, ADMIN, or EXECUTE | |
Sending events to a stream (sync, async, or batch) | (WRITE (on the stream) and READ (on the namespace)) or (WRITE (on the namespace) and READ (on the namespace)) | WRITE (on the stream) or WRITE (on the namespace) |
Drop | ADMIN (on the stream) or ADMIN (on the namespace) | |
Drop-all in the namespace | ADMIN (on the namespace) | |
Update | ADMIN | |
Truncate | ADMIN | |
List | Only returns those streams on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
CDAP Sentry Extension Improvements
Existing Model
CDAP allows privileges to be defined using entities and users. Sentry is a role-based access control (RBAC) system that only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, so every grant made on an entity to a user has to be translated into a grant on roles and groups.
For this translation, CDAP does the following:
- Creates a proxy role per user and entity. This can lead to e × u roles being created, where e is the number of entities and u is the number of users.
- Expects every user to belong to a unique group in the Hadoop user/group mapping. Today this group name is expected to be the same as the username, and privileges for a user are granted to that group name. However, a group with the same name as the user does not exist in every environment. In such environments the granted privileges are ineffective, and the user cannot access any entity through them.
Revoking all privileges on an entity is expensive, since it requires listing all privileges for all users. This is because Sentry does not have a way to list all privileges for an entity.
Proposed Model
Allow admins to use existing roles and groups in Sentry for authorization in CDAP. In this mode, CDAP will not grant or revoke any privileges for entities. (Note: this is a stretch goal for 4.3.)
However, in cases where the admin wants CDAP to grant privileges, we propose the following model:
- Create one proxy role per user, to which CDAP will grant privileges for all entities. This limits the number of roles CDAP creates in Sentry to u, where u is the number of users.
- Create one proxy group per user, to which the user's proxy role will be granted. This removes the assumption that a group with the same name as the username exists, so the model works in all environments.
Investigate the new Sentry API listPrivilegsbyAuthorizable() to list all privileges for a given entity, so that CDAP can avoid listing all privileges for all users during entity deletion.
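The difference in role count between the existing and proposed translations can be checked with a small sketch. It maps each CDAP grant (user, entity, action) to a Sentry-style (role, group, entity, action) tuple under both models and counts the distinct roles. The record names and the "cdap:" / "cdap-group:" naming scheme are hypothetical. This does not call the Sentry client.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical translation of CDAP grants into Sentry-style tuples; names are illustrative only.
class SentryMappingSketch {
    record CdapGrant(String user, String entity, String action) {}
    record SentryGrant(String role, String group, String entity, String action) {}

    // Existing model: one proxy role per (user, entity), granted to a group assumed to be named after the user.
    static SentryGrant existingModel(CdapGrant g) {
        return new SentryGrant("cdap:" + g.user() + ":" + g.entity(), g.user(), g.entity(), g.action());
    }

    // Proposed model: one proxy role per user, granted to a proxy group created for that user.
    static SentryGrant proposedModel(CdapGrant g) {
        return new SentryGrant("cdap:" + g.user(), "cdap-group:" + g.user(), g.entity(), g.action());
    }

    static long distinctRoles(List<CdapGrant> grants, Function<CdapGrant, SentryGrant> model) {
        return grants.stream().map(model).map(SentryGrant::role).distinct().count();
    }

    public static void main(String[] args) {
        List<String> users = List.of("alice", "bob");                        // u = 2
        List<String> entities = List.of("ns1", "ns1.app1", "ns1.dataset1"); // e = 3
        List<CdapGrant> grants = new ArrayList<>();
        for (String user : users) {
            for (String entity : entities) {
                grants.add(new CdapGrant(user, entity, "READ"));
            }
        }
        System.out.println(distinctRoles(grants, SentryMappingSketch::existingModel)); // 6 (e x u)
        System.out.println(distinctRoles(grants, SentryMappingSketch::proposedModel)); // 2 (u)
    }
}
```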
Backward Compatibility
The above changes will be backward compatible with existing privileges.
- Grant: All new grants will happen in the new format.
- Revoke: Revoke will happen in both old and new format.
- Enforce: Enforce will work with both old and new privileges.
- List: List will list both old and new privileges.
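The four compatibility rules can be sketched as a store that keeps two sets of privileges side by side. This is a minimal illustration, assuming each format can be represented as an in-memory set. `CompatSketch` and its `Privilege` record are hypothetical, and the real formats live in Sentry rather than in memory.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical store holding privileges in both the old and the new format during migration.
class CompatSketch {
    record Privilege(String user, String entity, String action) {}

    private final Set<Privilege> oldFormat = new HashSet<>();
    private final Set<Privilege> newFormat = new HashSet<>();

    // Simulates a privilege that was created before the upgrade.
    void loadLegacy(Privilege p) {
        oldFormat.add(p);
    }

    // Grant: new grants are written only in the new format.
    void grant(Privilege p) {
        newFormat.add(p);
    }

    // Revoke: remove from both formats so legacy privileges are cleaned up too.
    void revoke(Privilege p) {
        oldFormat.remove(p);
        newFormat.remove(p);
    }

    // Enforce: a privilege in either format is honored.
    boolean enforce(Privilege p) {
        return oldFormat.contains(p) || newFormat.contains(p);
    }

    // List: return privileges from both formats.
    Set<Privilege> list() {
        Set<Privilege> all = new HashSet<>(oldFormat);
        all.addAll(newFormat);
        return all;
    }

    public static void main(String[] args) {
        CompatSketch store = new CompatSketch();
        Privilege legacy = new Privilege("alice", "ns1", "READ");
        Privilege fresh = new Privilege("bob", "ns1", "WRITE");
        store.loadLegacy(legacy);
        store.grant(fresh);
        System.out.println(store.enforce(legacy)); // true: old privileges still enforced
        System.out.println(store.list().size());   // 2
        store.revoke(legacy);
        System.out.println(store.enforce(legacy)); // false
    }
}
```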
Reduce CDAP Startup Time Due to Authorization
Problem
We have observed that as the number of entities in CDAP grows, CDAP startup time increases because of authorization (to more than 20 minutes in some cases). This happens because every time CDAP starts, it revokes and re-grants privileges on all system entities. Revoking all privileges on an entity is expensive, since it requires listing all privileges for all users.
Proposed Solution
- CDAP system service access to system entities will bypass authorization (https://issues.cask.co/browse/CDAP-11659).
- AuthorizationEnforcer will always return true if the requesting user is cdap and the namespace is system.
- Authorizer grant/revoke will be a no-op in the above case.
Note: The underlying systems must still give the cdap user the appropriate permissions.
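The bypass can be pictured as a thin wrapper around the enforcer and the grant path. The sketch below is illustrative only. The `Enforcer` and `Granter` interfaces are simplified, hypothetical stand-ins, not CDAP's actual AuthorizationEnforcer or Authorizer signatures. Revoke would be wrapped in the same way as grant.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical wrappers illustrating the startup optimization; not CDAP's real interfaces.
class SystemBypassSketch {
    static final String CDAP_USER = "cdap";
    static final String SYSTEM_NAMESPACE = "system";

    interface Enforcer {
        boolean isAuthorized(String user, String namespace, String entity, String action);
    }

    interface Granter {
        void grant(String user, String namespace, String entity, String action);
    }

    static boolean isSystemAccess(String user, String namespace) {
        return CDAP_USER.equals(user) && SYSTEM_NAMESPACE.equals(namespace);
    }

    // Enforcement short-circuits to true for the cdap user on system-namespace entities.
    static Enforcer enforcerWithBypass(Enforcer delegate) {
        return (user, ns, entity, action) ->
            isSystemAccess(user, ns) || delegate.isAuthorized(user, ns, entity, action);
    }

    // Grants for the same case become no-ops, so nothing reaches the authorization provider.
    static Granter granterWithBypass(Granter delegate) {
        return (user, ns, entity, action) -> {
            if (!isSystemAccess(user, ns)) {
                delegate.grant(user, ns, entity, action);
            }
        };
    }

    public static void main(String[] args) {
        List<String> providerCalls = new ArrayList<>();
        Enforcer enforcer = enforcerWithBypass((u, n, e, a) -> false); // provider denies everything
        Granter granter = granterWithBypass((u, n, e, a) -> providerCalls.add(u + "@" + n + ":" + e));

        System.out.println(enforcer.isAuthorized("cdap", "system", "dataset1", "READ"));  // true
        System.out.println(enforcer.isAuthorized("alice", "system", "dataset1", "READ")); // false
        granter.grant("cdap", "system", "dataset1", "ADMIN"); // skipped
        granter.grant("alice", "ns1", "dataset1", "ADMIN");   // forwarded
        System.out.println(providerCalls.size()); // 1
    }
}
```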
Existing Roles/Groups for Authorization
Currently, CDAP always grants privileges on entity creation. Although this is convenient, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (such as Sentry or Ranger), which allows them to use existing roles/groups to manage privileges.
- To support this, we will introduce a property in cdap-site.xml that specifies whether CDAP should grant privileges on entity creation. By default, CDAP will continue granting privileges on entity creation to maintain backward compatibility.
- If an admin turns off automatic granting through this property, CDAP will not grant or revoke privileges on entities automatically. In this case, the admin is responsible for creating the appropriate privileges.
- Not all authorization providers (Sentry, for example) have tools to manage privileges, so CDAP will have to provide tools for admins to manage privileges in Sentry. (Stretch goal: in 4.3, cdap-cli will be modified to allow creating privileges for entities that do not yet exist.)
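The sketch below shows how the proposed cdap-site.xml switch could gate automatic grants and revokes. The property key is a placeholder because the design does not name the actual key. Granting the creator every action is also an assumption made for illustration. `AutoGrantSketch` is hypothetical, and `java.util.Properties` stands in for CDAP's configuration object.

```java
import java.util.EnumSet;
import java.util.Properties;
import java.util.Set;

// Hypothetical sketch of the auto-grant switch; the property key below is a placeholder.
class AutoGrantSketch {
    enum Action { READ, WRITE, EXECUTE, ADMIN }

    static final String GRANT_ON_CREATE_KEY = "example.authorization.grant.on.create"; // placeholder key

    // Defaults to true so that existing deployments keep today's behavior.
    static boolean grantOnCreate(Properties conf) {
        return Boolean.parseBoolean(conf.getProperty(GRANT_ON_CREATE_KEY, "true"));
    }

    // Actions granted to the creator of a new entity (granting all actions is assumed here).
    static Set<Action> privilegesToGrantOnCreate(Properties conf) {
        return grantOnCreate(conf) ? EnumSet.allOf(Action.class) : EnumSet.noneOf(Action.class);
    }

    // With the switch off, revocation on deletion is also left to the admin.
    static boolean revokeOnDelete(Properties conf) {
        return grantOnCreate(conf);
    }

    public static void main(String[] args) {
        Properties defaults = new Properties();
        Properties managed = new Properties();
        managed.setProperty(GRANT_ON_CREATE_KEY, "false");
        System.out.println(privilegesToGrantOnCreate(defaults)); // [READ, WRITE, EXECUTE, ADMIN]
        System.out.println(privilegesToGrantOnCreate(managed));  // []
        System.out.println(revokeOnDelete(managed));             // false
    }
}
```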
CDAP Ranger Integration
TBD (We will add Ranger Integration design link soon).
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |
Deprecated REST API
Path | Method | Description |
---|---|---|
/v3/apps/<app-id> | GET | Returns the application spec for a given application |
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What's the impact on Authorization and how does the design take care of this aspect
Impact on Infrastructure Outages
System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspects
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3