Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Goals
- Make CDAP authorization policy consistent across all entity types.
- Allow admins to set granular privileges on entities.
- Integrate Ranger with CDAP authorization.
- Improve the Sentry data model to fix existing issues seen in customer environments.
- Allow admins to use existing roles/groups for authorization.
User Stories
Scenario 1
Overview
- Privileges are managed at the entity level
- App level impersonation
- Dataset is owned by the application owner
- Cross namespace dataset access allowed
Details
- admin1 creates a CDAP namespace etl with principal etl-owner
- admin2 deploys an app feed1 with principal feed1-owner in namespace etl
  - During app feed1 configure, dataset gold is created with owner principal feed1-owner
- ops1 starts workflow in app feed1, that runs as principal feed1-owner
  - During the workflow run, principal feed1-owner reads/writes to dataset gold
- ops1 can list logs and metrics for workflow in app feed1
- ops2 can list all apps/programs in namespace etl and view all their logs and metrics
- ops2 can list all the datasets in namespace etl and view their properties
- ops2 cannot read any datasets in namespace etl
Scenario 2
Overview
- Privileges are managed at the namespace level
- Namespace level impersonation
- Dataset is owned by the namespace owner
- Cross namespace dataset access allowed
Details
- admin1 creates a group etl-group in LDAP
- admin1 creates namespaces in HDFS, HBase and Hive called etl
- admin1 grants all privileges on the above namespaces to group etl-group
- admin1 creates a CDAP namespace etl with principal etl-owner using the namespaces from HDFS, HBase and Hive. Does etl-owner belong to etl-group?
- admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group
- etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
  - During app feed1 configure, dataset gold is created with owner principal etl-owner
- etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal etl-owner
  - During the workflow run, principal etl-owner reads/writes to dataset gold
- etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
- analyst1 belonging to group analyst-group is given read on namespace etl and all entities under it, using which analyst1 can read dataset gold
Scenario 3
Overview
- Privileges are managed at the namespace level
- No impersonation
- All data is owned by CDAP
- All programs run as CDAP
- Cross namespace dataset access is allowed
Details
- admin1 creates a group etl-group in LDAP
- admin1 creates namespaces in HDFS, HBase and Hive called etl
- admin1 grants all privileges on the above namespaces to principal cdap
- admin1 creates a CDAP namespace etl using the namespaces from HDFS, HBase and Hive
- admin1 grants all privileges on the CDAP namespace etl, and all entities under it, to group etl-group
- etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
  - During app feed1 configure, dataset gold is created with owner principal cdap
- etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal cdap
  - During the workflow run, principal cdap reads/writes to dataset gold
- etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
- etl-user3 belonging to group etl-group can also read from dataset gold
- analyst1 belonging to group analyst-group is given privilege to read from dataset gold
Design
CDAP Authorization Policy
Existing CDAP Authorization Policy
The existing CDAP Authorization policy has the following limitations:
- Granular privileges
  - Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.
  - Cannot grant a privilege to a user to deploy/create an application/artifact/dataset/stream without granting WRITE on the namespace.
  - Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.
- Visibility
  - A user who has a privilege on a program cannot see the program in the UI or CLI if the user does not have any privilege on the namespace.
- Inconsistency
  - To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.
  - ADMIN on an entity allows the user to delete the entity, but does not allow the user to create it.
  - Dataset read needs namespace READ, but dataset write does not need namespace WRITE.
- Redundancy
  - Dataset READ and stream READ are redundant because they need namespace READ to be useful, and once a user has namespace READ the user can read all datasets and streams in the namespace.
  - List and View operations are equivalent but are listed separately in the documentation.
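The dataset/stream inconsistency listed above can be made concrete with a minimal Python sketch. The tuple-based grant store and the function names are hypothetical illustrations, not CDAP APIs; only the required privileges are taken from the list above.

```python
# Hypothetical in-memory grant store: (user, entity, action) tuples.
grants = {
    ("alice", ("dataset", "etl", "gold"), "WRITE"),
    ("alice", ("stream", "etl", "events"), "WRITE"),
}

def has(user, entity, action):
    return (user, entity, action) in grants

def can_write_dataset(user, namespace, dataset):
    # Existing policy: WRITE on the dataset is enough.
    return has(user, ("dataset", namespace, dataset), "WRITE")

def can_write_stream(user, namespace, stream):
    # Existing policy: WRITE on the stream AND READ on the namespace.
    return (has(user, ("stream", namespace, stream), "WRITE")
            and has(user, ("namespace", namespace), "READ"))

print(can_write_dataset("alice", "etl", "gold"))   # True
print(can_write_stream("alice", "etl", "events"))  # False: no READ on namespace etl
```

The same WRITE grant is sufficient for a dataset but not for a stream, which is the inconsistency the proposed policy aims to remove.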
Overview of the Proposed Authorization Policy
The proposed CDAP Authorization policy can be defined by the following three principles:
Access
- Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity.
- Access is not enforced in a hierarchical manner in CDAP.
- Privileges in the authorization provider can be set up in a hierarchical manner (for instance by using wildcard privileges; how will this work in Sentry?).
Visibility
- Visibility defines whether an entity is visible to a user or not.
- If a user has any privilege on an entity, it is visible to the user.
- Visibility is hierarchical and flows bottom-up i.e. if a user has any privilege on a program then the user will be able to see the application that contains the program and namespace that contains the application.
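A minimal Python sketch of the bottom-up visibility rule follows. It assumes, purely for illustration, that an entity is identified by its path from the namespace down; the `Privilege` class and `is_visible` function are hypothetical names, not CDAP APIs.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Privilege:
    principal: str
    entity: tuple   # path from the namespace down, e.g. ("etl", "feed1", "workflow1")
    action: str     # READ, WRITE, EXECUTE, or ADMIN

def is_visible(principal, entity, privileges):
    """An entity is visible if the principal holds any privilege on the
    entity itself or on any of its descendants."""
    return any(
        p.principal == principal and p.entity[:len(entity)] == entity
        for p in privileges
    )

privileges = {Privilege("ops1", ("etl", "feed1", "workflow1"), "EXECUTE")}

print(is_visible("ops1", ("etl", "feed1", "workflow1"), privileges))  # True: the program itself
print(is_visible("ops1", ("etl", "feed1"), privileges))               # True: containing application
print(is_visible("ops1", ("etl",), privileges))                       # True: containing namespace
print(is_visible("ops1", ("etl", "feed2"), privileges))               # False: unrelated application
```

The prefix comparison is what makes visibility flow upward: a privilege on a program makes every ancestor on its path visible, but not its siblings.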
Grant
- Grant is defined as the action of giving a privilege on an entity to a user.
- To grant privileges on an entity, ADMIN on the entity is required.
- Grant flows top-down i.e. if a user has ADMIN on namespace then the user can grant privileges on all entities inside the namespace.
- None of READ, WRITE, EXECUTE, ADMIN defined in CDAP will allow granting of privileges.
- Only the administrator of the authorization provider can grant privileges to any entity. CDAP will not auto-grant privileges to creators.
Impersonation
- Impersonation is defined as the ability to:
  - deploy applications whose programs will execute as another user.
  - create a namespace/dataset/stream with an owner principal.
  - run an explore query in an impersonated namespace.
- alice needs ADMIN privilege on principal bob to deploy an application that can impersonate bob.
  - All operations that happen on the application/program entities are authorized using principal alice.
  - All operations done by the running program/query are authorized as principal bob.
    - This includes running the configure method and creating datasets from the application.
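The split between the requesting principal and the impersonated principal can be sketched as follows. This is a hedged illustration: the grant store, the entity tuples, and the function names are hypothetical, and requiring ADMIN on the application for deployment is an assumption borrowed from the proposed authorization matrix.

```python
# Hypothetical grant store: (principal, entity, action) tuples.
grants = {
    ("alice", ("principal", "bob"), "ADMIN"),
    ("alice", ("application", "etl", "feed1"), "ADMIN"),
    ("alice", ("program", "etl", "feed1", "workflow"), "EXECUTE"),
    ("bob", ("dataset", "etl", "gold"), "WRITE"),
}

def allowed(principal, entity, action):
    return (principal, entity, action) in grants

def can_deploy(requester, app, run_as):
    # Deploying is an operation on the application entity, so it is checked
    # against the requester, who also needs ADMIN on the impersonated principal.
    return (allowed(requester, app, "ADMIN")
            and allowed(requester, ("principal", run_as), "ADMIN"))

def can_start(requester, program):
    # Operations on program entities are authorized as the requester.
    return allowed(requester, program, "EXECUTE")

def can_write_at_runtime(run_as, dataset):
    # Work done by the running program is authorized as the impersonated principal.
    return allowed(run_as, dataset, "WRITE")

app = ("application", "etl", "feed1")
workflow = ("program", "etl", "feed1", "workflow")
gold = ("dataset", "etl", "gold")

print(can_deploy("alice", app, "bob"))      # True
print(can_start("alice", workflow))         # True
print(can_write_at_runtime("bob", gold))    # True
print(can_write_at_runtime("alice", gold))  # False: alice holds no privilege on gold
```

Note that alice never needs a privilege on the dataset: the data access is checked against bob only.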
Decouple entity existence from privilege
In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. This will allow the user to deploy only this specific application without having any other access to the namespace.
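As a rough sketch of this idea, the hypothetical `PrivilegeStore` below accepts a grant without checking whether the entity exists, and the later deployment is authorized against that pre-created grant. The class and method names are illustrative assumptions, not CDAP APIs.

```python
class PrivilegeStore:
    def __init__(self):
        self.grants = set()    # (principal, entity, action)
        self.entities = set()  # entities that actually exist

    def grant(self, principal, entity, action):
        # Deliberately no existence check: the entity may not exist yet.
        self.grants.add((principal, entity, action))

    def deploy_app(self, principal, namespace, app):
        entity = ("application", namespace, app)
        if (principal, entity, "ADMIN") not in self.grants:
            raise PermissionError(f"{principal} lacks ADMIN on {entity}")
        self.entities.add(entity)
        return entity

store = PrivilegeStore()
# The admin grants ADMIN on feed1 before feed1 is deployed.
store.grant("alice", ("application", "etl", "feed1"), "ADMIN")

store.deploy_app("alice", "etl", "feed1")      # succeeds
try:
    store.deploy_app("alice", "etl", "feed2")  # no grant for feed2
except PermissionError as err:
    print(err)
```

alice can deploy exactly the one application she was granted, without holding any privilege on the namespace.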
Changes to the authorization matrix
Note: The authorization matrix below enumerates all hierarchical privileges for clarity.
Instance
ADMIN on an Instance allows a user to create Namespaces in the instance. No other operations are defined as of now. Also, Instance is not a part of the privilege hierarchy.
Note: The following tables show the privileges required in expanded hierarchical form. The privileges marked in bold are the new ones, which will be added in 4.3.
Namespaces
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | ADMIN (on the CDAP instance) | ADMIN (on the CDAP instance) or ADMIN (on the namespace) |
Update | ADMIN (on the namespace) | |
Delete | ADMIN (on the namespace) | ADMIN on the namespace, and all entities in the namespace |
List | Only returns those namespaces on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | Any of READ, WRITE, EXECUTE, or ADMIN on the namespace or any of its descendants |
Grant | ADMIN (on the namespace) | |
Get namespace meta | Any privilege on the namespace or any of its descendants | |
Artifacts
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the artifact being deployed) |
Add a property | ADMIN (on the namespace) or ADMIN (on the artifact) | ADMIN |
Remove a property | ADMIN (on the namespace) or ADMIN (on the artifact) | ADMIN |
Use to deploy an app | ADMIN or READ or WRITE or EXECUTE | |
Delete | ADMIN (on the namespace) or ADMIN (on the artifact) | |
List | Only returns those artifacts on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace or on the artifact) | Any privilege on the artifact |
Grant | ADMIN (on the namespace) or ADMIN (on the artifact) | |
Get artifact info/summary/detail | ADMIN or READ or WRITE or EXECUTE | |
Applications
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Add | WRITE (on the namespace) and READ (on the artifact, if deployed from an artifact) | ADMIN (also see artifact privileges and principal privileges) |
Delete | ADMIN (on the application) or ADMIN (on the namespace) | |
List | Only returns those applications on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace or on the application) | Any of READ, WRITE, EXECUTE, or ADMIN on the application or any of its descendants |
Grant | ADMIN (on the namespace) or ADMIN (on the application) | |
Get application detail | Any privilege on the application or any of its descendants | |
Programs
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Start, Stop, or Debug | (EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace)) and READ (on the namespace) | EXECUTE |
Set instances | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Set runtime arguments | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | ADMIN |
Retrieve runtime arguments | READ (on the namespace) or READ (on the application) or READ (on the program) | READ or EXECUTE or ADMIN |
Retrieve status | Any of READ, WRITE, EXECUTE, or ADMIN | |
List | Only returns those programs on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Grant | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) | |
Get program specification | READ or WRITE or EXECUTE or ADMIN | |
Resume/Suspend schedule | EXECUTE | |
Datasets
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the dataset being created) |
Read | (READ (on the dataset) and READ (on the namespace)) or READ (on the namespace) | READ |
Retrieving properties | Not Documented | Any of READ, WRITE, ADMIN, or EXECUTE |
Write | WRITE (on the dataset) or WRITE (on the namespace) | WRITE |
Update | (ADMIN (on the dataset) and READ (on the namespace)) or (ADMIN (on the namespace) and READ (on the namespace)) | ADMIN (on the namespace) or ADMIN (on the dataset) |
Upgrade | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Truncate | ADMIN (on the dataset) or ADMIN (on the namespace) | ADMIN |
Drop | ADMIN (on the dataset) or ADMIN (on the namespace) | |
List | Only returns those datasets on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Grant | ADMIN (on the namespace) or ADMIN (on the dataset) | |
Get dataset meta | READ or WRITE or EXECUTE or ADMIN | |
Dataset Modules
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Deploy | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the module being deployed) |
Delete | ADMIN (on the dataset module) or ADMIN (on the namespace) | ADMIN |
Delete-all in the namespace | ADMIN (on the namespace) | ADMIN on all dataset modules in the namespace |
List | Only returns those dataset modules on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Grant | ADMIN (on the namespace) or ADMIN (on the dataset module) | |
Get module meta | READ or WRITE or EXECUTE or ADMIN | |
Dataset Types
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
List | Only returns those dataset types on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Get dataset type meta | READ or WRITE or EXECUTE or ADMIN | |
Secure Keys
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the key being created) |
Delete | ADMIN (on the key) or ADMIN (on the namespace) | ADMIN |
List | Only returns those secure keys on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Read | Not Documented | READ (on the namespace) or READ (on the key) |
Grant | ADMIN (on the namespace) or ADMIN (on the key) | |
Streams
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the stream being created) |
Retrieving events | READ (on the stream) and READ (on the namespace) | READ (on the stream) or READ (on the namespace) |
Retrieving properties | Any of READ, WRITE, ADMIN, or EXECUTE | |
Sending events to a stream (sync, async, or batch) | (WRITE (on the stream) and READ (on the namespace)) or (WRITE (on the namespace) and READ (on the namespace)) | WRITE (on the stream) or WRITE (on the namespace) |
Drop | ADMIN (on the stream) or ADMIN (on the namespace) | ADMIN |
Drop-all in the namespace | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN on all the streams in the namespace |
Update | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN |
Truncate | ADMIN (on the namespace) or ADMIN (on the stream) | ADMIN |
List | Only returns those streams on which the user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
View/List | Any of READ, WRITE, EXECUTE, or ADMIN | |
Grant | ADMIN (on the namespace) or ADMIN (on the stream) | |
Get stream property | READ or WRITE or EXECUTE or ADMIN | |
Principal
Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
---|---|---|
Deploy an app to impersonate a principal | ADMIN | |
Create a namespace with owner principal | ADMIN | |
Create a dataset with owner principal | ADMIN | |
Create a stream with owner principal | ADMIN |
Open Questions
- How does authorization on CDAP system actions (like increasing instances of metrics processor, etc) happen?
CDAP Sentry Extension Improvements
Existing Model
CDAP allows privileges to be defined using entities and users. Sentry only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, hence every grant made on an entity and user has to be translated into a grant on a role and group.
For this translation, CDAP does the following:
- Creates a proxy role per user and entity. This can lead to e × u roles being created, where e is the number of entities and u is the number of users.
- Expects every user (say alice) to belong to a unique group in the Hadoop User/Group mapping (group alice). Today this group name is expected to be the same as the username, and privileges for a user are granted to that group. However, not all environments have a group named after each user. In such environments, the privileges granted to the user are ineffective during enforcement, and the user cannot access entities using these privileges.
In addition, revoking all privileges on an entity is expensive since it involves listing of all privileges for all users. This is because Sentry does not have an API to list all privileges for an entity.
Proposed Model
Allow admins to use existing roles and groups in Sentry for authorization in CDAP. This means CDAP will not grant/revoke any privileges for entities. (note: this is a stretch goal for 4.3)
However, in cases where an admin wants CDAP to grant privileges, we propose the following model:
- Create a proxy role per user to which CDAP will grant privileges for all entities associated with the user. This limits the number of roles created by CDAP in Sentry to u, where u is the number of users.
- Create a proxy group per user to which CDAP will grant the privileges. This removes the restriction of expecting a group with the same name as the username to be present, and will work in all environments. The proxy group so created will not be added to Hadoop User/Group mapping, and will only be part of Sentry privileges.
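The effect on the number of Sentry roles can be illustrated with a small Python sketch. The role-name formats below are hypothetical placeholders chosen for the example; the actual names CDAP uses in Sentry are not specified here.

```python
from itertools import product

users = ["alice", "bob"]
entities = ["dataset:gold", "app:feed1", "stream:events"]
# Worst case: every user holds a privilege on every entity.
grants = list(product(users, entities))

def existing_roles(grants):
    # Existing model: one proxy role per (user, entity) pair.
    return {f"{user}_{entity}" for user, entity in grants}

def proposed_roles(grants):
    # Proposed model: one proxy role per user, shared by all of that user's entities.
    return {f"proxy_{user}" for user, _ in grants}

print(len(existing_roles(grants)))  # 6, i.e. e x u = 3 x 2
print(len(proposed_roles(grants)))  # 2, i.e. u
```

With the proposed model the role count grows only with the number of users, regardless of how many entities each user has privileges on.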
Investigate the new Sentry API (listPrivilegsbyAuthorizable) to list all privileges for a given entity so that we can avoid listing all privileges for all users during an entity deletion.
Backwards Compatibility
The above changes will be backward compatible with existing privileges.
- Grant: All new grants will happen in the new format.
- Revoke: Revoke will happen in both old and new format.
- Enforce: Enforce will work with both old and new privileges.
- List: List will list both old and new privileges.
Reduce CDAP Startup Time Due to Authorization
Problem
We have observed that as the number of entities in CDAP grows, CDAP startup time increases due to authorization (more than 20 minutes in some cases). During startup, CDAP revokes and grants privileges on all system entities. Revoking all privileges on an entity is expensive since it requires listing all privileges for all users.
Proposed Solution
- cdap's access to system entities will bypass authorization (https://issues.cask.co/browse/CDAP-11659)
- AuthorizationEnforcer will always return true if requesting user is cdap and namespace is system.
- Authorizer grant/revoke will be no-op for the above case
Note: The underlying systems are still required to have appropriate permissions for cdap.
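A minimal sketch of the bypass, assuming a wrapper around the real authorizer: checks short-circuit to true, and grant/revoke become no-ops, when the principal is cdap and the namespace is system. The class names and method signatures are hypothetical and do not mirror the actual CDAP AuthorizationEnforcer or Authorizer interfaces.

```python
SYSTEM_NAMESPACE = "system"
CDAP_USER = "cdap"

class InMemoryAuthorizer:
    """Stand-in for the real authorization backend; counts backend calls."""
    def __init__(self):
        self.grants = set()
        self.calls = 0

    def is_authorized(self, principal, namespace, entity, action):
        self.calls += 1
        return (principal, namespace, entity, action) in self.grants

    def grant(self, principal, namespace, entity, action):
        self.calls += 1
        self.grants.add((principal, namespace, entity, action))

    def revoke(self, principal, namespace, entity, action):
        self.calls += 1
        self.grants.discard((principal, namespace, entity, action))

class SystemBypassAuthorizer:
    """Skips the backend entirely for cdap acting on the system namespace."""
    def __init__(self, delegate):
        self._delegate = delegate

    @staticmethod
    def _bypass(principal, namespace):
        return principal == CDAP_USER and namespace == SYSTEM_NAMESPACE

    def is_authorized(self, principal, namespace, entity, action):
        if self._bypass(principal, namespace):
            return True
        return self._delegate.is_authorized(principal, namespace, entity, action)

    def grant(self, principal, namespace, entity, action):
        if not self._bypass(principal, namespace):
            self._delegate.grant(principal, namespace, entity, action)

    def revoke(self, principal, namespace, entity, action):
        if not self._bypass(principal, namespace):
            self._delegate.revoke(principal, namespace, entity, action)

backend = InMemoryAuthorizer()
authorizer = SystemBypassAuthorizer(backend)

authorizer.grant("cdap", "system", "dataset:app.meta", "ADMIN")  # no-op
print(authorizer.is_authorized("cdap", "system", "dataset:app.meta", "ADMIN"))  # True
print(backend.calls)  # 0: the backend was never contacted
```

Because no backend call is made for system entities, startup no longer pays the cost of listing privileges for all users.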
Use Existing Roles/Groups for Authorization
Currently, CDAP always grants/revokes privileges on entity creation/deletion. Although this is a convenient feature, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (like Sentry or Ranger), which allows them to use existing roles/groups to manage privileges across all systems.
- To support this we will introduce a property in cdap-site.xml which will specify whether CDAP should automatically grant privileges on entity creation. By default CDAP will continue granting privileges on entity creation to maintain backwards compatibility.
- If an admin disables this feature, CDAP will not grant/revoke privileges on an entity automatically. In this case the admin is responsible for creating the appropriate privileges.
- Not all authorization providers (like Sentry) have tools to manage privileges. CDAP will have to provide tools for admins to manage privileges using Sentry. This is a stretch goal for 4.3; in 4.3, cdap-cli will be modified to allow creating privileges for non-existing entities. Open question: as what user will cdap-cli grant these privileges?
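The switch described above can be sketched in Python as a flag consulted at entity creation. The flag name `auto_grant`, the function name, and the choice to grant all four actions to the creator are assumptions made for illustration; the actual cdap-site.xml property name is not given here.

```python
ALL_ACTIONS = ("READ", "WRITE", "EXECUTE", "ADMIN")

def create_entity(creator, entity, grants, auto_grant=True):
    """Create an entity; grant the creator privileges only when auto_grant is set.

    auto_grant stands in for the proposed cdap-site.xml property (hypothetical name).
    It defaults to True to keep the existing behavior.
    """
    if auto_grant:
        grants.update((creator, entity, action) for action in ALL_ACTIONS)
    return entity

default_grants = set()
create_entity("alice", "dataset:gold", default_grants)
print(len(default_grants))  # 4: alice was granted every action on the new dataset

managed_grants = set()
create_entity("alice", "dataset:gold", managed_grants, auto_grant=False)
print(len(managed_grants))  # 0: the admin must create privileges in the provider
```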
CDAP Ranger Integration
Please see Ranger Integration Design Document
CLI Impact or Changes
- CLI will be modified to not check for entity existence while granting privileges.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|