Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Goals

  • Make the CDAP authorization policy consistent across all entity types.
  • Allow setting granular permissions at the dataset level, application level, etc.
  • Ranger integration for CDAP authorization
  • Improve the Sentry data model to fix existing issues seen in customer environments
  • Allow admins to use existing roles/groups for authorization
 

User Stories 

  • TBD

Design

CDAP Authorization Model

Existing CDAP Authorization Model

The existing CDAP Authorization Model has the following drawbacks:

  • Granular permissions

    • Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.
    • Cannot grant a privilege to a user to deploy an application/artifact/dataset/stream without granting write on the namespace.
    • Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.
  • Visibility
    • A user who has a privilege on a program cannot see the program in the UI or CLI unless the user also has some privilege on the namespace.
  • Inconsistencies
    • To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs WRITE on the stream and READ on the namespace.
    • To retrieve dataset properties, READ on the dataset is required, whereas to read stream properties any privilege (READ/WRITE/EXECUTE/ADMIN) is sufficient.
    • ADMIN on an entity allows deleting the entity, but it does not allow creating it.
    • Dataset read needs namespace READ but dataset write does not need namespace WRITE.
    • TBD Dataset Module Delete All.
  • Redundancy
    • List and View operations are equivalent but are listed separately in documentation.
    • Dataset READ and Stream READ are redundant because they need Namespace READ permission to be meaningful.

Overview of Proposed Model

We propose a CDAP authorization policy defined by the following three principles:

  1. Access: 

    • Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity. 

    • Access flows top-down, i.e. if a user has READ on a namespace, the user has READ on all entities inside the namespace.

  2. Visibility

    • Visibility defines whether an entity is visible to a user or not.
    • If a user has any privilege on an entity, it is visible to the user.
    • Visibility flows bottom-up i.e. if a user has any privilege on a program then the user will be able to see the application that contains the program and namespace that contains the application.
  3. Grant

    • Grant is defined as the action of giving a privilege on an entity to a user.
    • To grant privileges on an entity, ADMIN on the entity is required.
    • Grant flows top-down, i.e. if a user has ADMIN on a namespace, the user can grant privileges on all entities inside the namespace.
  • Note: CDAP Instance is not part of the privilege hierarchy.
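
The three principles above can be sketched in a few lines of Python. This is only an illustration, not CDAP code: the `Policy` class, its method names, and the slash-separated entity paths (for example `ns1/app1/prog1`) are assumptions made for the example. Paths start at the namespace because the instance is not part of the hierarchy. The sketch also assumes that a privilege inherited from an ancestor makes an entity visible.

```python
ACTIONS = {"READ", "WRITE", "EXECUTE", "ADMIN"}


def ancestors(entity):
    """Yield the entity itself, then each enclosing entity up to the namespace."""
    parts = entity.split("/")
    for end in range(len(parts), 0, -1):
        yield "/".join(parts[:end])


class Policy:
    """Hypothetical in-memory policy store used only for this illustration."""

    def __init__(self):
        self._grants = set()  # (user, action, entity path)

    def grant(self, user, action, entity):
        if action not in ACTIONS:
            raise ValueError(f"unknown action: {action}")
        self._grants.add((user, action, entity))

    def has_access(self, user, action, entity):
        # Access flows top-down: a grant on any ancestor covers the entity.
        return any((user, action, e) in self._grants for e in ancestors(entity))

    def is_visible(self, user, entity):
        # Visibility flows bottom-up: any privilege on the entity or on a
        # descendant makes it visible. A privilege on an ancestor also counts
        # here, because access to the entity is inherited from it.
        for granted_user, _action, granted_entity in self._grants:
            if granted_user != user:
                continue
            if (granted_entity == entity
                    or granted_entity.startswith(entity + "/")
                    or entity.startswith(granted_entity + "/")):
                return True
        return False

    def can_grant(self, user, entity):
        # Grant flows top-down: ADMIN on the entity or on any ancestor.
        return self.has_access(user, "ADMIN", entity)
```

With this sketch, a user holding only EXECUTE on `ns1/app1/prog1` sees `ns1/app1` and `ns1`, but gets no access to either of them.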

Decouple entity existence from privilege:

In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. The user can then deploy only this specific application, without having any other access to the namespace.
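
A minimal sketch of this decoupling, assuming a hypothetical `Namespace` class that stores grants by entity path regardless of whether the entity exists. The deploy check uses the proposed rule for applications (WRITE or ADMIN on the namespace, or ADMIN on the application). None of these names are CDAP APIs.

```python
class Namespace:
    """Hypothetical namespace holding applications and privileges."""

    def __init__(self, name):
        self.name = name
        self.apps = set()
        # (user, action, entity path); the entity does not have to exist yet.
        self.grants = set()

    def grant(self, user, action, entity):
        self.grants.add((user, action, entity))

    def deploy_app(self, user, app):
        app_path = f"{self.name}/{app}"
        allowed = (
            (user, "WRITE", self.name) in self.grants
            or (user, "ADMIN", self.name) in self.grants
            or (user, "ADMIN", app_path) in self.grants
        )
        if not allowed:
            raise PermissionError(f"{user} may not deploy {app_path}")
        self.apps.add(app)


ns = Namespace("ns1")
# The admin grants ADMIN on an application that has not been deployed yet.
ns.grant("alice", "ADMIN", "ns1/purchases")
ns.deploy_app("alice", "purchases")  # allowed by the pre-created privilege
```

In this sketch, an attempt by the same user to deploy any other application in `ns1` raises `PermissionError`.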

Based on the above, we propose the following changes to the authorization matrix:

Instance

ADMIN on an Instance allows a user to create Namespaces in the instance. No other operations are defined as of now.

The Instance is not part of the privilege hierarchy.

Namespaces

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the CDAP instance) | WRITE (on the CDAP instance) OR ADMIN (on the namespace) |
| Update | ADMIN (on the namespace) | |
| Delete | ADMIN (on the namespace) | |
| List | Only returns those namespaces on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | At least one of READ, WRITE, EXECUTE, or ADMIN on the namespace or any of its descendants |
| Grant | | ADMIN (on the namespace) |

Artifacts

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Add | WRITE (on the namespace) | WRITE (on the namespace) OR ADMIN (on the namespace) OR ADMIN (on the artifact being deployed) |
| Add a property | ADMIN (on the namespace) OR ADMIN (on the artifact) | |
| Remove a property | ADMIN (on the namespace) OR ADMIN (on the artifact) | |
| Delete | ADMIN (on the namespace) OR ADMIN (on the artifact) | |
| List | Only returns those artifacts on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace) OR any of READ, WRITE, EXECUTE, or ADMIN (on the artifact) | |

 

Applications

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Add | WRITE (on the namespace) AND READ (on the artifact, if deployed from an artifact) | WRITE (on the namespace) OR ADMIN (on the namespace) OR ADMIN (on the application). If the application is being deployed from an existing artifact: READ (on the namespace) OR READ (on the artifact) |
| Delete | ADMIN | |
| List | Only returns those applications on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN (on the namespace) OR any of READ, WRITE, EXECUTE, or ADMIN (on the application) | |

 

Programs

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Start, Stop, or Debug | (EXECUTE (on the program) OR EXECUTE (on the application) OR EXECUTE (on the namespace)) AND READ (on the namespace) | EXECUTE (on the program) OR EXECUTE (on the application) OR EXECUTE (on the namespace) |
| Set instances | ADMIN | |
| Set runtime arguments | ADMIN | |
| Retrieve runtime arguments | READ | |
| Retrieve status | At least one of READ, WRITE, EXECUTE, or ADMIN | |
| List | Only returns those programs on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |

 

Datasets

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) OR ADMIN (on the dataset being created) OR ADMIN (on the namespace) |
| Read | (READ (on the dataset) AND READ (on the namespace)) OR READ (on the namespace) | READ (on the namespace) OR READ (on the dataset) |
| Retrieving properties | Not documented | At least one of READ, WRITE, ADMIN, or EXECUTE |
| Write | WRITE (on the dataset) OR WRITE (on the namespace) | WRITE (on the namespace) OR WRITE (on the dataset) |
| Update | (ADMIN (on the dataset) AND READ (on the namespace)) OR (ADMIN (on the namespace) AND READ (on the namespace)) | ADMIN (on the dataset) OR ADMIN (on the namespace) |
| Upgrade | ADMIN | |
| Truncate | ADMIN | |
| Drop | ADMIN | |
| List | Only returns those datasets on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
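
The proposed dataset rules are plain alternatives: holding any one of the listed privileges is enough. The sketch below encodes the proposed rules for Create, Read, Write, and Update as data and checks them with a single function. The dictionary layout and the names `PROPOSED_DATASET_RULES` and `is_allowed` are assumptions for illustration only, not part of CDAP.

```python
# Each operation maps to a list of alternatives; holding any one is sufficient.
# Every alternative is a (privilege, scope) pair taken from the proposed rules.
PROPOSED_DATASET_RULES = {
    "create": [("WRITE", "namespace"), ("ADMIN", "dataset"), ("ADMIN", "namespace")],
    "read": [("READ", "namespace"), ("READ", "dataset")],
    "write": [("WRITE", "namespace"), ("WRITE", "dataset")],
    "update": [("ADMIN", "dataset"), ("ADMIN", "namespace")],
}


def is_allowed(operation, held):
    """Return True if `held` contains at least one alternative for the operation.

    `held` is the set of (privilege, scope) pairs that the user holds for the
    dataset in question and for its namespace.
    """
    return any(alternative in held for alternative in PROPOSED_DATASET_RULES[operation])


# A user with only READ on the dataset can read it but cannot write to it.
reader = {("READ", "dataset")}
print(is_allowed("read", reader))
print(is_allowed("write", reader))
```

Keeping the rules as data makes the removal of the implicit namespace READ requirement visible: no proposed rule combines two privileges.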

 

Dataset Modules

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Deploy | WRITE (on the namespace) | WRITE (on the namespace) OR ADMIN (on the module being deployed) OR ADMIN (on the namespace) |
| Delete | ADMIN (on the dataset module) OR ADMIN (on the namespace) | |
| Delete-all in the namespace | ADMIN (on the namespace) | |
| List | Only returns those dataset modules on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |

 

Dataset Types

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| List | Only returns those dataset types on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |

 

Secure Keys

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) OR ADMIN (on the key being created) OR ADMIN (on the namespace) |
| Delete | ADMIN | |
| List | Only returns those secure keys on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |
| Read | | READ (on the namespace) OR READ (on the key) |

 

Streams

| Operation | Privileges Required | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) OR ADMIN (on the stream being created) OR ADMIN (on the namespace) |
| Retrieving events | READ (on the stream) AND READ (on the namespace) | READ (on the stream) OR READ (on the namespace) |
| Retrieving properties | At least one of READ, WRITE, ADMIN, or EXECUTE | |
| Sending events to a stream (sync, async, or batch) | (WRITE (on the stream) AND READ (on the namespace)) OR (WRITE (on the namespace) AND READ (on the namespace)) | WRITE (on the stream) OR WRITE (on the namespace) |
| Drop | ADMIN (on the stream) OR ADMIN (on the namespace) | |
| Drop-all in the namespace | ADMIN (on the namespace) | |
| Update | ADMIN | |
| Truncate | ADMIN | |
| List | Only returns those streams on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | |
| View | At least one of READ, WRITE, EXECUTE, or ADMIN | |

 

 

CDAP Sentry Extension Improvements

Existing Model

CDAP allows privileges to be defined on entities for users. Sentry is an RBAC system that only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, so every grant made on an entity to a user has to be translated into a grant on a role and a group.

For this translation, CDAP does the following:

  1. Creates a proxy role per user and entity. This can lead to e × u roles being created, where e is the number of entities and u is the number of users.
  2. Expects every user to belong to a unique group in the Hadoop user/group mapping. Today this group name is expected to be the same as the username, and privileges for a user are granted to that expected group name. However, not every environment has a group named after each user. In such environments the granted privileges are ineffective, and the user cannot access any entity through them.

Revoking all privileges on an entity is expensive because it requires listing all privileges for all users: Sentry has no way to list all privileges for an entity.

Proposed Model

Allow admins to use existing roles and groups in Sentry for authorization in CDAP. This means CDAP will not grant/revoke any privileges for entities. (note: this is a stretch goal for 4.3)

However, in cases where an admin wants CDAP to grant privileges, we propose the following model:

  1. Create a proxy role per user to which we will grant privileges for all entities. This limits the number of roles created by CDAP in Sentry to u, where u is the number of users.
  2. Create a proxy group per user to which we will grant the privileges. This removes the restriction of expecting a group with the same name as the username to be present and will work in all environments.
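
To make the difference in role counts concrete, the sketch below derives proxy role names for the same set of grants under both models. The naming schemes (`role-<user>-<entity>`, `role-<user>`, `group-<user>`) and the function names are invented for this example. They are not the names CDAP uses in Sentry.

```python
def existing_model_roles(grants):
    """Existing model: one proxy role per (user, entity) pair, up to e x u roles."""
    return {f"role-{user}-{entity}" for user, entity, _action in grants}


def proposed_model_roles(grants):
    """Proposed model: one proxy role per user, so at most u roles."""
    return {f"role-{user}" for user, _entity, _action in grants}


def proposed_proxy_group(user):
    """Proposed model: a proxy group per user instead of an expected Hadoop group."""
    return f"group-{user}"


# Two users, each granted READ on the same three entities.
grants = [
    (user, entity, "READ")
    for user in ("alice", "bob")
    for entity in ("ds1", "ds2", "app1")
]
print(len(existing_model_roles(grants)))  # e x u = 3 x 2
print(len(proposed_model_roles(grants)))  # u = 2
```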

Investigate the new API (listPrivilegsbyAuthorizable()) to list all privileges for a given entity so that we can avoid listing all privileges for all users during entity deletion.

Backward Compatibility

The above changes will be backward compatible with existing privileges.

  • Grant: All new grants will happen in the new format. 
  • Revoke: Revoke will happen in both old and new format.
  • Enforce: Enforce will work with both old and new privileges.
  • List: List will list both old and new privileges.
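
One way to picture these four rules is a store that keeps old-format and new-format privileges side by side. The sketch below is an assumption about the behavior only: the `DualFormatStore` class is hypothetical, and the actual old and new Sentry formats are not modeled.

```python
class DualFormatStore:
    """Hypothetical store holding privileges in the old and the new format."""

    def __init__(self, old_format=()):
        self.old_format = set(old_format)  # privileges written before the change
        self.new_format = set()            # privileges written after the change

    def grant(self, privilege):
        # Grant: all new grants use the new format.
        self.new_format.add(privilege)

    def revoke(self, privilege):
        # Revoke: remove the privilege from both formats.
        self.old_format.discard(privilege)
        self.new_format.discard(privilege)

    def enforce(self, privilege):
        # Enforce: a privilege in either format is honored.
        return privilege in self.old_format or privilege in self.new_format

    def list_privileges(self):
        # List: report privileges from both formats.
        return sorted(self.old_format | self.new_format)
```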

 

Reduce CDAP Startup Time Due to Authorization

Problem

We have observed that as the number of entities in CDAP grows, CDAP startup time increases due to authorization (more than 20 minutes in some cases). This happens because every time CDAP starts, it revokes and re-grants privileges for all system entities. Revoking all privileges on an entity is expensive since it requires listing all privileges for all users.

Proposed Solution

  • CDAP system service access to system entities will bypass authorization. (https://issues.cask.co/browse/CDAP-11659)
  • AuthorizationEnforcer will always return true if requesting user is cdap and namespace is system.
  • Authorizer grant/revoke will be no-op for the above case

Note: The underlying system must have the appropriate permissions for the cdap user.
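
The bypass can be sketched as a thin wrapper around the real authorization backend. The class and method names below are illustrative and do not reflect the actual CDAP `AuthorizationEnforcer` or `Authorizer` interfaces. `CountingBackend` is a stand-in used only to show that the backend is never called for system access.

```python
class CountingBackend:
    """Stand-in authorization backend that counts how often it is called."""

    def __init__(self):
        self.calls = 0

    def enforce(self, user, namespace, entity, action):
        self.calls += 1
        return False  # deny everything, to make the bypass visible

    def grant(self, user, namespace, entity, action):
        self.calls += 1

    def revoke(self, user, namespace, entity, action):
        self.calls += 1


class SystemBypassAuthorizer:
    """Skips the backend when the cdap user touches the system namespace."""

    def __init__(self, backend):
        self.backend = backend

    @staticmethod
    def _is_system_access(user, namespace):
        return user == "cdap" and namespace == "system"

    def enforce(self, user, namespace, entity, action):
        if self._is_system_access(user, namespace):
            return True  # always authorized, no backend lookup
        return self.backend.enforce(user, namespace, entity, action)

    def grant(self, user, namespace, entity, action):
        if self._is_system_access(user, namespace):
            return  # no-op
        self.backend.grant(user, namespace, entity, action)

    def revoke(self, user, namespace, entity, action):
        if self._is_system_access(user, namespace):
            return  # no-op
        self.backend.revoke(user, namespace, entity, action)
```

Because grant and revoke become no-ops for this case, startup no longer pays for listing all privileges for all users on every system entity.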

 

Existing Roles/Groups for Authorization

Currently, CDAP always grants privileges on entity creation. Although this is convenient, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (such as Sentry or Ranger). This allows them to use existing roles/groups to manage the privileges.

  • To support this, we will introduce a property in cdap-site.xml that specifies whether CDAP should grant privileges on entity creation. By default, CDAP will continue granting privileges on entity creation to maintain backward compatibility.
  • If an admin disables automatic granting through this property, CDAP will not grant/revoke privileges on an entity automatically. In this case the admin is responsible for creating the appropriate privileges.
  • Not all authorization providers (for example, Sentry) have tools to manage privileges. CDAP will have to provide tools for admins to manage privileges using Sentry. (Stretch goal: in 4.3, cdap-cli will be modified to allow creating privileges for non-existing entities.)
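
The effect of such a property can be sketched as follows. The property name `example.grant.on.entity.creation` is a placeholder, since this document does not name the real cdap-site.xml property. The privilege granted to the creator (ADMIN here) is also an assumption made for the example.

```python
# Placeholder property name; the real cdap-site.xml property is not defined here.
GRANT_ON_CREATE = "example.grant.on.entity.creation"


def create_entity(entity, creator, conf, entities, grants):
    """Create an entity and, unless disabled, grant the creator a privilege on it."""
    entities.add(entity)
    # Default is True, so existing deployments keep today's behavior.
    if conf.get(GRANT_ON_CREATE, True):
        grants.add((creator, "ADMIN", entity))  # ADMIN is an assumed privilege


entities, grants = set(), set()

# Default configuration: CDAP grants privileges on creation.
create_entity("ns1/ds1", "alice", {}, entities, grants)

# Automatic granting disabled: the admin manages privileges externally.
create_entity("ns1/ds2", "alice", {GRANT_ON_CREATE: False}, entities, grants)

print(sorted(grants))
```

In the second call the entity is created but no privilege is recorded, which matches the case where the admin creates the appropriate privileges in Sentry or Ranger.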

 

CDAP Ranger Integration

TBD (We will add Ranger Integration design link soon).

 

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

| Path | Method | Description | Response Code | Response |
| --- | --- | --- | --- | --- |
| /v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors | |

 

     

Deprecated REST API

| Path | Method | Description |
| --- | --- | --- |
| /v3/apps/<app-id> | GET | Returns the application spec for a given application |

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures such as YARN or HBase) and how the design takes care of these aspects

Test Scenarios

| Test ID | Test Description | Expected Results |
| --- | --- | --- |

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work