Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

 

Goals

  • Make CDAP authorization policy consistent across all entity types.
  • Allow admins to set granular privileges on entities.
  • Ranger integration for CDAP authorization
  • Improve Sentry data model to fix existing issues seen on customer environments
  • Allow admins to use existing roles/groups for authorization

 

User Stories 

TBD

Scenario 1

Overview

  • Privileges are managed at the entity level
  • App level impersonation
  • Dataset is owned by the application owner
  • Cross namespace dataset access allowed (open question)

Details

  • admin1 creates a CDAP namespace etl with principal etl-owner
  • admin2 deploys an app feed1 with principal feed1-owner in namespace etl
  • During app feed1 configure, dataset gold is created with owner principal feed1-owner
  • ops1 starts workflow in app feed1, that runs as principal feed1-owner
  • During the workflow run, principal feed1-owner reads/writes to dataset gold
  • ops1 can list logs and metrics for workflow in app feed1
  • ops2 can list all apps/programs in namespace etl and view all their logs and metrics
  • ops2 can list all the datasets in namespace etl and view their properties (open question)
  • ops2 cannot read any datasets in namespace etl

Scenario 2

Overview

  • Privileges are managed at the namespace level
  • Namespace level impersonation
  • Dataset is owned by the namespace owner
  • Cross namespace dataset access allowed

Details

  • admin1 creates a group etl-group in LDAP
  • admin1 creates namespaces in HDFS, HBase and Hive called etl
  • admin1 grants all privileges on the above namespaces to group etl-group
  • admin1 creates a CDAP namespace etl with principal etl-owner, using the namespaces from HDFS, HBase and Hive (open questions: should the owner principal be etl-owner, and does etl-owner belong to etl-group?)
  • admin1 grants all privileges on the CDAP namespace etl, and all entities under it to group etl-group
  • etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
  • During app feed1 configure, dataset gold is created with owner principal etl-owner
  • etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal etl-owner
  • During the workflow run, principal etl-owner reads/writes to dataset gold
  • etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
  • analyst1 belonging to group analyst-group is given READ privilege on namespace etl and all entities under it, using which analyst1 can read dataset gold

Scenario 3

Overview

  • Privileges are managed at the namespace level
  • No impersonation
  • All data is owned by CDAP
  • All programs run as CDAP
  • Cross namespace dataset access is allowed

Details

  • admin1 creates a group etl-group in LDAP
  • admin1 creates namespaces in HDFS, HBase and Hive called etl
  • admin1 grants all privileges on the above namespaces to principal cdap
  • admin1 creates a CDAP namespace etl using the namespaces from HDFS, HBase and Hive.
  • admin1 grants all privileges on the CDAP namespace etl, and all entities under it to group etl-group
  • etl-user1 belonging to group etl-group deploys app feed1 in namespace etl
  • During app feed1 configure, dataset gold is created with owner principal cdap
  • etl-user2 belonging to group etl-group starts workflow in app feed1, that runs as principal cdap
  • During the workflow run, principal cdap reads/writes to dataset gold
  • etl-user3 belonging to group etl-group can list logs and metrics for workflow in app feed1
  • etl-user3 belonging to group etl-group can also read from dataset gold
  • analyst1 belonging to group analyst-group is given privilege to read from dataset gold

Design

CDAP Authorization Policy

Existing CDAP Authorization Policy

The existing CDAP Authorization policy has the following limitations:

  • Granular privileges

    • Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.
    • Cannot grant a privilege to a user to deploy/create an application/artifact/dataset/stream without granting WRITE on the namespace.
    • Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.
  • Visibility
    • A user who has a privilege on a program cannot see the program in the UI or CLI unless the user also has a privilege on the namespace.
  • Inconsistency
    • To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.
    • ADMIN on an entity allows the user to delete the entity, but does not allow the user to create it.
    • Dataset read needs namespace READ, but dataset write does not need namespace WRITE.
  • Redundancy
    • Dataset READ and stream READ are redundant because they need namespace READ to be useful, and once a user has namespace READ, the user can read all datasets and streams in the namespace.
    • List and View operations are equivalent but are listed separately in the documentation.

Overview of the Proposed Authorization Policy

The proposed CDAP Authorization policy can be defined by the following principles:

  1. Access

    • Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity. 

    • Access is not enforced in a hierarchical manner in CDAP.

    • Privileges in the authorization provider can be set up in a hierarchical manner, for instance by using wildcard privileges (open question: how will this work in Sentry?).

  2. Visibility

    • Visibility defines whether an entity is visible to a user or not.
    • If a user has any privilege on an entity, it is visible to the user.
    • Visibility is hierarchical and flows bottom-up, i.e., if a user has any privilege on a program, then the user can see the application that contains the program and the namespace that contains the application.
  3. Grant

    • Grant is defined as the action of giving a privilege on an entity to a user.
    • None of READ, WRITE, EXECUTE, or ADMIN as defined in CDAP allows granting privileges.
    • Only the administrator of the authorization provider can grant privileges on an entity. CDAP will not auto-grant privileges to creators.
  4. Impersonation

    • Impersonation is defined as the ability to:
      • deploy applications whose programs will execute as another user
      • create a namespace/dataset/stream with an owner principal
      • run an explore query in an impersonated namespace
    • alice needs the ADMIN privilege on principal bob to deploy an application that can impersonate bob.
      • All operations on the application/program entities are authorized as principal alice.
      • All operations performed by the running program/query are authorized as principal bob.
        • This includes running the configure method and creating datasets from the application.
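
The Access and Visibility principles can be illustrated with a small sketch. It is not CDAP code: the `PrivilegeStore` class and the slash-separated entity paths (`namespace/application/program`) are hypothetical names chosen for this example. It assumes access is checked only against the exact entity, while visibility also considers privileges on descendants.

```python
from dataclasses import dataclass, field


@dataclass
class PrivilegeStore:
    """Toy privilege store (hypothetical, for illustration only)."""

    # Maps (principal, entity path) to the set of granted actions.
    grants: dict = field(default_factory=dict)

    def grant(self, principal, entity, action):
        self.grants.setdefault((principal, entity), set()).add(action)

    def is_authorized(self, principal, entity, action):
        # Access is not hierarchical: only a grant on this exact entity counts.
        return action in self.grants.get((principal, entity), set())

    def is_visible(self, principal, entity):
        # Visibility flows bottom-up: any privilege on the entity itself or on
        # one of its descendants makes the entity visible.
        prefix = entity + "/"
        return any(
            p == principal and (e == entity or e.startswith(prefix)) and actions
            for (p, e), actions in self.grants.items()
        )


store = PrivilegeStore()
store.grant("alice", "etl/feed1/workflow1", "EXECUTE")
```

With this single grant, alice can see namespace `etl` and application `etl/feed1`, but holds EXECUTE only on the workflow itself, not on its parents.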

Decouple entity existence from privilege

In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. This will allow the user to deploy only this specific application, without having any other access to the namespace.
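
A minimal sketch of this decoupling, assuming privileges are stored independently of the set of deployed applications. The `Namespace` class and its methods are hypothetical and exist only for this example.

```python
class Namespace:
    """Toy namespace: privileges are stored separately from deployed apps."""

    def __init__(self, name):
        self.name = name
        self.apps = set()        # applications that actually exist
        self.privileges = set()  # (principal, app name, action) triples

    def grant(self, principal, app, action):
        # No existence check: the app may not have been deployed yet.
        self.privileges.add((principal, app, action))

    def deploy(self, principal, app):
        # Deploying requires ADMIN on this specific application only.
        if (principal, app, "ADMIN") not in self.privileges:
            raise PermissionError(f"{principal} lacks ADMIN on {app}")
        self.apps.add(app)
```

Granting alice ADMIN on `feed1` before it exists lets her deploy `feed1`, while an attempt to deploy any other application in the namespace raises `PermissionError`.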

Changes to the authorization matrix

Instance

ADMIN on an Instance allows a user to create Namespaces in the instance. No other operations are defined as of now. Also, Instance is not a part of the privilege hierarchy.

Note: The privileges marked in bold are new and will be added in 4.3.

Namespaces

Privileges required per operation:

  • Create
    • Existing: ADMIN (on the CDAP instance)
    • Proposed: ADMIN
  • Update
    • Existing: ADMIN (on the namespace)
  • Delete
    • Existing: ADMIN (on the namespace)
    • Proposed: ADMIN on the namespace, and all entities in the namespace
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
    • Proposed: Any privilege on the namespace or any of its descendants.
  • Get Namespace Meta
    • Proposed: Any privilege on the namespace or any of its descendants.


Artifacts

Privileges required per operation:

  • Add
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN
  • Add a property
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN
  • Remove a property
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN
  • Use to deploy an app
    • Proposed: ADMIN | READ | WRITE | EXECUTE
  • Delete
    • Existing: ADMIN (on namespace) | ADMIN (on artifact)
    • Proposed: ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on artifact)
    • Proposed: Any privilege on the artifact
  • Get artifact info/summary/detail
    • Proposed: ADMIN | READ | WRITE | EXECUTE

 

Applications

Privileges required per operation:

  • Add
    • Existing: WRITE (on the namespace) and READ (on the artifact if deployed from an artifact)
    • Proposed: ADMIN (also see artifact privileges and principal privileges)
  • Delete
    • Existing: ADMIN (on the application) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) | Any of READ, WRITE, EXECUTE, or ADMIN (on application)
    • Proposed: Any privilege on the application or any of its descendants.
  • Get application detail
    • Proposed: Any privilege on the application or any of its descendants.

 

Programs

Privileges required per operation:

  • Start, Stop, or Debug
    • Existing: (EXECUTE (on the program) | EXECUTE (on the application) | EXECUTE (on the namespace)) & READ (on the namespace)
    • Proposed: EXECUTE
  • Set instances
    • Existing: ADMIN (on the namespace) | ADMIN (on the application) | ADMIN (on the program)
    • Proposed: ADMIN
  • Set runtime arguments
    • Existing: ADMIN (on the namespace) | ADMIN (on the application) | ADMIN (on the program)
    • Proposed: ADMIN
  • Retrieve runtime arguments
    • Existing: READ (on the namespace) | READ (on the application) | READ (on the program)
    • Proposed: READ | EXECUTE | ADMIN
  • Retrieve status
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Get program specification
    • Proposed: READ | WRITE | EXECUTE | ADMIN
  • Resume/Suspend schedule
    • Proposed: EXECUTE

 

Datasets

Privileges required per operation:

  • Create
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN
  • Read
    • Existing: (READ (on the dataset) and READ (on the namespace)) | READ (on the namespace)
    • Proposed: READ
  • Retrieving properties
    • Existing: Not Documented
    • Proposed: Any of READ, WRITE, ADMIN, or EXECUTE
  • Write
    • Existing: WRITE (on the dataset) | WRITE (on the namespace)
    • Proposed: WRITE
  • Update
    • Existing: (ADMIN (on the dataset) and READ (on the namespace)) | (ADMIN (on the namespace) and READ (on the namespace))
    • Proposed: ADMIN
  • Upgrade
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • Truncate
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • Drop
    • Existing: ADMIN (on the dataset) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Get dataset meta
    • Proposed: READ | WRITE | EXECUTE | ADMIN

 

Dataset Modules

Privileges required per operation:

  • Deploy
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN
  • Delete
    • Existing: ADMIN (on the dataset module) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • Delete-all in the namespace
    • Existing: ADMIN (on the namespace)
    • Proposed: ADMIN on all dataset modules in the namespace
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Get module meta
    • Proposed: READ | WRITE | EXECUTE | ADMIN

 

Dataset Types

Privileges required per operation:

  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Get dataset type meta
    • Proposed: READ | WRITE | EXECUTE | ADMIN

 

Secure Keys

Privileges required per operation:

  • Create
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN
  • Delete
    • Existing: ADMIN (on the key) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Read
    • Existing: Not Documented
    • Proposed: READ (on the key)

  

Streams

 

Privileges required per operation:

  • Create
    • Existing: WRITE (on the namespace)
    • Proposed: ADMIN
  • Retrieving events
    • Existing: READ (on the stream) & READ (on the namespace)
    • Proposed: READ
  • Retrieving properties
    • Existing: Any of READ, WRITE, ADMIN, or EXECUTE
  • Sending events to a stream (sync, async, or batch)
    • Existing: (WRITE (on the stream) and READ (on the namespace)) | (WRITE (on the namespace) & READ (on the namespace))
    • Proposed: WRITE
  • Drop
    • Existing: ADMIN (on the stream) | ADMIN (on the namespace)
    • Proposed: ADMIN
  • Drop-all in the namespace
    • Existing: ADMIN (on the namespace) | ADMIN (on the stream)
    • Proposed: ADMIN on all the streams in the namespace
  • Update
    • Existing: ADMIN (on the namespace) | ADMIN (on the stream)
    • Proposed: ADMIN
  • Truncate
    • Existing: ADMIN (on the namespace) | ADMIN (on the stream)
    • Proposed: ADMIN
  • View/List
    • Existing: Any of READ, WRITE, EXECUTE, or ADMIN
  • Get stream property
    • Proposed: READ | WRITE | EXECUTE | ADMIN
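
As a rough illustration of how the proposed stream privileges could be evaluated, the sketch below encodes the proposed column as a lookup table. The operation keys and function names are hypothetical, and the sketch assumes that holding any one of the listed privileges on the stream is sufficient.

```python
# Proposed stream privileges, transcribed from the table above.
# An operation is allowed if the user holds any one of the listed privileges.
PROPOSED_STREAM_PRIVILEGES = {
    "create": {"ADMIN"},
    "retrieve_events": {"READ"},
    "send_events": {"WRITE"},
    "drop": {"ADMIN"},
    "update": {"ADMIN"},
    "truncate": {"ADMIN"},
    "get_stream_property": {"READ", "WRITE", "EXECUTE", "ADMIN"},
}


def is_allowed(operation, held):
    """Return True if the privileges held on the stream permit the operation."""
    return bool(PROPOSED_STREAM_PRIVILEGES[operation] & set(held))


def can_drop_all(held_by_stream):
    """Drop-all needs ADMIN on every stream in the namespace."""
    return all("ADMIN" in held for held in held_by_stream.values())
```

For example, a user holding only WRITE on a stream can send events and read its properties but cannot retrieve events, and drop-all fails as soon as one stream in the namespace lacks ADMIN.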

     

Principal

Privileges required per operation:

  • Deploy an app to impersonate a principal
    • Proposed: ADMIN
  • Create a namespace with owner principal
    • Proposed: ADMIN
  • Create a dataset with owner principal
    • Proposed: ADMIN
  • Create a stream with owner principal
    • Proposed: ADMIN
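
The impersonation rule can be sketched as a pair of checks. This is only an illustration: the `application:` and `principal:` entity prefixes and both function names are hypothetical, and the sketch assumes that deploying an impersonating application needs ADMIN on both the application and the impersonated principal.

```python
def can_deploy_as(grants, deployer, app, owner_principal):
    """grants is a set of (principal, entity, action) triples."""
    has_app_admin = (deployer, f"application:{app}", "ADMIN") in grants
    has_principal_admin = (
        deployer, f"principal:{owner_principal}", "ADMIN") in grants
    return has_app_admin and has_principal_admin


def authorizing_principal(performed_by_running_program, deployer, owner_principal):
    # Operations on the application/program entities are authorized as the
    # deployer; operations done by the running program are authorized as the
    # impersonated owner principal.
    return owner_principal if performed_by_running_program else deployer
```

With ADMIN on `application:feed1` and on `principal:bob`, alice can deploy `feed1` to run as bob; reads and writes made by the running program are then checked against bob, not alice.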

Open Questions

  1. How does authorization on CDAP system actions (such as increasing the instances of the metrics processor) happen?

CDAP Sentry Extension Improvements

Existing Model

CDAP allows privileges to be defined using entities and users. Sentry only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, so every grant made on an entity and user has to be translated into a grant on a role and group.

For this translation, CDAP does the following:

  1. Creates a proxy role per user and entity. This can lead to e x u roles being created, where e is the number of entities and u is the number of users.
  2. Expects every user (say alice) to belong to a unique group in the Hadoop User/Group mapping (group alice). Today this group name is expected to be the same as the username, and privileges for a user are granted to that group. However, not every environment has a group named after each user. In such environments the privileges granted to the user are ineffective during enforcement, and the user cannot access entities using these privileges.

In addition, revoking all privileges on an entity is expensive since it involves listing all privileges for all users. This is because Sentry does not have an API to list all privileges for an entity.

Proposed Model

Allow admins to use existing roles and groups in Sentry for authorization in CDAP. This means CDAP will not grant/revoke any privileges for entities. (Note: this is a stretch goal for 4.3.)

However, in cases where an admin wants CDAP to grant privileges, we propose the following model:

  1. Create a proxy role per user, to which CDAP will grant privileges for all entities associated with the user. This limits the number of roles created by CDAP in Sentry to u, where u is the number of users.
  2. Create a proxy group per user, to which CDAP will grant the privileges. This removes the restriction of expecting a group with the same name as the username to be present, and will work in all environments. The proxy group so created will not be added to the Hadoop User/Group mapping, and will only be part of Sentry privileges.
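
The difference in role counts between the two models can be checked with a quick calculation. The function names and the sample sizes (50 users, 1000 entities) are made up for illustration, and the existing-model figure is the worst case in which every user holds a privilege on every entity.

```python
def proxy_roles_existing(users, entities):
    # Existing model: one proxy role per (user, entity) pair.
    return {(user, entity) for user in users for entity in entities}


def proxy_roles_proposed(users):
    # Proposed model: one proxy role per user, shared across all entities.
    return set(users)


users = [f"user{i}" for i in range(50)]
entities = [f"entity{i}" for i in range(1000)]

existing_count = len(proxy_roles_existing(users, entities))
proposed_count = len(proxy_roles_proposed(users))
print(existing_count, proposed_count)
```

Under these assumed sizes the existing model can create up to e x u = 50,000 roles, while the proposed model creates u = 50.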

Investigate the new Sentry API (listPrivilegsbyAuthorizable) to list all privileges for a given entity, so that we can avoid listing all privileges for all users when an entity is deleted.

Backwards Compatibility

The above changes will be backward compatible with existing privileges.

  • Grant: All new grants will happen in the new format.
  • Revoke: Revoke will happen in both old and new formats.
  • Enforce: Enforce will work with both old and new privileges.
  • List: List will list both old and new privileges.
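
One way to picture these four rules is a store that keeps old-format and new-format privileges side by side. The class below is a hypothetical sketch, not the Sentry extension itself; how the two formats are actually encoded in Sentry is not specified here, so both are modeled simply as sets of `(principal, entity, action)` triples.

```python
class CompatPrivileges:
    """Toy store holding old-format and new-format privileges side by side."""

    def __init__(self, legacy=()):
        self.legacy = set(legacy)  # privileges created in the old format
        self.current = set()       # privileges created in the new format

    def grant(self, privilege):
        # Grant: all new grants use the new format.
        self.current.add(privilege)

    def revoke(self, privilege):
        # Revoke: remove the privilege from both formats.
        self.legacy.discard(privilege)
        self.current.discard(privilege)

    def enforce(self, privilege):
        # Enforce: a privilege in either format is honored.
        return privilege in self.legacy or privilege in self.current

    def list_privileges(self):
        # List: return old and new privileges together.
        return sorted(self.legacy | self.current)
```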

Reduce CDAP Startup Time Due to Authorization

Problem

We have observed that as the number of entities in CDAP grows, CDAP startup time increases due to authorization (more than 20 minutes in some cases). During startup, CDAP revokes and grants privileges on all system entities. Revoking all privileges on an entity is expensive since it requires listing all privileges for all users.

Proposed Solution

  • Access by cdap to system entities will bypass authorization (https://issues.cask.co/browse/CDAP-11659).
  • AuthorizationEnforcer will always return true if the requesting user is cdap and the namespace is system.
  • Authorizer grant/revoke will be a no-op for the above case.

Note: The underlying systems are still required to have appropriate permissions for cdap.
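
The bypass can be sketched as a short-circuit in front of the authorization backend. The `Authorizer` class below is a simplified stand-in written for this example and does not mirror the real CDAP interfaces; `backend_calls` is a hypothetical counter used to show that no backend work happens for the bypassed case.

```python
SYSTEM_PRINCIPAL = "cdap"
SYSTEM_NAMESPACE = "system"


def is_system_access(principal, namespace):
    return principal == SYSTEM_PRINCIPAL and namespace == SYSTEM_NAMESPACE


class Authorizer:
    def __init__(self):
        self.grants = set()
        self.backend_calls = 0  # counts calls that would reach the backend

    def grant(self, principal, namespace, entity, action):
        if is_system_access(principal, namespace):
            return  # no-op: nothing is written to the backend
        self.backend_calls += 1
        self.grants.add((principal, namespace, entity, action))

    def revoke(self, principal, namespace, entity, action):
        if is_system_access(principal, namespace):
            return  # no-op: avoids the expensive privilege listing
        self.backend_calls += 1
        self.grants.discard((principal, namespace, entity, action))

    def enforce(self, principal, namespace, entity, action):
        if is_system_access(principal, namespace):
            return True  # cdap on the system namespace is always allowed
        return (principal, namespace, entity, action) in self.grants
```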

Use Existing Roles/Groups for Authorization

Currently, CDAP always grants/revokes privileges on entity creation/deletion. Although this is a convenient feature, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (like Sentry or Ranger). This allows them to use existing roles/groups to manage privileges across all systems.

  • To support this, we will introduce a property in cdap-site.xml that specifies whether CDAP should automatically grant privileges on entity creation. By default, CDAP will continue granting privileges on entity creation to maintain backwards compatibility.
  • If an admin disables this feature, CDAP will not grant/revoke privileges on an entity automatically. In this case the admin is responsible for creating the appropriate privileges.
  • Not all authorization providers (like Sentry) have tools to manage privileges. CDAP will have to provide tools for admins to manage privileges using Sentry (stretch goal for 4.3; in 4.3, cdap-cli will be modified to allow creating privileges for non-existing entities. Open question: as what user will cdap-cli grant these privileges?).
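
The effect of the proposed switch can be sketched as follows. The name of the cdap-site.xml property is not given in this document, so the `auto_grant` flag is a placeholder; granting all four privileges to the creator is likewise an assumption made for this example.

```python
ALL_ACTIONS = ("READ", "WRITE", "EXECUTE", "ADMIN")


class EntityManager:
    def __init__(self, auto_grant=True):
        # auto_grant stands in for the proposed cdap-site.xml property
        # (placeholder name); True models the backward-compatible default.
        self.auto_grant = auto_grant
        self.entities = set()
        self.grants = set()  # (principal, entity, action) triples

    def create(self, creator, entity):
        self.entities.add(entity)
        if self.auto_grant:
            # Assumption: the creator receives every privilege on the entity.
            self.grants.update((creator, entity, a) for a in ALL_ACTIONS)

    def delete(self, entity):
        self.entities.discard(entity)
        if self.auto_grant:
            self.grants = {g for g in self.grants if g[1] != entity}
```

With `auto_grant=False`, creating or deleting an entity leaves the privilege set untouched, so an admin has to manage the privileges in the authorization provider.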

CDAP Ranger Integration

Please see the Ranger Integration Design Document.

CLI Impact or Changes

  • The CLI will be modified to not check for entity existence while granting privileges.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

Future work