Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Goals

  • Make CDAP authorization policy consistent across all entity types.
  • Allow admins to set granular privileges on entities. 
  • Integrate CDAP authorization with Ranger.
  • Improve the Sentry data model to fix existing issues seen in customer environments.
  • Allow admins to use existing roles/groups for authorization.

User Stories 

  • TBD

Design

CDAP Authorization Policy

Existing CDAP Authorization Policy

The existing CDAP Authorization policy has the following limitations:

  • Granular privileges

    • Cannot grant a privilege to a user to read only one dataset or one stream in a namespace.
    • Cannot grant a privilege to a user to deploy/create an application/artifact/dataset/stream without granting WRITE on the namespace.
    • Cannot grant a privilege to a user to start/stop a program without granting READ on the namespace.
  • Visibility
    • A user who has a privilege on a program cannot see the program in the UI or CLI unless the user also has some privilege on the namespace.
  • Inconsistency
    • To write to a dataset, a user needs WRITE on the dataset, but to write to a stream, a user needs both WRITE on the stream and READ on the namespace.
    • ADMIN on an entity allows the user to delete the entity but not to create it.
    • Dataset read needs namespace READ, but dataset write does not need namespace WRITE.
  • Redundancy
    • Dataset READ and stream READ are redundant because they need namespace READ permission to be useful, and once a user has namespace READ the user can read all datasets and streams in the namespace.
    • List and View operations are equivalent but are listed separately in documentation.

Overview of the Proposed Authorization Policy

The proposed CDAP Authorization policy can be defined by the following three principles:

  1. Access

    • Access defines who can perform an action (READ, WRITE, EXECUTE, ADMIN) on an entity.

    • Access flows top-down, i.e., if a user has READ on a namespace, the user has READ on all entities inside the namespace.

  2. Visibility

    • Visibility defines whether an entity is visible to a user or not.
    • If a user has any privilege on an entity, it is visible to the user.
    • Visibility flows bottom-up, i.e., if a user has any privilege on a program, the user will be able to see the application that contains the program and the namespace that contains the application.
  3. Grant

    • Grant is the action of giving a privilege on an entity to a user.
    • To grant privileges on an entity, ADMIN on the entity is required.
    • Grant flows top-down, i.e., if a user has ADMIN on a namespace, the user can grant privileges on all entities inside the namespace.

Note: CDAP Instance is not part of the privilege hierarchy.
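
The three principles above can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Policy` class, its method names, and the representation of entities as path tuples are hypothetical and are not CDAP APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Policy:
    """Toy policy store. Entities are path tuples, e.g. ("ns1", "app1", "prog1")."""

    # (user, entity) -> set of actions granted directly on that entity
    grants: dict = field(default_factory=dict)

    def grant(self, user, entity, action):
        self.grants.setdefault((user, entity), set()).add(action)

    def has_access(self, user, entity, action):
        # Access flows top-down: a grant on the entity or any ancestor applies.
        return any(
            action in self.grants.get((user, entity[:depth]), set())
            for depth in range(1, len(entity) + 1)
        )

    def is_visible(self, user, entity):
        # Visibility flows bottom-up: any privilege on a descendant makes the
        # entity visible. A privilege on the entity itself or on an ancestor
        # (inherited top-down) also counts.
        for (grantee, granted_on), actions in self.grants.items():
            if grantee != user or not actions:
                continue
            is_descendant_or_self = granted_on[:len(entity)] == entity
            is_ancestor_or_self = entity[:len(granted_on)] == granted_on
            if is_descendant_or_self or is_ancestor_or_self:
                return True
        return False

    def can_grant(self, user, entity):
        # Grant flows top-down: ADMIN on the entity or any ancestor is enough.
        return self.has_access(user, entity, "ADMIN")
```

In this model, a user with only EXECUTE on `("ns1", "app1", "prog1")` can see `("ns1",)` and `("ns1", "app1")` but gains no access to them, while a user with ADMIN on `("ns1",)` can grant privileges on every entity under that namespace.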

Decouple entity existence from privilege

In addition, CDAP will now support creating privileges for entities that are yet to be created. This will allow admins to grant fine-grained privileges on entities. For example, an admin can grant a user ADMIN on an application before the application is deployed. This will allow the user to deploy only this specific application without having any other access to the namespace.
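
A minimal sketch of this idea follows, assuming a privilege store keyed by entity name that never checks whether the entity exists. The function names, the `namespace.app` naming scheme, and the in-memory sets are hypothetical; the deploy check mirrors the proposed rule for adding an application (WRITE or ADMIN on the namespace, or ADMIN on the application).

```python
# Privileges are stored by entity name only; nothing requires the entity to exist.
privileges = set()      # {(user, entity, action)}
deployed_apps = set()   # names of applications that have been deployed


def grant(user, entity, action):
    privileges.add((user, entity, action))


def can_deploy(user, namespace, app):
    app_id = f"{namespace}.{app}"
    return (
        (user, namespace, "WRITE") in privileges
        or (user, namespace, "ADMIN") in privileges
        or (user, app_id, "ADMIN") in privileges
    )


def deploy(user, namespace, app):
    if not can_deploy(user, namespace, app):
        raise PermissionError(f"{user} may not deploy {namespace}.{app}")
    deployed_apps.add(f"{namespace}.{app}")
```

With this model, calling `grant("alice", "ns1.purchase", "ADMIN")` before the application exists lets `deploy("alice", "ns1", "purchase")` succeed, while deploying any other application in `ns1` still raises `PermissionError`.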

Changes to the authorization matrix

Instance

ADMIN on an instance allows a user to create namespaces in the instance. No other operations are defined for now. The instance is not part of the privilege hierarchy.

Namespaces

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the CDAP instance) | WRITE (on the CDAP instance) or ADMIN (on the namespace) |
| Update | ADMIN (on the namespace) |  |
| Delete | ADMIN (on the namespace) |  |
| List | Only returns those namespaces on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN | Any of READ, WRITE, EXECUTE, or ADMIN on the namespace or any of its descendants. |
| Grant |  | ADMIN (on the namespace) |

Artifacts

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Add | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the artifact being deployed) |
| Add a property | ADMIN (on namespace) or ADMIN (on artifact) |  |
| Remove a property | ADMIN (on namespace) or ADMIN (on artifact) |  |
| Delete | ADMIN (on namespace) or ADMIN (on artifact) |  |
| List | Only returns those artifacts on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) or any of READ, WRITE, EXECUTE, or ADMIN (on artifact) |  |
| Grant |  | ADMIN (on the namespace) or ADMIN (on the artifact) |

 

Applications

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Add | WRITE (on the namespace) and READ (on the artifact if deployed from an artifact) | WRITE (on the namespace) or ADMIN (on the namespace) or ADMIN (on the application). If the application is being deployed from an existing artifact: READ (on the namespace) or READ (on the artifact). |
| Delete | ADMIN (on the application) or ADMIN (on the namespace) |  |
| List | Only returns those applications on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN (on namespace) or any of READ, WRITE, EXECUTE, or ADMIN (on application) | Any of READ, WRITE, EXECUTE, or ADMIN on the namespace or application or any of its descendants. |
| Grant |  | ADMIN (on the namespace) or ADMIN (on the application) |

 

Programs

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Start, Stop, or Debug | (EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace)) and READ (on the namespace) | EXECUTE (on the program) or EXECUTE (on the application) or EXECUTE (on the namespace) |
| Set instances | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) |  |
| Set runtime arguments | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) |  |
| Retrieve runtime arguments | READ (on the namespace) or READ (on the application) or READ (on the program) |  |
| Retrieve status | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| List | Only returns those programs on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| Grant |  | ADMIN (on the namespace) or ADMIN (on the application) or ADMIN (on the program) |
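
The change to the Start, Stop, or Debug rule can be sketched as two predicates, one for the existing policy and one for the proposed policy. The function names and the representation of a user's privileges as a set of `(action, entity)` pairs are assumptions made for illustration only.

```python
def can_start_existing(privs, namespace, app, program):
    """Existing rule: EXECUTE on program, application, or namespace, AND READ on namespace."""
    has_execute = any(("EXECUTE", e) in privs for e in (program, app, namespace))
    return has_execute and ("READ", namespace) in privs


def can_start_proposed(privs, namespace, app, program):
    """Proposed rule: EXECUTE on program, application, or namespace is sufficient."""
    return any(("EXECUTE", e) in privs for e in (program, app, namespace))
```

A user holding only `("EXECUTE", "ns1.app1.prog1")` fails the existing check because READ on the namespace is missing, but passes the proposed check.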

 

Datasets

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the dataset being created) or ADMIN (on the namespace) |
| Read | (READ (on the dataset) and READ (on the namespace)) or READ (on the namespace) | READ (on the namespace) or READ (on the dataset) |
| Retrieving properties | Not documented | Any of READ, WRITE, ADMIN, or EXECUTE |
| Write | WRITE (on the dataset) or WRITE (on the namespace) | WRITE (on the namespace) or WRITE (on the dataset) |
| Update | (ADMIN (on the dataset) and READ (on the namespace)) or (ADMIN (on the namespace) and READ (on the namespace)) | ADMIN (on the dataset) or ADMIN (on the namespace) |
| Upgrade | ADMIN (on the dataset) or ADMIN (on the namespace) |  |
| Truncate | ADMIN (on the dataset) or ADMIN (on the namespace) |  |
| Drop | ADMIN (on the dataset) or ADMIN (on the namespace) |  |
| List | Only returns those datasets on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| Grant |  | ADMIN (on the namespace) or ADMIN (on the dataset) |

 

Dataset Modules

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Deploy | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the module being deployed) or ADMIN (on the namespace) |
| Delete | ADMIN (on the dataset module) or ADMIN (on the namespace) |  |
| Delete-all in the namespace | ADMIN (on the namespace) |  |
| List | Only returns those dataset modules on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| Grant |  | ADMIN (on the namespace) or ADMIN (on the dataset module) |

 

Dataset Types

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| List | Only returns those dataset types on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |

 

Secure Keys

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the key being created) or ADMIN (on the namespace) |
| Delete | ADMIN (on the key) or ADMIN (on the namespace) |  |
| List | Only returns those secure keys on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| Read |  | READ (on the namespace) or READ (on the key) |
| Grant |  | ADMIN (on the key) or ADMIN (on the namespace) |

 

Streams

| Operation | Privileges Required (Existing) | Privileges Required (Proposed) |
| --- | --- | --- |
| Create | WRITE (on the namespace) | WRITE (on the namespace) or ADMIN (on the stream being created) or ADMIN (on the namespace) |
| Retrieving events | READ (on the stream) and READ (on the namespace) | READ (on the stream) or READ (on the namespace) |
| Retrieving properties | Any of READ, WRITE, ADMIN, or EXECUTE |  |
| Sending events to a stream (sync, async, or batch) | (WRITE (on the stream) and READ (on the namespace)) or (WRITE (on the namespace) and READ (on the namespace)) | WRITE (on the stream) or WRITE (on the namespace) |
| Drop | ADMIN (on the stream) or ADMIN (on the namespace) |  |
| Drop-all in the namespace | ADMIN (on the namespace) or ADMIN (on the stream) |  |
| Update | ADMIN (on the namespace) or ADMIN (on the stream) |  |
| Truncate | ADMIN (on the namespace) or ADMIN (on the stream) |  |
| List | Only returns those streams on which user has at least one of READ, WRITE, EXECUTE, or ADMIN | Will be removed |
| View | Any of READ, WRITE, EXECUTE, or ADMIN |  |
| Grant |  | ADMIN (on the stream) or ADMIN (on the namespace) |

 

CDAP Sentry Extension Improvements

Existing Model

CDAP allows privileges to be defined using entities and users. Sentry only allows privileges to be defined using roles and groups. CDAP is not aware of roles and groups, so every grant made on an entity and user has to be translated into a grant on a role and group.

For this translation, CDAP does the following:

  1. Creates a proxy role per user and entity. This can lead to e × u roles being created, where e is the number of entities and u is the number of users.
  2. Expects every user (say alice) to belong to a unique group in the Hadoop User/Group mapping (group alice). Today this group name is expected to be the same as the username, and privileges for a user are granted to that group. However, not all environments have a group named after each user. Where no such group exists, the privileges granted to the user are ineffective during enforcement, and the user cannot access entities using these privileges.

In addition, revoking all privileges on an entity is expensive because Sentry does not have an API to list all privileges for an entity, so CDAP has to list all privileges for all users.

Proposed Model

Allow admins to use existing roles and groups in Sentry for authorization in CDAP. This means CDAP will not grant/revoke any privileges for entities. (note: this is a stretch goal for 4.3)

However, in cases where an admin wants CDAP to grant privileges, we propose the following model:

  1. Create a proxy role per user to which CDAP will grant privileges for all entities associated with the user. This limits the number of roles created by CDAP in Sentry to u, where u is the number of users.
  2. Create a proxy group per user to which CDAP will grant the privileges. This removes the restriction of expecting a group with the same name as the username to be present, and will work in all environments. The proxy group so created will not be added to the Hadoop User/Group mapping, and will only be part of Sentry privileges.
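
The difference in role count between the two models can be sketched as follows. The role and group naming schemes here are hypothetical; the document does not specify how CDAP names proxy roles or proxy groups.

```python
def roles_existing(users, entities):
    # Existing model: one proxy role per (user, entity) pair, i.e. e * u roles.
    return {f"{user}.{entity}" for user in users for entity in entities}


def roles_proposed(users):
    # Proposed model: one proxy role per user, i.e. u roles.
    return {f"role.{user}" for user in users}


def proxy_group(user):
    # Proposed model: a per-user proxy group that exists only in Sentry,
    # not in the Hadoop User/Group mapping. The prefix is an assumption.
    return f"group.{user}"


users = ["alice", "bob"]
entities = ["ns1", "ns1.app1", "ns1.dataset1"]
existing_count = len(roles_existing(users, entities))  # 2 users * 3 entities
proposed_count = len(roles_proposed(users))            # 2 users
```

With two users and three entities, the existing model creates six roles while the proposed model creates two, and the gap widens linearly with the number of entities.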

Investigate the new Sentry API (listPrivilegsbyAuthorizable) to list all privileges for a given entity so that we can avoid listing all privileges for all users during an entity deletion.

Backwards Compatibility

The above changes will be backward compatible with existing privileges.

  • Grant: All new grants will happen in the new format. 
  • Revoke: Revoke will happen in both old and new format.
  • Enforce: Enforce will work with both old and new privileges.
  • List: List will list both old and new privileges.
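
One way to read the four rules above is as an authorizer that keeps privileges in two stores, one per format. This is a sketch only: the `CompatAuthorizer` class and its two in-memory sets are hypothetical stand-ins, and the document does not describe how the old and new formats are actually stored in Sentry.

```python
class CompatAuthorizer:
    def __init__(self, legacy_privileges=()):
        self.legacy = set(legacy_privileges)  # privileges stored in the old format
        self.current = set()                  # privileges stored in the new format

    def grant(self, privilege):
        # Grant: new grants are written only in the new format.
        self.current.add(privilege)

    def revoke(self, privilege):
        # Revoke: remove the privilege from both formats.
        self.legacy.discard(privilege)
        self.current.discard(privilege)

    def enforce(self, privilege):
        # Enforce: a privilege in either format is honored.
        return privilege in self.legacy or privilege in self.current

    def list_privileges(self):
        # List: report privileges from both formats.
        return self.legacy | self.current
```

Under this reading, privileges that were granted before the upgrade keep working without migration, and they disappear from the old store only when they are explicitly revoked.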

Reduce CDAP Startup Time Due to Authorization

Problem

We have observed that as the number of entities in CDAP grows, CDAP startup time increases due to authorization (more than 20 minutes in some cases). During startup, CDAP revokes and grants privileges on all system entities. Revoking all privileges on an entity is expensive since it requires listing all privileges for all users.

Proposed Solution

  • Access by the cdap user to system entities will bypass authorization (https://issues.cask.co/browse/CDAP-11659).
  • AuthorizationEnforcer will always return true if the requesting user is cdap and the namespace is system.
  • Authorizer grant/revoke will be a no-op for the above case.

Note: The underlying systems are still required to have appropriate permissions for cdap.
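
The bypass can be sketched as a short-circuit in front of the authorization backend. The class below is a hypothetical in-memory stand-in, not the CDAP `AuthorizationEnforcer` or `Authorizer` interface; the `backend_calls` counter is added only to show that the bypass avoids calls to the provider.

```python
SYSTEM_USER = "cdap"
SYSTEM_NAMESPACE = "system"


def is_system_bypass(user, namespace):
    return user == SYSTEM_USER and namespace == SYSTEM_NAMESPACE


class BypassingAuthorizer:
    def __init__(self):
        self.privileges = set()  # (user, namespace, entity, action)
        self.backend_calls = 0   # calls that would reach the authorization provider

    def enforce(self, user, namespace, entity, action):
        if is_system_bypass(user, namespace):
            return True  # cdap on the system namespace is always allowed
        self.backend_calls += 1
        return (user, namespace, entity, action) in self.privileges

    def grant(self, user, namespace, entity, action):
        if is_system_bypass(user, namespace):
            return  # no-op: nothing is written to the provider
        self.backend_calls += 1
        self.privileges.add((user, namespace, entity, action))

    def revoke(self, user, namespace, entity, action):
        if is_system_bypass(user, namespace):
            return  # no-op: avoids the expensive listing of privileges
        self.backend_calls += 1
        self.privileges.discard((user, namespace, entity, action))
```

Because grant and revoke return early for the cdap user on the system namespace, startup no longer has to revoke and re-grant privileges on every system entity.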

Use Existing Roles/Groups for Authorization

Currently, CDAP always grants/revokes privileges on entity creation/deletion. Although this is a convenient feature, it does not work well in enterprise environments. Many enterprises prefer to manage privileges in a centralized authorization provider (like Sentry or Ranger), which lets them use existing roles/groups to manage privileges across all systems.

  • To support this we will introduce a property in cdap-site.xml which will specify whether CDAP should automatically grant privileges on entity creation. By default CDAP will continue granting privileges on entity creation to maintain backwards compatibility.
  • If an admin disables this feature, CDAP will not grant/revoke privileges on an entity automatically. In this case the admin is responsible for creating the appropriate privileges.
  • Some authorization providers (like Sentry) do not have tools to manage privileges. CDAP will have to provide tools for admins to manage privileges using Sentry (stretch goal for 4.3; in 4.3, cdap-cli will be modified to allow creating privileges for non-existing entities).
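
The effect of the proposed setting can be sketched as a flag on entity creation and deletion. The `auto_grant_on_create` parameter stands in for the cdap-site.xml property, whose actual name is not given here. The set of privileges granted to the creator (all four actions) is also an assumption made for illustration.

```python
ALL_ACTIONS = ("READ", "WRITE", "EXECUTE", "ADMIN")


class EntityManager:
    def __init__(self, auto_grant_on_create=True):
        # Defaults to True to keep today's behavior (backwards compatible).
        self.auto_grant_on_create = auto_grant_on_create
        self.entities = set()
        self.privileges = set()  # (user, entity, action)

    def create(self, user, entity):
        self.entities.add(entity)
        if self.auto_grant_on_create:
            # Assumption: the creator receives every action on the new entity.
            for action in ALL_ACTIONS:
                self.privileges.add((user, entity, action))

    def delete(self, entity):
        self.entities.discard(entity)
        if self.auto_grant_on_create:
            # Automatic revoke mirrors the automatic grant.
            self.privileges = {p for p in self.privileges if p[1] != entity}
```

With the flag disabled, privileges created by an admin ahead of time are left untouched by both `create` and `delete`, which is what allows an external provider to remain the single source of truth.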

CDAP Ranger Integration

TBD (We will add Ranger Integration design link soon).

 

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

| Path | Method | Description | Response Code | Response |
| --- | --- | --- | --- | --- |
| /v3/apps/<app-id> | GET | Returns the application spec for a given application | 200 - On success; 404 - When application is not available; 500 - Any internal errors |  |

 

     

Deprecated REST API

| Path | Method | Description |
| --- | --- | --- |
| /v3/apps/<app-id> | GET | Returns the application spec for a given application |

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

Test Scenarios

| Test ID | Test Description | Expected Results |
| --- | --- | --- |
|  |  |  |

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work