...
- As a CDAP system, I should be able to integrate with Apache Sentry for fine-grained role-based access controls of select CDAP operations
- As a CDAP admin, I should be able to easily configure Sentry to work with CDAP on different types of clusters (e.g., CDH, CM clusters).
- As a CDAP admin, I should be able to create/update/delete roles in Apache Sentry
- As a CDAP admin, I should be able to add users/groups to roles in Apache Sentry
- As a CDAP admin, I should be able to turn authorization on/off easily for entire CDAP instance
- As a CDAP system, I should be able to authorize the following requests:
  - Namespace create/update/delete
  - Application deployment
  - Program start/stop
  - Stream read/write (Not Implemented in 3.4)
These operations are a subset that represents the various 'kinds' of operations allowed in CDAP
...
Entity | Operation | Required Privileges | Resultant Privileges |
---|---|---|---|
Namespace | create | ADMIN (Instance) | ADMIN (Namespace) |
Namespace | update | ADMIN (Namespace) | |
Namespace | list | READ (Instance) | |
Namespace | get | READ (Namespace) | |
Namespace | delete | ADMIN (Namespace) | |
Namespace | set preference | WRITE (Namespace) | |
Namespace | get preference | READ (Namespace) | |
Namespace | search | READ (Namespace) | |
Artifact | add | WRITE (Namespace) | ADMIN (Artifact) |
Artifact | delete | ADMIN (Artifact) | |
Artifact | get | READ (Artifact) | |
Artifact | list | READ (Namespace) | |
Artifact | write property | ADMIN (Artifact) | |
Artifact | delete property | ADMIN (Artifact) | |
Artifact | get property | READ (Artifact) | |
Artifact | refresh | WRITE (Instance) | |
Artifact | write metadata | ADMIN (Artifact) | |
Artifact | read metadata | READ (Artifact) | |
Application | deploy | WRITE (Namespace) | ADMIN (Application) |
Application | get | READ (Application) | |
Application | list | READ (Namespace) | |
Application | update | ADMIN (Application) | |
Application | delete | ADMIN (Application) | |
Application | set preference | WRITE (Application) | |
Application | get preference | READ (Application) | |
Application | add metadata | ADMIN (Application) | |
Application | get metadata | READ (Application) | |
Program | start/stop/debug | EXECUTE (Program) | |
Program | set instances | ADMIN (Program) | |
Program | list | READ (Namespace) | |
Program | set runtime args | EXECUTE (Program) | |
Program | get runtime args | READ (Program) | |
Program | get instances | READ (Program) | |
Program | set preference | ADMIN (Program) | |
Program | get preference | READ (Program) | |
Program | get status | READ (Program) | |
Program | get history | READ (Program) | |
Program | add metadata | ADMIN (Program) | |
Program | get metadata | READ (Program) | |
Program | emit logs | WRITE (Program) | |
Program | view logs | READ (Program) | |
Program | emit metrics | WRITE (Program) | |
Program | view metrics | READ (Program) | |
Stream | create | WRITE (Namespace) | ADMIN (Stream) |
Stream | update properties | ADMIN (Stream) | |
Stream | delete | ADMIN (Stream) | |
Stream | truncate | ADMIN (Stream) | |
Stream | enqueue / asyncEnqueue / batch | WRITE (Stream) | |
Stream | get | READ (Stream) | |
Stream | list | READ (Namespace) | |
Stream | read events | READ (Stream) | |
Stream | set preferences | ADMIN (Stream) | |
Stream | get preferences | READ (Stream) | |
Stream | add metadata | ADMIN (Stream) | |
Stream | get metadata | READ (Stream) | |
Stream | view lineage | READ (Stream) | |
Stream | emit metrics | WRITE (Stream) | |
Stream | view metrics | READ (Stream) | |
Dataset | list | READ (Namespace) | |
Dataset | get | READ (Dataset) | |
Dataset | create | WRITE (Namespace) | ADMIN (Dataset) |
Dataset | update | ADMIN (Dataset) | |
Dataset | drop | ADMIN (Dataset) | |
Dataset | executeAdmin (exists/truncate/upgrade) | ADMIN (Dataset) | |
Dataset | add metadata | ADMIN (Dataset) | |
Dataset | get metadata | READ (Dataset) | |
Dataset | view lineage | READ (Dataset) | |
Dataset | emit metrics | WRITE (Dataset) | |
Dataset | view metrics | READ (Dataset) | |
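The table can be read as a lookup from an (entity, operation) pair to the privilege a caller must hold. The following is a minimal sketch of that idea using a handful of rows copied from the table. The class, enum, and method names (`RequiredPrivileges`, `Scope`, `requiredFor`) are hypothetical and are not part of CDAP.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: none of these names come from CDAP.
class RequiredPrivileges {
  enum Action { READ, WRITE, EXECUTE, ADMIN }
  enum Scope { INSTANCE, NAMESPACE, APPLICATION, PROGRAM }

  // A required privilege: an action on the entity at the given scope.
  static final class Privilege {
    final Action action;
    final Scope scope;

    Privilege(Action action, Scope scope) {
      this.action = action;
      this.scope = scope;
    }

    @Override
    public String toString() {
      return action + " (" + scope + ")";
    }
  }

  // A few rows from the table, keyed by "<entity>:<operation>".
  private static final Map<String, Privilege> REQUIRED = new HashMap<>();
  static {
    REQUIRED.put("namespace:create", new Privilege(Action.ADMIN, Scope.INSTANCE));
    REQUIRED.put("namespace:delete", new Privilege(Action.ADMIN, Scope.NAMESPACE));
    REQUIRED.put("application:deploy", new Privilege(Action.WRITE, Scope.NAMESPACE));
    REQUIRED.put("program:start", new Privilege(Action.EXECUTE, Scope.PROGRAM));
    REQUIRED.put("program:get status", new Privilege(Action.READ, Scope.PROGRAM));
  }

  // Returns the privilege required for the operation, or fails for rows not modelled here.
  static Privilege requiredFor(String entity, String operation) {
    Privilege privilege = REQUIRED.get(entity + ":" + operation);
    if (privilege == null) {
      throw new IllegalArgumentException("No rule for " + entity + ":" + operation);
    }
    return privilege;
  }

  public static void main(String[] args) {
    System.out.println(requiredFor("namespace", "create"));   // ADMIN (INSTANCE)
    System.out.println(requiredFor("application", "deploy")); // WRITE (NAMESPACE)
  }
}
```

Note that creating an entity is checked against its parent (for example, deploying an application requires WRITE on the namespace), which is why the required scope can differ from the entity being operated on.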
...
```java
import java.util.Set;

public interface Authorizer {
  /**
   * Initialize the {@link Authorizer}. Authorization extensions can use this method to access an
   * {@link AuthorizationContext} that allows them to interact with CDAP for operations such as creating and accessing
   * datasets, executing dataset operations in transactions, etc.
   *
   * @param context the {@link AuthorizationContext} that can be used to interact with CDAP
   */
  void initialize(AuthorizationContext context) throws Exception;

  /**
   * Enforces authorization for the specified {@link Principal} for the specified {@link Action} on the specified
   * {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which authorization is to be enforced
   * @param principal the {@link Principal} that performs the actions. This could be a user, group or role
   * @param action the {@link Action} being performed
   * @throws UnauthorizedException if the principal is not authorized to perform action on the entity
   * @throws Exception if any other errors occurred while performing the authorization enforcement check
   */
  void enforce(EntityId entity, Principal principal, Action action) throws Exception;

  /**
   * Grants a {@link Principal} authorization to perform a set of {@link Action actions} on an {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which {@link Action actions} are to be granted
   * @param principal the {@link Principal} that performs the actions. This could be a user, group or role
   * @param actions the set of {@link Action actions} to grant
   */
  void grant(EntityId entity, Principal principal, Set<Action> actions) throws Exception;

  /**
   * Revokes a {@link Principal principal's} authorization to perform a set of {@link Action actions} on
   * an {@link EntityId}.
   *
   * @param entity the {@link EntityId} whose {@link Action actions} are to be revoked
   * @param principal the {@link Principal} that performs the actions. This could be a user, group or role
   * @param actions the set of {@link Action actions} to revoke
   */
  void revoke(EntityId entity, Principal principal, Set<Action> actions) throws Exception;

  /**
   * Revokes all {@link Principal principals'} authorization to perform any {@link Action} on the given
   * {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which all {@link Action actions} are to be revoked
   */
  void revoke(EntityId entity) throws Exception;

  /**
   * Returns all the {@link Privilege} for the specified {@link Principal}.
   *
   * @param principal the {@link Principal} for which to return privileges
   * @return a {@link Set} of {@link Privilege} for the specified principal
   */
  Set<Privilege> listPrivileges(Principal principal) throws Exception;

  /********************************* Role Management: APIs for Role Based Access Control ******************************/

  /**
   * Create a role.
   *
   * @param role the {@link Role} to create
   * @throws RoleAlreadyExistsException if the role to be created already exists
   */
  void createRole(Role role) throws Exception;

  /**
   * Drop a role.
   *
   * @param role the {@link Role} to drop
   * @throws RoleNotFoundException if the role to be dropped is not found
   */
  void dropRole(Role role) throws Exception;

  /**
   * Add a role to the specified {@link Principal}.
   *
   * @param role the {@link Role} to add to the specified principal
   * @param principal the {@link Principal} to add the role to
   * @throws RoleNotFoundException if the role to be added to the principal is not found
   */
  void addRoleToPrincipal(Role role, Principal principal) throws Exception;

  /**
   * Delete a role from the specified {@link Principal}.
   *
   * @param role the {@link Role} to remove from the specified principal
   * @param principal the {@link Principal} to remove the role from
   * @throws RoleNotFoundException if the role to be removed from the principal is not found
   */
  void removeRoleFromPrincipal(Role role, Principal principal) throws Exception;

  /**
   * Returns a set of all {@link Role roles} for the specified {@link Principal}.
   *
   * @param principal the {@link Principal} to look up roles for
   * @return Set of {@link Role} for the specified {@link Principal}
   */
  Set<Role> listRoles(Principal principal) throws Exception;

  /**
   * Returns all available {@link Role}. Only a super user can perform this operation.
   *
   * @return a set of all available {@link Role} in the system.
   */
  Set<Role> listAllRoles() throws Exception;

  /**
   * Destroys an {@link Authorizer}. Authorization extensions can use this method to write any cleanup code.
   */
  void destroy() throws Exception;
}
```
Where Principal is the entity performing actions, defined as below:
...
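To make the grant/enforce/revoke contract of the Authorizer interface concrete, here is a minimal in-memory sketch. It is not CDAP's implementation: entities and principals are plain strings rather than `EntityId` and `Principal`, `SecurityException` stands in for `UnauthorizedException`, and the class name `InMemoryAuthorizerSketch` and helper `isAuthorized` are hypothetical.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Simplified stand-in: strings replace CDAP's EntityId and Principal types.
class InMemoryAuthorizerSketch {
  enum Action { READ, WRITE, EXECUTE, ADMIN }

  // entity -> principal -> actions granted to that principal on that entity
  private final Map<String, Map<String, Set<Action>>> grants = new HashMap<>();

  void grant(String entity, String principal, Set<Action> actions) {
    grants.computeIfAbsent(entity, e -> new HashMap<>())
        .computeIfAbsent(principal, p -> EnumSet.noneOf(Action.class))
        .addAll(actions);
  }

  // Revokes the given actions for one principal; a no-op if nothing was granted.
  void revoke(String entity, String principal, Set<Action> actions) {
    Map<String, Set<Action>> byPrincipal = grants.get(entity);
    if (byPrincipal == null) {
      return;
    }
    Set<Action> granted = byPrincipal.get(principal);
    if (granted != null) {
      granted.removeAll(actions);
    }
  }

  // Revokes every principal's privileges on the entity.
  void revoke(String entity) {
    grants.remove(entity);
  }

  boolean isAuthorized(String entity, String principal, Action action) {
    Map<String, Set<Action>> byPrincipal = grants.get(entity);
    Set<Action> granted = byPrincipal == null ? null : byPrincipal.get(principal);
    return granted != null && granted.contains(action);
  }

  // Throws if the principal lacks the action; SecurityException is a stand-in here.
  void enforce(String entity, String principal, Action action) {
    if (!isAuthorized(entity, principal, action)) {
      throw new SecurityException(principal + " may not " + action + " " + entity);
    }
  }

  public static void main(String[] args) {
    InMemoryAuthorizerSketch authorizer = new InMemoryAuthorizerSketch();
    authorizer.grant("namespace=ns1", "alice", EnumSet.of(Action.READ, Action.WRITE));
    authorizer.enforce("namespace=ns1", "alice", Action.READ); // passes silently
    authorizer.revoke("namespace=ns1");
    System.out.println(authorizer.isAuthorized("namespace=ns1", "alice", Action.READ)); // false
  }
}
```

The single-argument `revoke` mirrors the interface's `revoke(EntityId)`, which is useful for cleaning up all privileges when an entity is deleted.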
Entity | Sentry Resource URI |
---|---|
Instance | cdap:///instance=server1 |
Namespace | cdap:///instance=server1/namespace=ns1 |
Artifact | cdap:///instance=server1/namespace=ns1/artifact=art1/artifactVersion=1 |
Application | cdap:///instance=server1/namespace=ns1/application=app1 |
Program | cdap:///instance=server1/namespace=ns1/application=app1/programType=pt1/programName=prg1 |
Dataset | cdap:///instance=server1/namespace=ns1/dataset=ds1 |
Stream | cdap:///instance=server1/namespace=ns1/stream=s1 |
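The URIs in the table all follow the same pattern: a `cdap:///` prefix followed by `key=value` segments from the instance down to the entity. A minimal sketch of building such a URI is shown below. The class and method names (`SentryResourceUri`, `of`) are hypothetical and only illustrate the format.

```java
// Hypothetical helper that only illustrates the URI format from the table above.
class SentryResourceUri {
  // Builds a resource URI from an instance name and alternating key/value segments.
  static String of(String instance, String... keyValues) {
    if (keyValues.length % 2 != 0) {
      throw new IllegalArgumentException("Expected alternating keys and values");
    }
    StringBuilder uri = new StringBuilder("cdap:///instance=").append(instance);
    for (int i = 0; i < keyValues.length; i += 2) {
      uri.append('/').append(keyValues[i]).append('=').append(keyValues[i + 1]);
    }
    return uri.toString();
  }

  public static void main(String[] args) {
    // prints cdap:///instance=server1/namespace=ns1/dataset=ds1
    System.out.println(of("server1", "namespace", "ns1", "dataset", "ds1"));
  }
}
```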
...
Property | Description | Value |
---|---|---|
sentry.service.allow.connect | List of users allowed to connect to the Sentry Server | cdap will be added to this list |
sentry.cdap.provider | Authorization provider for the CDAP component in Sentry. This class defines the user-group mapping amongst other things. | org.apache.sentry.provider.common.HadoopGroupResourceAuthorizationProvider |
sentry.cdap.provider.resource | The resource for creating the Sentry Provider Backend. This property seems unused, and always defaults to "". However, all data engines (hive, sqoop, kafka) define it. | "" |
sentry.cdap.provider.backend | A class that implements ProviderBackend. This class uses a SentryServiceClient to communicate with the Sentry service from the client side in Sentry. | org.apache.sentry.provider.db.generic.SentryGenericProviderBackend |
sentry.cdap.policy.engine | Defines the Sentry Policy Engine for the cdap component. Must implement org.apache.sentry.policy.common.PolicyEngine | (package name subject to change) |
sentry.cdap.instance.name | Defines the instance name for the cdap component. | cdap |
CDAP
These properties will be defined in cdap-security.xml
Property | Description | Default |
---|---|---|
security.authorization.enabled | Determines whether authorization should be enabled in CDAP. If false, a NoOpAuthorizer would be used for security.authorizer.class | false |
security.authorizer.class | Fully qualified class name of the authorizer class. Must implement the Authorizer interface | co.cask.cdap.security.authorization.DatasetBasedAuthorizer |
instance.name | Defines the instance name for the cdap component. | cdap |
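The interaction between `security.authorization.enabled` and `security.authorizer.class` can be sketched as follows. This is an assumption-laden illustration, not CDAP's code: configuration is modelled as a plain `Map`, the class name `AuthorizerSelection` is hypothetical, and `NoOpAuthorizer` is referred to by its simple name because its package is not given here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a plain Map stands in for CDAP's configuration object.
class AuthorizerSelection {
  static final String ENABLED_KEY = "security.authorization.enabled";
  static final String AUTHORIZER_CLASS_KEY = "security.authorizer.class";
  static final String DEFAULT_AUTHORIZER = "co.cask.cdap.security.authorization.DatasetBasedAuthorizer";
  // Simple name only; the fully qualified name is not stated in this document.
  static final String NOOP_AUTHORIZER = "NoOpAuthorizer";

  // Returns the authorizer class name that would be used for the given settings.
  static String authorizerClassName(Map<String, String> conf) {
    boolean enabled = Boolean.parseBoolean(conf.getOrDefault(ENABLED_KEY, "false"));
    if (!enabled) {
      // When authorization is off, the configured class is ignored.
      return NOOP_AUTHORIZER;
    }
    return conf.getOrDefault(AUTHORIZER_CLASS_KEY, DEFAULT_AUTHORIZER);
  }

  public static void main(String[] args) {
    Map<String, String> conf = new HashMap<>();
    System.out.println(authorizerClassName(conf)); // NoOpAuthorizer
    conf.put(ENABLED_KEY, "true");
    System.out.println(authorizerClassName(conf)); // the DatasetBasedAuthorizer default
  }
}
```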
Role Management
To support RBAC (Role-Based Access Control) systems such as Apache Sentry, we will need to support role management through CDAP.
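As a rough illustration of what role management involves, the sketch below keeps roles and role-to-principal assignments in memory. It is not how Sentry or CDAP store roles: roles and principals are plain strings, `IllegalStateException` and `NoSuchElementException` stand in for `RoleAlreadyExistsException` and `RoleNotFoundException`, and the class name `RoleStoreSketch` is hypothetical.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.NoSuchElementException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory role store; strings stand in for Role and Principal.
class RoleStoreSketch {
  private final Set<String> roles = new HashSet<>();
  private final Map<String, Set<String>> rolesByPrincipal = new HashMap<>();

  void createRole(String role) {
    if (!roles.add(role)) {
      throw new IllegalStateException("Role already exists: " + role);
    }
  }

  // Dropping a role also removes it from every principal it was assigned to.
  void dropRole(String role) {
    if (!roles.remove(role)) {
      throw new NoSuchElementException("Role not found: " + role);
    }
    for (Set<String> assigned : rolesByPrincipal.values()) {
      assigned.remove(role);
    }
  }

  void addRoleToPrincipal(String role, String principal) {
    requireRole(role);
    rolesByPrincipal.computeIfAbsent(principal, p -> new TreeSet<>()).add(role);
  }

  void removeRoleFromPrincipal(String role, String principal) {
    requireRole(role);
    Set<String> assigned = rolesByPrincipal.get(principal);
    if (assigned != null) {
      assigned.remove(role);
    }
  }

  Set<String> listRoles(String principal) {
    Set<String> assigned = rolesByPrincipal.getOrDefault(principal, Collections.emptySet());
    return Collections.unmodifiableSet(assigned);
  }

  private void requireRole(String role) {
    if (!roles.contains(role)) {
      throw new NoSuchElementException("Role not found: " + role);
    }
  }

  public static void main(String[] args) {
    RoleStoreSketch store = new RoleStoreSketch();
    store.createRole("admins");
    store.addRoleToPrincipal("admins", "group:ops");
    System.out.println(store.listRoles("group:ops")); // [admins]
  }
}
```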
...
Although supporting the Sentry Shell seems straightforward once the CDAP backend for Sentry is implemented, it's a relatively new feature added in Sentry 1.7 (SENTRY-749). CDH 5.5 ships Sentry 1.5, and there are no timelines on support for Sentry 1.7 (Cloudera Maven Repository).
After some digging, we found that SentryShell is hardcoded to work only with Hive. At the time of this writing, Kafka is adding SentryShell support by making a copy of Hive's SentryShell. This seems to be the norm in Sentry for shell support, since there is no generic shell that services integrating with Sentry can use. Unless we have a strong reason, we should avoid supporting CDAP through SentryShell, especially since we are already working on supporting ACL management for CDAP in Sentry through Hue. See below.
For recognizing and listing CDAP entities in Hue, we will have to implement a CDAP Webapp for Hue. Hue is implemented entirely in Python using the Django framework. This integration is a risk for 3.4. More details on this TBD.
Testing
For testing the Sentry integration, there are a couple of approaches. We can use the file-based policy store in Apache Sentry for tests. However, to simulate more realistic scenarios, we should explore whether it is easy to set up an in-memory database (HSQL, etc.) with the Sentry schema in tests.
...