...

  •  User stories documented (Rohit/Bhooshan)
  •  User stories reviewed (Nitin)
  •  Design documented (Rohit/Bhooshan)
  •  Design reviewed (Andreas)
  •  Feature merged (Rohit/Bhooshan)
  •  Examples and guides (Rohit)
  •  Integration tests (Bhooshan) 
  •  Documentation for feature (Rohit/Bhooshan)
  •  Blog post 

...

  • As a CDAP system, I should be able to integrate with Apache Sentry for fine-grained role-based access controls of select CDAP operations 
  • As a CDAP admin, I should be able to easily install and configure Apache Sentry to work with CDAP on different types of clusters (e.g., CDH, CM clusters).
  • As a CDAP admin, I should be able to create/update/delete roles in Apache Sentry
  • As a CDAP admin, I should be able to add users/groups to roles in Apache Sentry
  • As a CDAP admin, I should be able to turn authorization on/off easily for entire CDAP instance
  • As a CDAP system, I should be able to authorize the following requests
    • Namespace create/update/delete
    • Application deployment
    • Program start/stop
    • Stream read/write (Not Implemented in 3.4)
      These operations are a subset that represents the various 'kinds' of operations allowed in CDAP.

...

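Entity | Operation | Required Privileges | Resultant Privileges
Namespace | create | ADMIN (Instance) | ADMIN (Namespace)
Namespace | update | ADMIN (Namespace) |
Namespace | list | READ (Instance) |
Namespace | get | READ (Namespace) |
Namespace | delete | ADMIN (Namespace) |
Namespace | set preference | WRITE (Namespace) |
Namespace | get preference | READ (Namespace) |
Namespace | search | READ (Namespace) |
Artifact | add | WRITE (Namespace) | ADMIN (Artifact)
Artifact | delete | ADMIN (Artifact) |
Artifact | get | READ (Artifact) |
Artifact | list | READ (Namespace) |
Artifact | write property | ADMIN (Artifact) |
Artifact | delete property | ADMIN (Artifact) |
Artifact | get property | READ (Artifact) |
Artifact | refresh | WRITE (Instance) |
Artifact | write metadata | ADMIN (Artifact) |
Artifact | read metadata | READ (Artifact) |
Application | deploy | WRITE (Namespace) | ADMIN (Application)
Application | get | READ (Application) |
Application | list | READ (Namespace) |
Application | update | ADMIN (Application) |
Application | delete | ADMIN (Application) |
Application | set preference | WRITE (Application) |
Application | get preference | READ (Application) |
Application | add metadata | ADMIN (Application) |
Application | get metadata | READ (Application) |
Programs | start/stop/debug | EXECUTE (Program) |
Programs | set instances | ADMIN (Program) |
Programs | list | READ (Namespace) |
Programs | set runtime args | EXECUTE (Program) |
Programs | get runtime args | READ (Program) |
Programs | get instances | READ (Program) |
Programs | set preference | ADMIN (Program) |
Programs | get preference | READ (Program) |
Programs | get status | READ (Program) |
Programs | get history | READ (Program) |
Programs | add metadata | ADMIN (Program) |
Programs | get metadata | READ (Program) |
Programs | emit logs | WRITE (?) (Program) |
Programs | view logs | READ (Program) |
Programs | emit metrics | WRITE (?) (Program) |
Programs | view metrics | READ (Program) |
Streams | create | WRITE (Namespace) | ADMIN (Stream)
Streams | update properties | ADMIN (Stream) |
Streams | delete | ADMIN (Stream) |
Streams | truncate | ADMIN (Stream) |
Streams | enqueue, asyncEnqueue, batch | WRITE (Stream) |
Streams | get | READ (Stream) |
Streams | list | READ (Namespace) |
Streams | read events | READ (Stream) |
Streams | set preferences | ADMIN (Stream) |
Streams | get preferences | READ (Stream) |
Streams | add metadata | ADMIN (Stream) |
Streams | get metadata | READ (Stream) |
Streams | view lineage | READ (Stream) |
Streams | emit metrics | WRITE (?) (Stream) |
Streams | view metrics | READ (Stream) |
Datasets | list | READ (Namespace) |
Datasets | get | READ (Dataset) |
Datasets | create | WRITE (Namespace) | ADMIN (Dataset)
Datasets | update | ADMIN (Dataset) |
Datasets | drop | ADMIN (Dataset) |
Datasets | executeAdmin (exists/truncate/upgrade) | ADMIN (Dataset) |
Datasets | add metadata | ADMIN (Dataset) |
Datasets | get metadata | READ (Dataset) |
Datasets | view lineage | READ (Dataset) |
Datasets | emit metrics | WRITE (?) (Dataset) |
Datasets | view metrics | READ (Dataset) |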
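
As a rough illustration of how rows of the privileges table might be consulted in code, the sketch below keys a map on entity and operation. The names RequiredPrivilege, PrivilegeTable, and required are hypothetical and not part of CDAP; only a handful of rows copied from the table are included.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical holder for one "Required Privileges" cell: an action and the entity type it applies to. */
final class RequiredPrivilege {
  final String action;      // e.g. "ADMIN", "READ", "WRITE", "EXECUTE"
  final String entityType;  // e.g. "Instance", "Namespace", "Program"

  RequiredPrivilege(String action, String entityType) {
    this.action = action;
    this.entityType = entityType;
  }

  @Override
  public String toString() {
    return action + " (" + entityType + ")";
  }
}

/** Hypothetical lookup over a few rows of the privileges table (assumption: not a CDAP class). */
final class PrivilegeTable {
  private static final Map<String, RequiredPrivilege> ROWS = new HashMap<>();

  static {
    // Rows copied from the table; the remaining rows would be added the same way.
    put("Namespace", "create", "ADMIN", "Instance");
    put("Namespace", "delete", "ADMIN", "Namespace");
    put("Application", "deploy", "WRITE", "Namespace");
    put("Programs", "start/stop/debug", "EXECUTE", "Program");
    put("Streams", "read events", "READ", "Stream");
  }

  private static void put(String entity, String operation, String action, String on) {
    ROWS.put(entity + "|" + operation, new RequiredPrivilege(action, on));
  }

  /** Returns the required privilege, or null if the row is not part of this sketch. */
  static RequiredPrivilege required(String entity, String operation) {
    return ROWS.get(entity + "|" + operation);
  }
}
```

For example, PrivilegeTable.required("Application", "deploy") yields the WRITE privilege on the namespace, matching the Application/deploy row.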

...

Code Block (Java): AuthEnforcer Interface
interface AuthEnforcer {
  /**
   * Enforces authorization for the specified {@link Principal} for the specified {@link Action} on the specified {@link EntityId}.
   *
   * @param principal the principal that performs the actions. This could be a user, group or a role
   * @param entity the entity on which an action is being performed
   * @param action the action being performed
   * @throws AuthorizationException if the principal is not authorized to perform action on the entity
   */
  void enforce(Principal principal, EntityId entity, Action action) throws AuthorizationException;
}

...

Code Block (Java): Authorizer Interface
public interface Authorizer {
  /**
   * Initialize the {@link Authorizer}. Authorization extensions can use this method to access an
   * {@link AuthorizationContext} that allows them to interact with CDAP for operations such as creating and accessing
   * datasets, executing dataset operations in transactions, etc.
   *
   * @param context the {@link AuthorizationContext} that can be used to interact with CDAP
   */
  void initialize(AuthorizationContext context) throws Exception;

  /**
   * Enforces authorization for the specified {@link Principal} for the specified {@link Action} on the specified
   * {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which authorization is to be enforced
   * @param principal the {@link Principal} that performs the actions
   * @param action the {@link Action} being performed
   * @throws UnauthorizedException if the principal is not authorized to perform action on the entity
   * @throws Exception if any other errors occurred while performing the authorization enforcement check
   */
  void enforce(EntityId entity, Principal principal, Action action) throws Exception;

  /**
   * Grants a {@link Principal} authorization to perform a set of {@link Action actions} on an {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which {@link Action actions} are to be granted
   * @param principal the {@link Principal} that performs the actions. This could be a user, group or role
   * @param actions the set of {@link Action actions} to grant
   */
  void grant(EntityId entity, Principal principal, Set<Action> actions) throws Exception;

  /**
   * Revokes a {@link Principal principal's} authorization to perform a set of {@link Action actions} on
   * an {@link EntityId}.
   *
   * @param entity the {@link EntityId} whose {@link Action actions} are to be revoked
   * @param principal the {@link Principal} that performs the actions. This could be a user, group or role
   * @param actions the set of {@link Action actions} to revoke
   */
  void revoke(EntityId entity, Principal principal, Set<Action> actions) throws Exception;


  /**
   * Revokes all {@link Principal principals'} authorization to perform any {@link Action} on the given
   * {@link EntityId}.
   *
   * @param entity the {@link EntityId} on which all {@link Action actions} are to be revoked
   */
  void revoke(EntityId entity) throws Exception;

  /**
   * Returns all the {@link Privilege} for the specified {@link Principal}.
   *
   * @param principal the {@link Principal} for which to return privileges
   * @return a {@link Set} of {@link Privilege} for the specified principal
   */
  Set<Privilege> listPrivileges(Principal principal) throws Exception;

  /********************************* Role Management: APIs for Role Based Access Control ******************************/
  /**
   * Create a role.
   *
   * @param role the {@link Role} to create
   * @throws RoleAlreadyExistsException if the role to be created already exists
   */
  void createRole(Role role) throws Exception;

  /**
   * Drop a role.
   *
   * @param role the {@link Role} to drop
   * @throws RoleNotFoundException if the role to be dropped is not found
   */
  void dropRole(Role role) throws Exception;

  /**
   * Add a role to the specified {@link Principal}.
   *
   * @param role the {@link Role} to add to the specified principal
   * @param principal the {@link Principal} to add the role to
   * @throws RoleNotFoundException if the role to be added to the principal is not found
   */
  void addRoleToPrincipal(Role role, Principal principal) throws Exception;

  /**
   * Delete a role from the specified {@link Principal}.
   *
   * @param role the {@link Role} to remove from the specified principal
   * @param principal the {@link Principal} to remove the role from
   * @throws RoleNotFoundException if the role to be removed from the principal is not found
   */
  void removeRoleFromPrincipal(Role role, Principal principal) throws Exception;

  /**
   * Returns a set of all {@link Role roles} for the specified {@link Principal}.
   *
   * @param principal the {@link Principal} to look up roles for
   * @return Set of {@link Role} for the specified {@link Principal}
   */
  Set<Role> listRoles(Principal principal) throws Exception;

  /**
   * Returns all available {@link Role}. Only a super user can perform this operation.
   *
   * @return a set of all available {@link Role} in the system.
   */
  Set<Role> listAllRoles() throws Exception;

  /**
   * Destroys an {@link Authorizer}. Authorization extensions can use this method to write any cleanup code.
   */
  void destroy() throws Exception;
}
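
To make the grant/revoke/enforce contract above more concrete, here is a minimal in-memory sketch. It is not the CDAP implementation: SketchAction, SketchUnauthorizedException, and InMemoryAuthorizerSketch are hypothetical stand-ins, and entities and principals are reduced to plain strings (assumed not to contain the '|' character) so the example stays self-contained.

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

/** Stand-in for CDAP's Action; the constants mirror the privileges named in the table on this page. */
enum SketchAction { READ, WRITE, EXECUTE, ADMIN }

/** Stand-in for UnauthorizedException (assumption: a checked exception). */
class SketchUnauthorizedException extends Exception {
  SketchUnauthorizedException(String message) {
    super(message);
  }
}

/** Hypothetical in-memory privilege store keyed by "entity|principal". */
class InMemoryAuthorizerSketch {
  private final Map<String, Set<SketchAction>> privileges = new HashMap<>();

  private static String key(String entity, String principal) {
    return entity + "|" + principal;
  }

  /** Adds the given actions to whatever the principal already holds on the entity. */
  void grant(String entity, String principal, Set<SketchAction> actions) {
    privileges.computeIfAbsent(key(entity, principal), k -> EnumSet.noneOf(SketchAction.class))
        .addAll(actions);
  }

  /** Removes only the given actions for this principal on this entity. */
  void revoke(String entity, String principal, Set<SketchAction> actions) {
    Set<SketchAction> granted = privileges.get(key(entity, principal));
    if (granted != null) {
      granted.removeAll(actions);
    }
  }

  /** Removes every principal's privileges on the entity. */
  void revoke(String entity) {
    privileges.keySet().removeIf(k -> k.startsWith(entity + "|"));
  }

  /** Throws if the principal has not been granted the action on the entity. */
  void enforce(String entity, String principal, SketchAction action) throws SketchUnauthorizedException {
    Set<SketchAction> granted = privileges.get(key(entity, principal));
    if (granted == null || !granted.contains(action)) {
      throw new SketchUnauthorizedException(principal + " is not authorized to " + action + " " + entity);
    }
  }
}
```

A Sentry-backed implementation would presumably delegate each of these calls to the Sentry service rather than to a local map.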

Here, Principal is the entity performing actions, defined as follows:

Code Block (Java): Principal
public class Principal {
	enum PrincipalType {
		USER,
		GROUP,
		ROLE
	}
 
	private final String name;
	private final PrincipalType type;
 
	public Principal(String name, PrincipalType type) {
		this.name = name;
		this.type = type;
	}
 
	public String getName() {
		return name;
	}
 
	public PrincipalType getType() {
		return type;
	}
}

Integration with Apache Sentry will be achieved by implementations of these interfaces that delegate to Apache Sentry.

Integration with Apache Sentry

Integration with Apache Sentry involves the development of three main modules:

CDAP Sentry Binding

Here we will bind CDAP to SentryGenericServiceClient and to the operations on the client.

Code Block (Java): SentryAuthorizer
public class SentryAuthorizer implements Authorizer {

  @Override
  public void grant(EntityId entity, Principal principal, Set<Action> actions) throws Exception {
    // do grant operation on sentry client with needed mapping/conversion
  }
  ...
  ...
  // conf is the Configuration from which the Sentry client is created
  private SentryGenericServiceClient getClient() throws Exception {
    return SentryGenericServiceClientFactory.create(conf); // create sentry client from Configuration
  }
}

CDAP Sentry Model

The CDAP Sentry Model defines the CDAP entities to which access needs to be authorized via Apache Sentry. It will be based on the Sentry Generic Authorization Model. The CDAP Sentry Model will have the following components:

CDAPAuthorizable

This interface defines the CDAP entities that need to be authorized. It must extend Sentry's Authorizable interface.

Code Block (Java): CDAPAuthorizable
/**
 * This interface represents an authorizable resource in the CDAP component.
 */
public interface CDAPAuthorizable extends Authorizable {

  enum AuthorizableType {
    Instance,
    Namespace,
    Artifact,
    Application,
    Program,
    Dataset,
    Stream
  }

  AuthorizableType getAuthzType();
}

The CDAPAuthorizable interface will have to be implemented for each authorizable entity defined by the AuthorizableType enum above.
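
A per-entity implementation could look roughly like the following sketch for namespaces. To keep it self-contained, SketchAuthorizable stands in for Sentry's Authorizable, and its two methods (getName and getTypeName) are an assumption about that interface's shape; SketchCdapAuthorizable and NamespaceAuthorizable are hypothetical names.

```java
/** Stand-in for Sentry's Authorizable; the two methods are an assumption about its shape. */
interface SketchAuthorizable {
  String getName();

  String getTypeName();
}

/** Mirrors the CDAPAuthorizable interface shown above, on top of the stand-in. */
interface SketchCdapAuthorizable extends SketchAuthorizable {
  enum AuthorizableType { Instance, Namespace, Artifact, Application, Program, Dataset, Stream }

  AuthorizableType getAuthzType();
}

/** Hypothetical authorizable for a single namespace. */
final class NamespaceAuthorizable implements SketchCdapAuthorizable {
  private final String name;

  NamespaceAuthorizable(String name) {
    this.name = name;
  }

  @Override
  public String getName() {
    return name;
  }

  @Override
  public AuthorizableType getAuthzType() {
    return AuthorizableType.Namespace;
  }

  @Override
  public String getTypeName() {
    // Derive the type name from the enum constant so the two cannot drift apart.
    return getAuthzType().name();
  }
}
```

The other entity types would follow the same pattern, each returning its own AuthorizableType constant.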

...

Entity | Sentry Resource URI
Instance | cdap:///instance=server1
Namespace | cdap:///instance=server1/namespace=ns1
Artifact | cdap:///instance=server1/namespace=ns1/artifact=art1/artifactVersion=1
Application | cdap:///instance=server1/namespace=ns1/application=app1
Program | cdap:///instance=server1/namespace=ns1/application=app1/programType=pt1/programName=prg1
Dataset | cdap:///instance=server1/namespace=ns1/dataset=ds1
Stream | cdap:///instance=server1/namespace=ns1/stream=s1
Note

The above URIs are internal Apache Sentry representations defined at SentryAuthorizationModelDesign. They are only mentioned here to convey how the CDAP entity hierarchy will be represented in Apache Sentry.
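
The hierarchy encoded in these URIs can be sketched as an ordered list of key=value segments appended to the cdap:/// prefix. SentryUriSketch and its methods are hypothetical helpers written only to illustrate the layout of the URIs listed above; they are not Sentry or CDAP APIs.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper that renders an entity hierarchy in the URI layout shown above. */
final class SentryUriSketch {

  /** Joins the segments in insertion order, e.g. instance first, then namespace, then the leaf entity. */
  static String toUri(Map<String, String> hierarchy) {
    StringJoiner joiner = new StringJoiner("/", "cdap:///", "");
    for (Map.Entry<String, String> segment : hierarchy.entrySet()) {
      joiner.add(segment.getKey() + "=" + segment.getValue());
    }
    return joiner.toString();
  }

  /** Convenience method for the Dataset row. */
  static String datasetUri(String instance, String namespace, String dataset) {
    Map<String, String> parts = new LinkedHashMap<>();  // LinkedHashMap preserves the hierarchy order
    parts.put("instance", instance);
    parts.put("namespace", namespace);
    parts.put("dataset", dataset);
    return toUri(parts);
  }
}
```

Calling SentryUriSketch.datasetUri("server1", "ns1", "ds1") reproduces the Dataset URI given above.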

Interaction Diagram

Use-case: App Deployment by an unauthorized user

[Image: interaction diagram for app deployment by an unauthorized user]

Configuration

Sentry

Property | Description | Value
sentry.service.allow.connect | List of users allowed to connect to the Sentry Server | cdap will be added to this list
sentry.cdap.provider | Authorization provider for the CDAP component in Sentry. This class defines the user-group mapping, among other things. | org.apache.sentry.provider.common.HadoopGroupResourceAuthorizationProvider
sentry.cdap.provider.resource | The resource for creating the Sentry Provider Backend. This property seems unused and always defaults to "". However, all data engines (hive, sqoop, kafka) define it. | ""
sentry.cdap.provider.backend | A class that implements ProviderBackend. This class uses a SentryServiceClient to communicate with the Sentry service from the client side in Sentry. | org.apache.sentry.provider.db.generic.SentryGenericProviderBackend
sentry.cdap.policy.engine | Defines the Sentry Policy Engine for the cdap component. Must implement org.apache.sentry.policy.common.PolicyEngine | co.cask.cdap.security.authorization.sentry.policy.PolicyEngine (package name subject to change)
sentry.cdap.instance.name | Defines the instance name for the cdap component. | cdap

CDAP

These properties will be defined in cdap-security.xml

Property | Description | Default
security.authorization.enabled | Determines whether authorization should be enabled in CDAP. If false, a NoOpAuthorizer would be used for security.authorizer.class | false
security.authorizer.class | Fully qualified class name of the authorizer class. Must implement the Authorizer interface | co.cask.cdap.security.authorization.DatasetBasedAuthorizer
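
The interplay of the two CDAP properties can be sketched as follows. This is only an illustration of the rule stated in the table (a no-op authorizer when authorization is disabled, otherwise the configured class with the documented default); AuthorizerSelectionSketch and resolveAuthorizerClass are hypothetical, and java.util.Properties stands in for CDAP's own configuration class.

```java
import java.util.Properties;

/** Hypothetical helper showing how the two cdap-security.xml properties could be combined. */
final class AuthorizerSelectionSketch {
  static final String ENABLED = "security.authorization.enabled";
  static final String AUTHORIZER_CLASS = "security.authorizer.class";
  // Default taken from the table above.
  static final String DEFAULT_AUTHORIZER = "co.cask.cdap.security.authorization.DatasetBasedAuthorizer";
  // Placeholder: the table names a NoOpAuthorizer but does not give its package.
  static final String NO_OP_AUTHORIZER = "NoOpAuthorizer";

  /** Returns the name of the authorizer class that would be used for the given configuration. */
  static String resolveAuthorizerClass(Properties conf) {
    boolean enabled = Boolean.parseBoolean(conf.getProperty(ENABLED, "false"));
    if (!enabled) {
      return NO_OP_AUTHORIZER;
    }
    return conf.getProperty(AUTHORIZER_CLASS, DEFAULT_AUTHORIZER);
  }
}
```

With an empty configuration the sketch resolves to the no-op authorizer, since security.authorization.enabled defaults to false.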

Role Management

To support Role Based Access Control (RBAC) systems such as Apache Sentry, we will need to support role management through CDAP.

A user using RBAC should be able to:

  • Create a role
  • Delete a role
  • Add a role to a principal (where the principal can be of type user or group)
  • Remove a role from a principal (where the principal can be of type user or group)
  • List roles
  • List roles for a principal
  • List privileges for a role
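
The role operations listed above can be sketched with a small in-memory store. RoleStoreSketch is hypothetical: roles and principals are plain strings, standard runtime exceptions stand in for RoleAlreadyExistsException and RoleNotFoundException, and listing privileges for a role is left out because it depends on how privileges are stored.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical in-memory role store illustrating the role management operations. */
final class RoleStoreSketch {
  private final Set<String> roles = new HashSet<>();
  private final Map<String, Set<String>> rolesByPrincipal = new HashMap<>();

  void createRole(String role) {
    if (!roles.add(role)) {
      throw new IllegalStateException("Role already exists: " + role);
    }
  }

  void dropRole(String role) {
    if (!roles.remove(role)) {
      throw new IllegalArgumentException("Role not found: " + role);
    }
    // Dropping a role also detaches it from every principal that held it.
    for (Set<String> assigned : rolesByPrincipal.values()) {
      assigned.remove(role);
    }
  }

  void addRoleToPrincipal(String role, String principal) {
    requireRole(role);
    rolesByPrincipal.computeIfAbsent(principal, p -> new HashSet<>()).add(role);
  }

  void removeRoleFromPrincipal(String role, String principal) {
    requireRole(role);
    Set<String> assigned = rolesByPrincipal.get(principal);
    if (assigned != null) {
      assigned.remove(role);
    }
  }

  /** Returns a sorted copy of the roles assigned to the principal. */
  Set<String> listRoles(String principal) {
    return new TreeSet<>(rolesByPrincipal.getOrDefault(principal, Collections.emptySet()));
  }

  /** Returns a sorted copy of all known roles. */
  Set<String> listAllRoles() {
    return new TreeSet<>(roles);
  }

  private void requireRole(String role) {
    if (!roles.contains(role)) {
      throw new IllegalArgumentException("Role not found: " + role);
    }
  }
}
```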

We will need to support these operations through REST APIs and also through the CLI. Below are the proposed APIs and CLI commands:

Authorization API

Security CLI commands

ACL management

There are multiple options for ACL management. For the dataset-based authorizer, we will have to support ACL management via the CDAP CLI.

...

Although supporting the Sentry Shell seems straightforward once the CDAP backend for Sentry is implemented, it's a relatively new feature added in Sentry 1.7 (SENTRY-749). CDH 5.5 ships Sentry 1.5 and there are no timelines on support for Sentry 1.7 (Cloudera Maven Repository).

After some digging, we found that SentryShell is hardcoded to work with Hive and works only with Hive. At the time of writing, Kafka is adding support for SentryShell by making a copy of Hive's SentryShell. This seems to be the norm for shell support in Sentry, since there is no generic shell that services integrating with Sentry can use. Unless we have a strong reason, we should avoid supporting CDAP through SentryShell, especially since we are already working on supporting ACL management for CDAP in Sentry through Hue. See below.

For recognizing and listing CDAP entities in Hue, we will have to implement a CDAP Webapp for Hue. Hue is implemented entirely in Python using the Django framework. This integration is a risk for 3.4. More details on this TBD.

Hue Integration

Testing

For testing the Sentry integration, there are a couple of approaches. We can use the file-based policy store in Apache Sentry for tests. However, to simulate more realistic scenarios, we should explore whether it is easy to set up an in-memory database (HSQL, etc.) with the Sentry schema in tests.

...