
Goals

  1. Authorize a subset of operations on CDAP entities using Apache Sentry

  2. Make the authorization system pluggable. Support the following two systems to begin with:

    1. Sentry based

    2. CDAP Dataset based

Checklist

  • User stories documented (Rohit/Bhooshan)
  • User stories reviewed (Nitin)
  • Design documented (Rohit/Bhooshan)
  • Design reviewed (Andreas)
  • Feature merged (Rohit/Bhooshan)
  • Examples and guides (Rohit)
  • Integration tests (Bhooshan) 
  • Documentation for feature (Rohit/Bhooshan)
  • Blog post 

User Stories

  • As a CDAP system, I should be able to integrate with Apache Sentry for fine-grained role-based access controls of select CDAP operations 
  • As a CDAP admin, I should be able to create/update/delete roles in Apache Sentry
  • As a CDAP admin, I should be able to add users/groups to roles in Apache Sentry
  • As a CDAP admin, I should be able to turn authorization on/off easily for entire CDAP instance
  • As a CDAP system, I should be able to authorize the following requests
    • Namespace create/update/delete
    • Application deployment
    • Program start/stop
    • Stream read/write
      These operations are a subset that represents the various 'kinds' of operations allowed in CDAP

Scenarios

Scenario #1

  • D-Rock is an IT admin extraordinaire who has just been tasked with adding authorization for access to entities in CDAP on the cluster he manages.
  • D-Rock is already familiar with Apache Sentry, since he has used it for authorization in other projects like Apache HDFS, Apache Hive, Apache Sqoop, etc. 
  • He would rather not learn a new authorization system. He would instead prefer that Apache Sentry be used to provide Role Based Access Control to CDAP entities as well.

Scenario #2

  • D-Rock manages a variety of CDAP clusters in dev/smoke/qa/staging environments along with the prod environment.
  • For these environments, he would like to be able to turn authorization on/off easily with a switch for the CDAP instance, depending on the need at a given time.

Scenario #3

  • Ideally, D-Rock would like to be able to authorize all operations on all entities in CDAP. 
  • However, this can be rolled out in phases. In the initial phase, he would like to control who can:
    • Create/update/delete a namespace
      • Only users with WRITE permission on CDAP instance should be able to perform this operation.
      • A property in cdap-site.xml should define the set of users who have ADMIN permission on the CDAP instance. These admins can later grant permissions to other users.
    • Deploy an application in a namespace
      • Only users with WRITE permission on the namespace should be able to perform this operation
      • Once the application is deployed, the user who deployed it becomes the ADMIN of the application.
    • Start/stop a program
      • Only users with READ permission on the namespace and application, and EXECUTE permission on the program should be able to perform this operation
      • Only users with ADMIN permission on the program can set preference for the program
      • Only users with WRITE permission can provide runtime args
    • Read/write to a stream
      • Only users with READ privilege on the namespace and READ permission on the stream should be able to read from the stream
      • Only users with READ privilege on the namespace and WRITE permission on the stream should be able to write to the stream
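
The stream and program rules above combine privileges at several levels of the entity hierarchy. A minimal sketch of what such a combined check could look like, assuming a hypothetical in-memory privilege store; the class, the method names, and the string-based entity keys are illustrative only and are not part of CDAP:

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: none of these names come from CDAP.
class HierarchyCheckSketch {
  // Each granted privilege is stored as "user|entity|ACTION".
  private final Set<String> grants = new HashSet<>();

  void grant(String user, String entity, String action) {
    grants.add(user + "|" + entity + "|" + action);
  }

  boolean has(String user, String entity, String action) {
    return grants.contains(user + "|" + entity + "|" + action);
  }

  // Reading a stream needs READ on the namespace and READ on the stream.
  boolean canReadStream(String user, String namespace, String stream) {
    return has(user, "namespace:" + namespace, "READ")
        && has(user, "stream:" + namespace + "." + stream, "READ");
  }

  // Starting a program needs READ on the namespace and application,
  // and EXECUTE on the program itself.
  boolean canStartProgram(String user, String namespace, String app, String program) {
    return has(user, "namespace:" + namespace, "READ")
        && has(user, "application:" + namespace + "." + app, "READ")
        && has(user, "program:" + namespace + "." + app + "." + program, "EXECUTE");
  }
}
```

A privilege on the parent alone is not enough in this sketch: a user with READ on the namespace but nothing on the stream is still denied.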

Entities, Operations and Required Privileges

| Entity | Operation | Required Privileges | Resultant Privileges |
| Namespace | create | ADMIN (Instance) | ADMIN (Namespace) |
| | update | ADMIN (Namespace) | |
| | list | READ (Instance) | |
| | get | READ (Namespace) | |
| | delete | ADMIN (Namespace) | |
| | set preference | WRITE (Namespace) | |
| | get preference | READ (Namespace) | |
| | search | READ (Namespace) | |
| Artifact | add | WRITE (Namespace) | ADMIN (Artifact) |
| | delete | ADMIN (Artifact) | |
| | get | READ (Artifact) | |
| | list | READ (Namespace) | |
| | write property | ADMIN (Artifact) | |
| | delete property | ADMIN (Artifact) | |
| | get property | READ (Artifact) | |
| | refresh | WRITE (Instance) | |
| | write metadata | ADMIN (Artifact) | |
| | read metadata | READ (Artifact) | |
| Application | deploy | WRITE (Namespace) | ADMIN (Application) |
| | get | READ (Application) | |
| | list | READ (Namespace) | |
| | update | ADMIN (Application) | |
| | delete | ADMIN (Application) | |
| | set preference | WRITE (Application) | |
| | get preference | READ (Application) | |
| | add metadata | ADMIN (Application) | |
| | get metadata | READ (Application) | |
| Programs | start/stop/debug | EXECUTE (Program) | |
| | set instances | ADMIN (Program) | |
| | list | READ (Namespace) | |
| | set runtime args | EXECUTE (Program) | |
| | get runtime args | READ (Program) | |
| | get instances | READ (Program) | |
| | set preference | ADMIN (Program) | |
| | get preference | READ (Program) | |
| | get status | READ (Program) | |
| | get history | READ (Program) | |
| | add metadata | ADMIN (Program) | |
| | get metadata | READ (Program) | |
| | emit logs | WRITE (?) (Program) | |
| | view logs | READ (Program) | |
| | emit metrics | WRITE (?) (Program) | |
| | view metrics | READ (Program) | |
| Streams | create | WRITE (Namespace) | ADMIN (Stream) |
| | update properties | ADMIN (Stream) | |
| | delete | ADMIN (Stream) | |
| | truncate | ADMIN (Stream) | |
| | enqueue, asyncEnqueue, batch | WRITE (Stream) | |
| | get | READ (Stream) | |
| | list | READ (Namespace) | |
| | read events | READ (Stream) | |
| | set preferences | ADMIN (Stream) | |
| | get preferences | READ (Stream) | |
| | add metadata | ADMIN (Stream) | |
| | get metadata | READ (Stream) | |
| | view lineage | READ (Stream) | |
| | emit metrics | WRITE (?) (Stream) | |
| | view metrics | READ (Stream) | |
| Datasets | list | READ (Namespace) | |
| | get | READ (Dataset) | |
| | create | WRITE (Namespace) | ADMIN (Dataset) |
| | update | ADMIN (Dataset) | |
| | drop | ADMIN (Dataset) | |
| | executeAdmin (exists/truncate/upgrade) | ADMIN (Dataset) | |
| | add metadata | ADMIN (Dataset) | |
| | get metadata | READ (Dataset) | |
| | view lineage | READ (Dataset) | |
| | emit metrics | WRITE (?) (Dataset) | |
| | view metrics | READ (Dataset) | |
| Stream View | create | WRITE (Namespace) & ADMIN (Stream) | ADMIN (Stream View) |
| | delete | ADMIN (Stream View) | |
| | list | READ (Namespace) & READ (Stream) | |
| | get | READ (Stream View) | |
| | add metadata | ADMIN (Stream View) | |
| | get metadata | READ (Stream View) | |

NOTE: Cells marked green are in scope for 3.4
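
One way to keep the rules in the table above in a single place is to encode each row as data and look it up at check time. The sketch below covers only a handful of Namespace and Stream rows; the class name, the "Entity/operation" key format, and the Privilege type are assumptions made for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: a partial encoding of the privilege table as a lookup map.
class RequiredPrivilegeTable {
  enum Action { READ, WRITE, EXECUTE, ADMIN }

  // A required privilege: an action on a given entity type, e.g. ADMIN (Instance).
  static final class Privilege {
    final Action action;
    final String entityType;

    Privilege(Action action, String entityType) {
      this.action = action;
      this.entityType = entityType;
    }

    @Override
    public String toString() {
      return action + " (" + entityType + ")";
    }
  }

  private static final Map<String, Privilege> REQUIRED = new HashMap<>();
  static {
    REQUIRED.put("Namespace/create", new Privilege(Action.ADMIN, "Instance"));
    REQUIRED.put("Namespace/list", new Privilege(Action.READ, "Instance"));
    REQUIRED.put("Namespace/delete", new Privilege(Action.ADMIN, "Namespace"));
    REQUIRED.put("Stream/create", new Privilege(Action.WRITE, "Namespace"));
    REQUIRED.put("Stream/read events", new Privilege(Action.READ, "Stream"));
  }

  // Returns the required privilege, or throws if the operation has no rule.
  static Privilege required(String entity, String operation) {
    Privilege p = REQUIRED.get(entity + "/" + operation);
    if (p == null) {
      throw new IllegalArgumentException("No rule for " + entity + "/" + operation);
    }
    return p;
  }
}
```

Failing on an unknown operation, rather than silently allowing it, keeps a missing row from turning into an unchecked operation.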

Design

This feature can be broken down into the following main parts, in no specific order:

Authorization Hooks in CDAP

This part covers the authorization system inside CDAP, into which external systems like Apache Sentry or Apache Ranger can be plugged. It provides authorization hooks at various operations within CDAP; a hook throws AuthorizationException if the operation is not authorized.

This system exposes a set of interfaces defined below. 

AuthChecker

The AuthChecker interface provides a way to check if an operation is authorized. At various points in the CDAP code (NamespaceHttpHandler, AppLifecycleHttpHandler, ProgramLifecycleHttpHandler, StreamHandler in 3.4), this interface will be used to check if an operation is authorized.

AuthChecker Interface
interface AuthChecker {
    /**
     * Checks if a principal is allowed to perform an action on an entity.
     *
     * @param principal the principal that performs the action. This could be a user, group or a role
     * @param entity the entity on which the action is being performed
     * @param action the action being performed
     * @throws AuthorizationException if the principal is not authorized to perform the action on the entity
     */
    void checkAuthorized(Principal principal, EntityId entity, Action action) throws AuthorizationException;
}
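
A sketch of how a handler might call this hook before performing an operation. The types below are simplified stand-ins (plain strings instead of Principal, EntityId, and Action), and NamespaceService is a hypothetical class, not the actual NamespaceHttpHandler:

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: simplified stand-ins for the types described above.
class AuthHookSketch {
  static class AuthorizationException extends Exception {
    AuthorizationException(String message) {
      super(message);
    }
  }

  // Simplified stand-in for the AuthChecker interface.
  interface AuthChecker {
    void checkAuthorized(String principal, String entity, String action) throws AuthorizationException;
  }

  // Hypothetical service that guards namespace deletion with the hook.
  static class NamespaceService {
    private final AuthChecker authChecker;
    private final List<String> namespaces = new ArrayList<>();

    NamespaceService(AuthChecker authChecker, List<String> initial) {
      this.authChecker = authChecker;
      this.namespaces.addAll(initial);
    }

    void delete(String user, String namespace) throws AuthorizationException {
      // The check runs first; the operation proceeds only if it does not throw.
      authChecker.checkAuthorized(user, "namespace:" + namespace, "ADMIN");
      namespaces.remove(namespace);
    }

    boolean exists(String namespace) {
      return namespaces.contains(namespace);
    }
  }
}
```

Because the check throws instead of returning a boolean, a caller cannot forget to inspect the result before continuing with the operation.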

Authorizer

This interface allows CDAP admins to grant or revoke permissions for specific operations on specific CDAP entities to specified principals. It will be used by the ACL Management module, which may or may not reside in CDAP for the purposes of integration with Apache Sentry (TBD).

Authorizer Interface
import java.util.Set;

interface Authorizer extends AuthChecker {
    /**
     * Grants a principal authorization to perform a set of actions on an entity.
     *
     * @param entity the entity on which an action is being performed
     * @param principal the principal that performs the actions. This could be a user, group or a role
     * @param actions the set of actions to grant
     */
    void grant(EntityId entity, Principal principal, Set<Action> actions);

    /**
     * Grants a principal authorization to perform all actions on an entity.
     *
     * @param entity the entity on which an action is being performed
     * @param principal the principal that performs the actions. This could be a user, group or a role
     */
    void grant(EntityId entity, Principal principal);

    /**
     * Revokes a principal's authorization to perform a set of actions on an entity.
     *
     * @param entity the entity on which an action is being performed
     * @param principal the principal that performs the actions. This could be a user, group or a role
     * @param actions the set of actions to revoke permissions on
     */
    void revoke(EntityId entity, Principal principal, Set<Action> actions);

    /**
     * Revokes a principal's authorization to perform any action on an entity.
     *
     * @param entity the entity on which an action is being performed
     * @param principal the principal that performs the actions. This could be a user, group or a role
     */
    void revoke(EntityId entity, Principal principal);

    /**
     * Revokes all principals' authorization to perform any action on an entity.
     *
     * @param entity the entity on which an action is being performed
     */
    void revoke(EntityId entity);
}

Here, Principal is the user, group or role performing the actions, defined as follows:

Principal
public class Principal {
    // The kind of identity a principal represents
    public enum PrincipalType {
        USER,
        GROUP,
        ROLE
    }

    private final String name;
    private final PrincipalType type;

    public Principal(String name, PrincipalType type) {
        this.name = name;
        this.type = type;
    }

    public String getName() {
        return name;
    }

    public PrincipalType getType() {
        return type;
    }
}
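
To illustrate the grant/revoke semantics described for the Authorizer interface, here is a minimal in-memory sketch. It uses plain strings in place of EntityId and Principal and an unchecked SecurityException in place of AuthorizationException, so it is a simplified stand-in and not the planned Sentry based or Dataset based implementation:

```java
import java.util.EnumSet;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Illustrative only: an in-memory stand-in for an Authorizer implementation.
class InMemoryAuthorizerSketch {
  enum Action { READ, WRITE, EXECUTE, ADMIN }

  // entity -> principal -> granted actions
  private final Map<String, Map<String, Set<Action>>> acls = new HashMap<>();

  void grant(String entity, String principal, Set<Action> actions) {
    acls.computeIfAbsent(entity, e -> new HashMap<>())
        .computeIfAbsent(principal, p -> EnumSet.noneOf(Action.class))
        .addAll(actions);
  }

  // Grants all actions on the entity.
  void grant(String entity, String principal) {
    grant(entity, principal, EnumSet.allOf(Action.class));
  }

  // Revokes only the given actions; other grants stay in place.
  void revoke(String entity, String principal, Set<Action> actions) {
    Map<String, Set<Action>> byPrincipal = acls.get(entity);
    if (byPrincipal != null && byPrincipal.containsKey(principal)) {
      byPrincipal.get(principal).removeAll(actions);
    }
  }

  // Revokes every action the principal holds on the entity.
  void revoke(String entity, String principal) {
    Map<String, Set<Action>> byPrincipal = acls.get(entity);
    if (byPrincipal != null) {
      byPrincipal.remove(principal);
    }
  }

  // Revokes all principals' privileges on the entity.
  void revoke(String entity) {
    acls.remove(entity);
  }

  void checkAuthorized(String principal, String entity, Action action) {
    Map<String, Set<Action>> byPrincipal = acls.get(entity);
    Set<Action> granted = byPrincipal == null ? null : byPrincipal.get(principal);
    if (granted == null || !granted.contains(action)) {
      throw new SecurityException(principal + " is not authorized to " + action + " " + entity);
    }
  }
}
```

The entity-only revoke is the natural call to make when an entity is deleted, so that stale grants do not outlive it.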

Integration with Apache Sentry will be achieved by implementations of these interfaces that delegate to Apache Sentry.
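
Because callers depend only on these interfaces, turning authorization off for an instance (Scenario #2) could amount to binding a checker that allows everything. The sketch below assumes a hypothetical boolean property named example.authorization.enabled; the property name, the factory method, and the string-based Checker interface are all illustrative and not taken from CDAP:

```java
import java.util.Properties;

// Illustrative only: selecting a checker implementation from configuration.
class AuthSwitchSketch {
  // Simplified stand-in for AuthChecker; throws SecurityException when denied.
  interface Checker {
    void check(String principal, String entity, String action);
  }

  // Used when authorization is turned off: every request is allowed.
  static final Checker ALLOW_ALL = (principal, entity, action) -> { };

  // Placeholder for a real backend (Sentry based or Dataset based); here it denies everything.
  static final Checker DENY_ALL = (principal, entity, action) -> {
    throw new SecurityException(principal + " is not authorized to " + action + " " + entity);
  };

  // Picks the checker based on a hypothetical configuration property (default: disabled).
  static Checker create(Properties conf) {
    boolean enabled = Boolean.parseBoolean(conf.getProperty("example.authorization.enabled", "false"));
    return enabled ? DENY_ALL : ALLOW_ALL;
  }
}
```

Keeping the switch in the factory means the handlers always call the hook and never need to know whether authorization is enabled.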

 

 

Integration with Apache Sentry

Integration with Apache Sentry involves the development of three main modules:

CDAP Sentry Binding

Here we will bind CDAP to SentryGenericServiceClient and to the operations on the client.

SentryAuthorizer
public class SentryAuthorizer implements Authorizer {

    // Configuration used to create the Sentry client
    private final Configuration conf;

    public SentryAuthorizer(Configuration conf) {
        this.conf = conf;
    }

    public void grant(EntityId entity, Principal principal, Set<Action> actions) {
        // do grant operation on sentry client with needed mapping/conversion
    }

    public void grant(EntityId entity, Principal principal) {
        // do grant of all actions on sentry client with needed mapping/conversion
    }

    public void revoke(EntityId entity, Principal principal, Set<Action> actions) {
        // do revoke operation on sentry client with needed mapping/conversion
    }

    public void revoke(EntityId entity, Principal principal) {
        // do revoke operation on sentry client with needed mapping/conversion
    }

    public void revoke(EntityId entity) {
        // do revoke operation on sentry client with needed mapping/conversion
    }

    public void checkAuthorized(Principal principal, EntityId entity, Action action) throws AuthorizationException {
        // do authorization check operation on sentry client with needed mapping/conversion
    }

    private SentryGenericServiceClient getClient() throws Exception {
        return SentryGenericServiceClientFactory.create(conf); // create sentry client from Configuration
    }
}

 

CDAP Sentry Model

The CDAP Sentry Model defines the CDAP entities to which access needs to be authorized via Apache Sentry. It will be based on the Sentry Generic Authorization Model. The CDAP Sentry Model will have the following components:

CDAPAuthorizable

This interface defines the CDAP entities that need to be authorized. It must extend Sentry's Authorizable interface.

CDAPAuthorizable
/**
 * This interface represents an authorizable resource in the CDAP component.
 */
public interface CDAPAuthorizable extends Authorizable {

    public enum AuthorizableType {
        Instance,
        Namespace,
        Artifact,
        Application,
        Program,
        Dataset,
        Stream,
        Stream_View
    }

    AuthorizableType getAuthzType();
}

The CDAPAuthorizable interface will have to be implemented for each authorizable entity defined by the AuthorizableType enum above.

CDAPAction and CDAPActionFactory

These classes will build on Sentry's BitFieldAction and BitFieldActionFactory to define the types of actions on CDAP entities. They also make it possible to define "implies" relationships between actions, for example that ADMIN implies READ.

TODO: Think about ALL, ADMIN_ALL

CDAPActions
public class CDAPActionConstants {
  public static final String READ = "read";
  public static final String EXECUTE = "execute";
  public static final String WRITE = "write";
  public static final String ADMIN = "admin"; // this is read + write + execute + admin (create/update/delete)
}
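
An "implies" relationship is commonly modeled with bit masks: one action implies another when its bits are a superset of the other's. The sketch below is plain Java and independent of Sentry's BitFieldAction; the class name and the bit codes are chosen only for illustration, and the open TODO about ALL and ADMIN_ALL is not addressed:

```java
// Illustrative only: bit-mask encoding of actions with an "implies" check.
class ActionBitsSketch {
  static final int READ = 0x1;
  static final int WRITE = 0x2;
  static final int EXECUTE = 0x4;
  // ADMIN carries its own bit plus READ, WRITE and EXECUTE,
  // mirroring the comment on CDAPActionConstants.ADMIN.
  static final int ADMIN = 0x8 | READ | WRITE | EXECUTE;

  // True if holding 'granted' is sufficient for a request that needs 'required'.
  static boolean implies(int granted, int required) {
    return (granted & required) == required;
  }
}
```

With this encoding, implies(ADMIN, READ) holds while implies(READ, WRITE) and implies(WRITE, ADMIN) do not.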

Sentry Policy Engine

Resource URIs

Using the above authorizable model, resource URIs for CDAP entities in the Sentry Policy Engine will be as follows:

 

Note: Will have to whitelist the cdap user for the Sentry Service.

ACL management

  • TBD: either using the CDAP CLI or via external systems like the Sentry CLI or Hue?

Questions

  1. How does CDAP get sentry-site.xml? Path provided via cConf?
  2. Distinguishing Read/Write access is perhaps out of scope of 3.4, since we will need changes to Dataset Framework
  3. Can access to all entities be authorized in one go? If so, how? 
  4. How does hierarchy work? e.g. write to stream requires READ perms on namespace + write perms on stream
  5. In a secure/kerberos environment, what does it take to communicate with the Sentry Server?
  6. Given that Sentry has a slightly data-engine-based schema, will we need some updates to the policy store to contain CDAP-specific tables for storing CDAP privileges, e.g. SENTRY_CDAP_PRIVILEGE and SENTRY_CDAP_PRIVILEGE_MAP tables?
  7. What about instance-level authorization? Would users need to be authorized to a given CDAP instance as well, along with the namespace and entity?
  8. Do we need the EXECUTE operation just for the Program entity? Can we say that any user who has READ can run the program?

Discussion Bhooshan & Rohit 02/17

 

CDAP Specific

  1. Provide Authorization Hooks in CDAP
    1. Intercept all HTTP calls
    2. Thrift calls
    3. Access to data from programs
  2. Authorization Checks

Check
for a given user/group and type of access
	if allowed:
		perform operation
	else:
		throw AuthException

  3. We need an Authorization interface

External Auth Service: Sentry

  1. Modules to implement
    1. Binding
    2. Model
    3. Policy
    4. E2E Tests
  2. Figuring out how to interact with Sentry
    • SentryGenericServiceClient
    • How to know where Sentry is running?

ACL Management

  1. Should CDAP do ACL Management
    1. CLI
    2. HTTP Handlers
    3. If we assume ACLs are set in Sentry through Sentry, what if we switch to a Dataset based store?

Discussion with Gokul 02/08

  • Push down ACLs  - No HBase support in Sentry
  • Custom datasets - how do you recognize read/writes
  • How do you distinguish between read/write
  • Sentry Integration - needs follow-ups
  • Performance (num RPC calls)
  • Sentry Persistent Storage - PolicyStoreProvider
  • Interactions with Auth system
  • Sentry web-app for UI may need customizations in Hue
  • How does switching between authorization enabled/disabled work

Out-of-scope User Stories (3.5 and beyond)

  1. As a CDAP admin, I should be able to authorize reads/writes to datasets
  2. As a CDAP admin, I should be able to authorize metadata changes to CDAP entities
  3. As a CDAP system, I should be able to push down ACLs to storage providers
  4. As a CDAP admin, I should be able to authorize reads/writes to custom datasets
  5. As a CDAP system, I should be able to judge, document and improve the performance impact of authorization
  6. As a CDAP authorization system, I should be able to interact with an external authentication system
  7. As a CDAP admin, I should be able to use external UIs like Hue for ACL Management
  8. As a CDAP admin, I should be able to see an audit log of all authorization-related changes in CDAP
