
Use case: App deployment by an unauthorized user

Configuration

Sentry

Note: Will have to whitelist the cdap user for the Sentry Service.

CDAP

ACL management

Installation

...

TBD: either using the CDAP CLI or via external systems like the Sentry CLI or Hue?

...

| Property | Description | Value |
| --- | --- | --- |
| sentry.service.allow.connect | List of users allowed to connect to the Sentry Server | cdap will be added to this list |
| sentry.cdap.provider | Authorization provider for the CDAP component in Sentry. This class defines the user-group mapping, among other things. | org.apache.sentry.provider.common.HadoopGroupResourceAuthorizationProvider |
| sentry.cdap.provider.resource | The resource for creating the Sentry Provider Backend. This property seems unused and always defaults to "". However, all data engines (hive, sqoop, kafka) define it. | "" |
| sentry.cdap.provider.backend | A class that implements ProviderBackend. This class uses a SentryServiceClient to communicate with the Sentry service from the client side. | org.apache.sentry.provider.db.generic.SentryGenericProviderBackend |
| sentry.cdap.policy.engine | Defines the Sentry Policy Engine for the cdap component. Must implement org.apache.sentry.policy.common.PolicyEngine | co.cask.cdap.security.authorization.sentry.policy.PolicyEngine (package name subject to change) |
| sentry.cdap.instance.name | Defines the instance name for the cdap component. | cdap |
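
As a rough illustration, the Sentry properties listed above could be written in Hadoop-style configuration XML as sketched below. The property names and values are taken from the table; the assumption that they all live in a single sentry-site.xml is ours, since sentry.service.allow.connect is read by the Sentry Server while the sentry.cdap.* properties are presumably read on the CDAP (client) side. The policy engine package name is still subject to change.

```xml
<configuration>
  <!-- Server side: add cdap to the users allowed to connect.
       Assumption: append cdap to the existing comma-separated list
       rather than replacing it. -->
  <property>
    <name>sentry.service.allow.connect</name>
    <value>cdap</value>
  </property>

  <!-- Client side (assumed): settings for the cdap component. -->
  <property>
    <name>sentry.cdap.provider</name>
    <value>org.apache.sentry.provider.common.HadoopGroupResourceAuthorizationProvider</value>
  </property>
  <property>
    <!-- Appears unused; other data engines define it as empty. -->
    <name>sentry.cdap.provider.resource</name>
    <value></value>
  </property>
  <property>
    <name>sentry.cdap.provider.backend</name>
    <value>org.apache.sentry.provider.db.generic.SentryGenericProviderBackend</value>
  </property>
  <property>
    <!-- Package name subject to change. -->
    <name>sentry.cdap.policy.engine</name>
    <value>co.cask.cdap.security.authorization.sentry.policy.PolicyEngine</value>
  </property>
  <property>
    <name>sentry.cdap.instance.name</name>
    <value>cdap</value>
  </property>
</configuration>
```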

CDAP

These properties will be defined in cdap-security.xml

| Property | Description | Default |
| --- | --- | --- |
| security.authorization.enabled | Determines whether authorization should be enabled in CDAP. If false, a NoOpAuthorizer would be used for security.authorizer.class | false |
| security.authorizer.class | Fully qualified class name of the authorizer class. Must implement the Authorizer interface | co.cask.cdap.security.authorization.DatasetBasedAuthorizer |
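
A minimal sketch of what enabling authorization in cdap-security.xml might look like, assuming the file uses the same Hadoop-style property format as other CDAP configuration files. The property names and the authorizer class come from the table above; setting the flag to true (instead of the default false) is the only change being illustrated.

```xml
<configuration>
  <!-- Turn authorization on; the default is false, in which case a
       NoOpAuthorizer would be used. -->
  <property>
    <name>security.authorization.enabled</name>
    <value>true</value>
  </property>

  <!-- Authorizer implementation; shown here with the default value.
       A Sentry-backed authorizer class would be substituted here
       once one exists. -->
  <property>
    <name>security.authorizer.class</name>
    <value>co.cask.cdap.security.authorization.DatasetBasedAuthorizer</value>
  </property>
</configuration>
```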

ACL management

There are multiple options for ACL management. For the dataset-based authorizer, we will have to support ACL management via the CDAP CLI.

For the Apache Sentry based authorizer, there are several options. We should support ACL management via the CDAP CLI, since it should involve very little extra work. However, support should also be provided via the Sentry Shell as well as Hue.

Although supporting the Sentry Shell seems straightforward once the CDAP backend for Sentry is implemented, the shell is a relatively new feature added in Sentry 1.7 (SENTRY-749). CDH 5.5 ships Sentry 1.5, and there is no timeline for Sentry 1.7 support (Cloudera Maven Repository).

To recognize and list CDAP entities in Hue, we will have to implement a CDAP webapp for Hue. Hue is implemented entirely in Python using the Django framework. This integration is a risk for 3.4. More details on this are TBD.

Testing

There are a couple of approaches to testing the Sentry integration. We can use the file-based policy store in Apache Sentry for tests. However, to simulate more realistic scenarios, we should explore whether it is easy to set up an in-memory database (HSQL, etc.) with the Sentry schema in tests.

Installation

Questions

  1. How does CDAP get sentry-site.xml? Is the path provided via cConf?
  2. Distinguishing read/write access is perhaps out of scope for 3.4, since it will need changes to the Dataset Framework.
  3. Can access to all entities be authorized in one go? If so, how?
  4. How does hierarchy work? E.g. does a write to a stream require READ permission on the namespace plus WRITE permission on the stream?
  5. In a secure/Kerberos environment, what does it take to communicate with the Sentry Server?
  6. Given that Sentry has a slightly data-engine-based schema, will we need some updates to the policy store to contain CDAP-specific tables for storing CDAP privileges, e.g. SENTRY_CDAP_PRIVILEGE and SENTRY_CDAP_PRIVILEGE_MAP tables?
  7. What about instance-level authorization? Would users need to be authorized to a given CDAP instance as well, along with the namespace and entity?
  8. Do we need an EXECUTE operation just for Program entities? Can we say that any user who has READ can run the program?

...