...
Usecase: App Deployment by an unauthorized user
Configuration
Sentry
Note: Will have to whitelist the cdap user for the Sentry Service.
CDAP
ACL management
Installation
...
TBD: either using CDAP CLI or via external systems like Sentry CLI or Hue
...
Property | Description | Value |
---|---|---|
sentry.service.allow.connect | List of users allowed to connect to the Sentry Server | cdap will be added to this list |
sentry.cdap.provider | Authorization provider for the CDAP component in Sentry. This class defines the user-group mapping amongst other things. | org.apache.sentry.provider.common. HadoopGroupResourceAuthorizationProvider |
sentry.cdap.provider.resource | The resource for creating the Sentry Provider Backend. This property seems unused, and always defaults to "". However, all data engines (hive, sqoop, kafka define it). | "" |
sentry.cdap.provider.backend | A class that implements ProviderBackend . This class uses a SentryServiceClient to communicate with the sentry service from the client side in Sentry. | org.apache.sentry.provider.db.generic.SentryGenericProviderBackend |
sentry.cdap.policy.engine | Defines the Sentry Policy Engine for the cdap component. Must implement org.apache.sentry.policy.common.PolicyEngine |
(package name subject to change) |
sentry.cdap.instance.name | Defines the instance name for the cdap component. | cdap |
CDAP
These properties will be defined in cdap-security.xml
Property | Description | Default |
---|---|---|
security.authorization.enabled | Determines whether authorization should be enabled in CDAP. If false, a NoOpAuthorizer would be used for security.authorizer.class | false |
security.authorizer.class | Fully qualified class name of the authorizer class. Must implement the Authorizer interface | co.cask.cdap.security.authorization.DatasetBasedAuthorizer |
ACL management
There are multiple options for ACL Management. For dataset-based authorizer, we will have to support ACL Management via the CDAP CLI.
For Apache Sentry based authorizer, there are multiple options. We should support this via the CDAP CLI because it should involve very little extra work. However, support should also be provided via the SentryShell as well as Hue.
Although supporting the Sentry Shell seems straightforward once the CDAP backend for Sentry is implemented, it's a relatively new feature added in Sentry 1.7 (SENTRY-749). CDH 5.5 ships Sentry 1.5 and there are no timelines on support for Sentry 1.7 (Cloudera Maven Repository).
For recognizing and listing CDAP entities in Hue, we will have to implement a CDAP Webapp for Hue. Hue is implemented entirely in Python using the Django framework. This integration is a risk for 3.4. More details on this TBD.
Testing
For testing the sentry integration, there are a couple of approaches. We can use the file-based policy store in Apache Sentry for tests. However, to simulate more realistic scenarios, we should explore if it is easy to setup an in-memory database (HSQL, etc) with the Sentry schema in tests.
Installation
Questions
- How does CDAP get
sentry-site.xml
? Path provided viacConf
? - Distinguishing Read/Write access is perhaps out of scope of 3.4, since we will need changes to Dataset Framework
- Can access to all entities be authorized in one go? If so, how?
- How does hierarchy work? e.g. write to stream requires READ perms on namespace + write perms on stream
- In a secure/kerberos environment, what does it take to communicate with the Sentry Server?
- In a secure/kerberos environment, what does it take to communicate with the Sentry Server?
- Given that Sentry has a slightly data-engine-based schema, will we need some updates to the policy store to contain CDAP specific tables for storing CDAP Privileges?
SENTRY_CDAP_PRIVILEGE
andSENTRY_CDAP_PRIVILEGE_MAP
tables? - What about instance-level authorization? Would users need to be authorized to a given CDAP instance as well, along with the namespace and entity?
- Do we need EXECUTE operation just for Programs entity. Can we say that any user who has READ can run the program ?
...