Ranger Integration Design Document
Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
- Extend the Authorization capabilities of CDAP
- Integrate with various authorization backends for broader support and acceptance of CDAP
Goals
Integrate CDAP with Apache Ranger so that CDAP can use Ranger as Authorization storage.
User Stories
- Jack is a Hadoop admin at an enterprise that is adopting CDAP for faster application development. The enterprise uses HDP/Ambari as its Hadoop ecosystem, and all the components use Ranger uniformly for authorization. They would like to be able to use Ranger to authorize resources in CDAP.
- Jack wants to use the Ranger administrative console as the centralized place to manage privileges in CDAP.
- Jack wants to be able to grant privileges beforehand, allowing the authorized user to create entities.
- Jack wants CDAP authorization to be audited in Ranger's centralized auditing system and to be able to search the audit records through Solr.
- Jack wants to be able to grant privileges using tags in CDAP. (Note: Out of 4.3 scope)
Architecture and Design
Apache Ranger is a centralized security framework used to manage authorization privileges.
To integrate CDAP with Apache Ranger, we will need to develop an authorization plugin. Below we summarize the components that need to be developed for this plugin. The cdap-ranger plugin will be developed according to the CDAP authorization model. To keep the design concise and focused on Ranger, this document does not describe the CDAP authorization model and assumes that the reader is familiar with it.
There are three major components that need to be developed to integrate CDAP with Ranger.
Service Definition
The service definition describes CDAP to Ranger. It is a JSON document used to install/register CDAP as a service in Ranger. The JSON defines the different entities (resources) in CDAP and their hierarchy. Below is an example of a CDAP service definition used to register CDAP in Ranger.
{
  "id": 11, // unique id for CDAP service
  "name": "cdap",
  "implClass": "co.cask.cdap.security.authorization.ranger.lookup.RangerLookupService", // defines the Java class which will be used for resources lookup
  "label": "CDAP",
  "description": "CDAP",
  "resources": [
    {
      "itemId": 1,
      "name": "instance", // entity name
      "type": "string", // the value type
      "level": 10, // defines placement level in the UI
      "parent": "", // defines the hierarchy
      "mandatory": true, // defines whether it's required or not for a privilege
      "lookupSupported": true, // defines whether this resource can be looked up
      "recursiveSupported": false,
      "excludesSupported": true,
      "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
      "matcherOptions": { "wildCard": true, "ignoreCase": false },
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "CDAP Instance",
      "description": "CDAP Instance"
    },
    {
      "itemId": 2,
      "name": "namespace",
      "type": "string",
      "level": 20,
      "parent": "instance",
      "mandatory": false,
      "lookupSupported": true,
      "recursiveSupported": false,
      "excludesSupported": false,
      "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
      "matcherOptions": { "wildCard": true, "ignoreCase": false },
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "Namespace",
      "description": "CDAP Namespace"
    },
    {
      "itemId": 3,
      "name": "stream",
      "type": "string",
      "level": 30,
      "parent": "namespace",
      "mandatory": false,
      "lookupSupported": true,
      "recursiveSupported": false,
      "excludesSupported": false,
      "matcher": "org.apache.ranger.plugin.resourcematcher.RangerDefaultResourceMatcher",
      "matcherOptions": { "wildCard": true, "ignoreCase": false },
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "Stream",
      "description": "Stream"
    },
    ... // other entities like dataset, application etc.
  ],
  "accessTypes": [ // defines the actions in CDAP
    { "itemId": 1, "name": "read", "label": "Read" },
    { "itemId": 2, "name": "write", "label": "Write" },
    { "itemId": 3, "name": "execute", "label": "execute" },
    { "itemId": 4, "name": "admin", "label": "Admin" }
  ],
  "configs": [ // defines the connection parameters which will be used to connect to CDAP
    {
      "itemId": 1,
      "name": "cdap.username",
      "type": "string",
      "subType": "",
      "mandatory": false,
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "Username"
    },
    {
      "itemId": 2,
      "name": "cdap.password",
      "type": "password",
      "subType": "",
      "mandatory": false,
      "validationRegEx": "",
      "validationMessage": "",
      "uiHint": "",
      "label": "Password"
    },
    {
      "itemId": 3,
      "name": "cdap.instance.url",
      "type": "string",
      "subType": "",
      "mandatory": true,
      "label": "Instance URL",
      "defaultValue": "",
      "validationRegEx": "",
      "validationMessage": ""
    }
  ],
  "enums": [ ],
  "contextEnrichers": [ ],
  "policyConditions": [ ]
}
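The "parent" fields in the service definition chain the resources together (instance, then namespace, then stream). As a rough illustration only, the sketch below models that chain in memory and resolves the full path for a resource. The classes ResourceDefSketch and ServiceDefHierarchySketch are hypothetical helpers written for this example; they are not part of Ranger or CDAP, and Ranger performs its own handling of the service definition.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of one "resources" entry; not a Ranger class.
final class ResourceDefSketch {
  final String name;
  final String parent; // empty string means the resource is at the top of the hierarchy

  ResourceDefSketch(String name, String parent) {
    this.name = name;
    this.parent = parent;
  }
}

// Hypothetical helper that resolves the hierarchy implied by the "parent" fields.
final class ServiceDefHierarchySketch {
  private final Map<String, ResourceDefSketch> byName = new HashMap<>();

  ServiceDefHierarchySketch(List<ResourceDefSketch> defs) {
    for (ResourceDefSketch def : defs) {
      byName.put(def.name, def);
    }
  }

  // Walks the parent links upward and returns the chain from the root down to the resource.
  List<String> pathTo(String resource) {
    List<String> path = new ArrayList<>();
    String current = resource;
    while (!current.isEmpty()) {
      ResourceDefSketch def = byName.get(current);
      if (def == null) {
        throw new IllegalArgumentException("Undefined resource: " + current);
      }
      if (path.contains(current)) {
        throw new IllegalStateException("Cycle in parent chain at: " + current);
      }
      path.add(current);
      current = def.parent;
    }
    Collections.reverse(path);
    return path;
  }

  // The three resources spelled out in the example service definition.
  static ServiceDefHierarchySketch fromExample() {
    return new ServiceDefHierarchySketch(Arrays.asList(
        new ResourceDefSketch("instance", ""),
        new ResourceDefSketch("namespace", "instance"),
        new ResourceDefSketch("stream", "namespace")));
  }
}
```

With the example definition, resolving "stream" yields instance, namespace, stream, which is the order in which an admin would fill in a policy for a stream.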
Authorization Binding
The authorization binding will be used to talk to Ranger from CDAP. The RangerAuthorizer will extend our AbstractAuthorizer class and provide an implementation that talks to Ranger through RangerBasePlugin to perform enforcement and other operations.
Note: Even though we expect users to manage privileges through Ranger's administrative console, we will need to support privilege management from CDAP to maintain feature parity with our Sentry authorization plugin. This requires implementing grant, revoke, and the related operations in our Ranger binding.
Design
Specifically, we will need to do the following [1]:
Initialization
- Create a static/global instance of the RangerBasePlugin class (or a class derived from it). Keep a reference to this instance for later use when authorizing resource access.
- Call init() on this instance. This initializes the policy engine with authorization policies from Ranger Admin and starts a background thread that periodically refreshes the policies from Ranger Admin.
- Register an audit handler, such as RangerDefaultAuditHandler, with the plugin instance. The plugin uses this audit handler to generate audit logs of resource accesses.
Authorization
- Create an instance of a RangerAccessRequest implementation, such as RangerAccessRequestImpl, with the details of the access that needs to be authorized: resource, access type, user, etc.
- Call isAccessAllowed() on the plugin instance created earlier.
- Depending on the returned result, either allow or deny the access.
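The initialization and authorization steps above can be sketched as a small flow: initialize the policy engine once, build a request for each access, evaluate it, record an audit entry, and allow or deny. To keep the example compilable without the Ranger libraries, it uses hypothetical stand-in types (AccessRequestSketch, PolicyEngineSketch, AuthorizerSketch). Their method names and signatures are assumptions made for illustration and are not the Ranger API; a real binding would use RangerBasePlugin, RangerAccessRequestImpl, and an audit handler as described above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the access request (the role RangerAccessRequestImpl plays in the text).
final class AccessRequestSketch {
  final String resource;
  final String accessType;
  final String user;

  AccessRequestSketch(String resource, String accessType, String user) {
    this.resource = resource;
    this.accessType = accessType;
    this.user = user;
  }
}

// Hypothetical stand-in for the plugin instance. These are NOT Ranger signatures.
interface PolicyEngineSketch {
  void init(); // load policies before the first authorization check
  boolean isAllowed(AccessRequestSketch request);
}

// Hypothetical authorizer that holds one engine for its whole lifetime.
final class AuthorizerSketch {
  private final PolicyEngineSketch engine;
  private final List<String> auditLog = new ArrayList<>(); // stands in for an audit handler

  AuthorizerSketch(PolicyEngineSketch engine) {
    this.engine = engine;
    this.engine.init(); // initialization happens once, up front
  }

  // Builds the request, evaluates it, audits the decision, and throws if access is denied.
  void enforce(String resource, String accessType, String user) {
    AccessRequestSketch request = new AccessRequestSketch(resource, accessType, user);
    boolean allowed = engine.isAllowed(request);
    auditLog.add(user + " " + accessType + " " + resource + " -> " + (allowed ? "ALLOWED" : "DENIED"));
    if (!allowed) {
      throw new SecurityException(user + " is not authorized to " + accessType + " " + resource);
    }
  }

  List<String> auditLog() {
    return auditLog;
  }
}
```

Auditing both allowed and denied decisions in one place mirrors the intent of registering a single audit handler with the plugin, so every enforcement call leaves a record.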
Resource Lookup
This component is optional, but it is crucial for a good user experience when managing CDAP privileges through the Ranger administrative console. It allows Ranger to talk to CDAP and list resources such as streams and datasets, so that an admin can select them in the Ranger UI and define privileges on them.
Design
Specifically, we will need to do the following [1]:
- Extend the RangerBaseService class and provide implementations of the lookupResource() and validateConfig() methods.
- Provide the name of this class in the service-type definition.
- Copy the library (jar file) that includes the class implementation, along with any other libraries referenced by this class, into the ranger-plugins/<service-type> directory in the CLASSPATH of Ranger Admin.
- The lookup implementation will then use CDAP clients to look up entities in CDAP such as streams and datasets.
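The lookup idea can be illustrated with a minimal sketch: given the resource type being edited in the Ranger UI and the text typed so far, return the matching entity names. The EntityListerSketch interface is a hypothetical stand-in for a CDAP client, and ResourceLookupSketch is a hypothetical helper, not the RangerBaseService API. Prefix matching and sorted output are assumptions about how the results might be presented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical stand-in for a CDAP client that lists entity names of a given type.
interface EntityListerSketch {
  List<String> list(String entityType);
}

// Hypothetical lookup helper; a real implementation would live in a RangerBaseService subclass.
final class ResourceLookupSketch {
  private final EntityListerSketch lister;

  ResourceLookupSketch(EntityListerSketch lister) {
    this.lister = lister;
  }

  // Returns the entity names of the given type that start with the user's input, sorted.
  List<String> lookup(String resourceName, String userInput) {
    String prefix = userInput == null ? "" : userInput;
    List<String> matches = new ArrayList<>();
    for (String name : lister.list(resourceName)) {
      if (name.startsWith(prefix)) {
        matches.add(name);
      }
    }
    Collections.sort(matches);
    return matches;
  }
}
```

Keeping the client behind a small interface makes the lookup logic testable without a running CDAP instance.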
CLI Impact or Changes
- Ranger does not support roles, so the CDAP CLI needs to handle this case.
Future Work
Tag Based Policies
As of CDAP 4.2, authorization in CDAP is based on entities. Every entity in CDAP can also have tags associated with it. It would be useful to allow admins to authorize based on tags rather than entities. One of the major advantages of tag-based authorization is that it separates resource classification from authorization. For example, in a CDAP instance, sensitive datasets can be tagged with Gold. An admin can then define an authorization policy for the tag Gold that allows only certain authorized users to use datasets carrying it. When a new sensitive dataset is created, all that is needed for authorization enforcement to work is to tag it with Gold. This eliminates the need to create new policies for the newly created dataset. This can be taken further: a dataset created from a sensitive dataset can be automatically tagged with predefined tags (Metadata Inheritance).
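The Gold example can be sketched in a few lines to show why tagging removes the need for a new policy per dataset. TagPolicySketch is a hypothetical in-memory model written for this illustration; it is not how Ranger stores or evaluates tag policies. It assumes a simplified rule in which access is granted when the user is authorized for at least one tag on the dataset, and it ignores entity-based policies entirely.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of tag-based authorization; not a Ranger or CDAP class.
final class TagPolicySketch {
  private final Map<String, Set<String>> tagsByDataset = new HashMap<>();
  private final Map<String, Set<String>> usersByTag = new HashMap<>();

  // Classification step: attach a tag to a dataset.
  void tagDataset(String dataset, String tag) {
    tagsByDataset.computeIfAbsent(dataset, k -> new HashSet<String>()).add(tag);
  }

  // Authorization step: allow a user to use anything carrying the tag.
  void allowTag(String tag, String user) {
    usersByTag.computeIfAbsent(tag, k -> new HashSet<String>()).add(user);
  }

  // Simplified rule (an assumption): allowed if the user is granted at least one of the dataset's tags.
  boolean isAllowed(String dataset, String user) {
    for (String tag : tagsByDataset.getOrDefault(dataset, Collections.<String>emptySet())) {
      if (usersByTag.getOrDefault(tag, Collections.<String>emptySet()).contains(user)) {
        return true;
      }
    }
    return false;
  }
}
```

In this model, a single allowTag("Gold", "alice") call covers every dataset later tagged Gold; adding a new sensitive dataset only requires a tagDataset call, with no change to the policy itself.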
Apache Ranger supports tag-based authorization; more details are in [2].
- A service that wants to use Ranger's tag-based authorization needs to push its tags to Ranger. The easiest way to do this is to push them to Apache Atlas and then configure Ranger to sync tags from Atlas.
- Once Ranger is aware of the tags in a service, it can perform tag-based policy evaluation for authorization, as described in the section "6 Tags in policy evaluation" of the Ranger document on tag-based policies.
TODO
- Figure out how to work with Kerberos-enabled Ranger
- Kerberos-enabled multi-node cluster where CDAP and Ranger run on different nodes
- Configuring Ranger to work with LDAP for integration testing
References
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=53741207
[2] https://cwiki.apache.org/confluence/display/RANGER/Tag+Based+Policies