Checklist
- User Stories Documented
- User Stories Reviewed
- Design Reviewed
- APIs reviewed
- Release priorities assigned
- Test cases reviewed
- Blog post
Introduction
Implement configuring impersonation at the application level. Enable impersonation in Explore queries.
Goals
Application Impersonation: As a part of,
Explore Impersonation: As a part of,
Entity ownership:
User Stories
- (Similar to Secure Impersonation user stories, but with application-level impersonation)
- As a CDAP admin, I would like to map an application (and the entities it contains) to a Kerberos principal. When CDAP programs of this application are submitted to YARN, the applications should be run as that user.
- As a CDAP application developer, my application should access HDFS, HBase, Hive, and other resources as the user/principal configured for it, instead of the global 'cdap' (or other, configured) user.
- As a CDAP admin, I would like Explore queries to run as the user submitting the Explore query.
- As a CDAP application/dataset/stream owner, I would like to give other users access to the application/dataset/stream, during creation or afterwards.
Scenarios
Scenario 1: App Creation
- Louis should get all the privileges (READ/WRITE/EXECUTE) on all the entities created by the deployed application, with CDAP authorization.
- Louis should own all streams/datasets created by the deployed app, i.e. he will own the HDFS files, HBase tables, and Hive tables.
- All programs should run with Louis' credentials (e.g. Kerberos ticket), i.e. if another user Bob, who has sufficient privileges to run a program (EXECUTE on the program and READ on the namespace, if CDAP Authorization is turned on), starts the program, then the program should run as Louis.
Scenario 2: Dataset Creation/Maintenance
- Alice is a human user. Alice would like to create a dataset without deploying an application, and during creation she wants to specify an owner who will own the dataset, i.e. the HDFS files/HBase tables/Hive tables. She specifies the principal for a headless user Louis, whose account she has access to, as the owner.
- Alice would like to perform dataset maintenance operations (truncate, delete, update) from the REST APIs, CLI, or UI, and she would like these operations to be performed as the dataset owner Louis.
- Another user Bob, who has sufficient privileges to administer the dataset, can perform the maintenance operations; all operations will be performed as the dataset owner Louis.
Scenario 3: Access Control
- Jules is a human user who does not have CDAP credentials and wants to run a Hive query outside of CDAP. Her access to the data can be controlled by group permissions.
- Mary is a headless user who owns a CDAP program that reads from a dataset owned by Louis. An admin adds Mary to the group for the dataset. The program owned by Mary can now read the dataset.
- Eve is a human user who has both LDAP and Kerberos credentials. She logs into the CDAP UI with her LDAP credentials and submits a query. While submitting the query she provides her Kerberos principal and password. The query should be run as her Kerberos principal.
Design
Currently, whenever we need to perform a data operation or launch a program in YARN, we look up the namespace that the entity exists in and, based upon the principal mapping for that namespace, impersonate that principal. If there is no mapping, we perform actions as the current user (the cdap system user). Now, we will also need to maintain a mapping from entities such as applications, streams, and datasets to their owner principals.
Entity ownership
The ownership information for entities will be stored in an "owner.meta" table. The table will store the mapping from an entity to its owner's Kerberos principal (as a string). This information, along with the permissions on the entity, will be pushed down to the storage provider and will be used to control access (future work).
This will introduce an additional step during entity creation. An entry will need to be made to the owner.meta table.
The table will not be used to store ACLs for this release as that will be handled by the storage provider but in future releases, we can expand this to manage the ACLs. This feature will be useful for storage providers that don't support ACLs. It will also be useful in providing a layer of abstraction over authorization backends like Apache Sentry and Apache Ranger.
Note: If an entity exists with an associated owner and the same entity is being created by some other user, then this operation will fail. Also, if this entity creation was triggered by some other operation, then the complete operation will fail too. For example, Alice has deployed an app in CDAP which created a dataset called 'employee'. Now if Bob tries to deploy another app which creates a dataset with the same name 'employee', then the app deployment will fail. If Bob wants to read the 'employee' dataset from his app, then he should get the 'employee' dataset in his program dynamically. He should then be able to read this dataset if the conditions of the second case in Scenario 3 are met.
Rows in owner.meta will have the following format:
rowkey: {<created from entity id>}, column: {'c'}, value: the owner's principal
The row key will be constructed from the entity id and will capture the entity hierarchy, e.g. for a stream it will be constructed using the namespace and stream id.
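The row layout described above could be sketched roughly as follows. This is only an illustration: the class name `OwnerRowKeys`, the `:` separator, and the `namespace`/`stream`/`dataset` labels are assumptions made for the example, not the encoding CDAP actually uses.

```java
import java.util.List;

/** Hypothetical helper that builds owner.meta row keys from an entity hierarchy. */
final class OwnerRowKeys {
  /** The single column named in the design. */
  static final String COLUMN = "c";

  /** Assumed separator; the design does not specify one. */
  private static final String SEPARATOR = ":";

  private OwnerRowKeys() { }

  /** Joins the hierarchy parts, rejecting parts that would make the key ambiguous. */
  static String rowKey(List<String> hierarchy) {
    for (String part : hierarchy) {
      if (part.isEmpty() || part.contains(SEPARATOR)) {
        throw new IllegalArgumentException("Invalid row key part: '" + part + "'");
      }
    }
    return String.join(SEPARATOR, hierarchy);
  }

  /** A stream key captures both the namespace and the stream id. */
  static String forStream(String namespace, String streamId) {
    return rowKey(List.of("namespace", namespace, "stream", streamId));
  }

  /** A dataset key follows the same pattern. */
  static String forDataset(String namespace, String datasetId) {
    return rowKey(List.of("namespace", namespace, "dataset", datasetId));
  }
}
```

With this layout, entities in the same namespace share a common key prefix, which is one way the key can capture the hierarchy.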
User management
To allow headless users access to the system, other authorized users need to impersonate them. To allow this impersonation we set the following convention:
- All keytabs are present on the local filesystem on which CDAP master is running.
- These keytabs are present under a path which needs to be specified in cdap-security.xml:
- /<dir1>/<dir2>/${name}/${name}.keytab
- ${name} will be replaced with the short name of the owner's principal. It can be used anywhere in the path, e.g. /home/${name}/kerberos/keytabs/${name}.keytab
    <property>
      <name>keytab.path</name>
      <value>/dir1/dir2/${name}/${name}.keytab</value>
    </property>
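The `${name}` substitution could look something like the sketch below. It assumes the short name is simply the first component of the principal (everything before the first `/` or `@`); a real deployment would normally derive the short name through Hadoop's auth_to_local rules, which can map principals differently. The class and method names are hypothetical.

```java
/** Hypothetical resolver for the keytab.path pattern. */
final class KeytabPaths {
  private static final String PLACEHOLDER = "${name}";

  private KeytabPaths() { }

  /**
   * Assumption: the short name is the primary component of the principal,
   * e.g. "louis/host@EXAMPLE.COM" and "louis@EXAMPLE.COM" both give "louis".
   */
  static String shortName(String principal) {
    int end = principal.length();
    int at = principal.indexOf('@');
    if (at >= 0) {
      end = at;
    }
    int slash = principal.indexOf('/');
    if (slash >= 0 && slash < end) {
      end = slash;
    }
    if (end == 0) {
      throw new IllegalArgumentException("Principal has no primary component: " + principal);
    }
    return principal.substring(0, end);
  }

  /** Replaces every literal ${name} occurrence in the pattern with the short name. */
  static String resolve(String pattern, String principal) {
    return pattern.replace(PLACEHOLDER, shortName(principal));
  }
}
```

For example, resolving `/dir1/dir2/${name}/${name}.keytab` for the principal `louis/host1.example.com@EXAMPLE.COM` would yield `/dir1/dir2/louis/louis.keytab` under these assumptions.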
Pushing permissions to storage engines after creation (Out of 4.1 Scope)
The permissions assigned for entities will need to be pushed down to storage providers so that access outside the system will have the same restrictions. Both HBase and HDFS support ACLs and they will be used to assign finer grained permissions to the underlying tables or files.
Directory permissions
The directory structure will be as follows: CDAP will own the parent directories for the namespace. The directories will be group writable, and everyone who has app deployment privileges will be part of that group so that they can create subdirectories. For any cleanup, for example when the namespace is being deleted, the system user will impersonate the subdirectory owners to do the deletion. With this impersonation in place, the system user will not need access permissions on user directories.
The groups for the directories will be specified while the entry is being created and once the directory is created the system will do a chgrp to change it to the provided group.
e.g.
drwxrwxr-x - cdap supergroup 0 2017-01-16 04:39 /cdap/namespaces/
To be able to create a namespace the user will need to be a part of the "supergroup".
A group can also be specified in cdap-security.xml with the property "namespace.creators". If a group is specified for this property, then CDAP will change the group of /cdap/namespaces to the specified group, allowing users in that group to create namespaces.
The namespace directory will be owned by the namespace owner.
During the creation of a namespace a group can be specified, and this group will have write and execute permission on the namespace directory, allowing the users of this group to deploy applications in the namespace. Note: This will require a change to our existing namespace creation API.
drwxrwxr-x - accountadmin accountgroup 0 2017-01-16 04:39 /cdap/namespaces/account
To be able to create anything under that namespace, the user will have to be a part of the "accountgroup".
Stream:
drwxr-xr-x - account1 accountgroup 0 2017-01-17 02:41 /cdap/namespaces/account/streams/st1
All the directories will be owned by the headless users whose keytabs need to be present so that they can be impersonated. Additionally, during the creation of an app, stream, or dataset, the user can specify a group, and CDAP will change the group of the associated files on HDFS and tables on HBase and Hive so that the given group has read access.
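To make the effect of the modes in the listings above concrete, the following sketch evaluates whether a user may create an entry under a directory, using the usual owner/group/other model. It is a simplified illustration: it ignores superusers, ACLs, and sticky bits, and the class and method names are hypothetical.

```java
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical check mirroring the owner/group/other permission model. */
final class DirectoryAccess {
  private DirectoryAccess() { }

  /**
   * Returns true if the user may create a child in the directory, which
   * requires both write and execute on the directory for the class
   * (owner, group, or other) that applies to the user.
   *
   * @param mode nine-character mode string without the leading 'd', e.g. "rwxrwxr-x"
   */
  static boolean canCreateChild(String dirOwner, String dirGroup, String mode,
                                String user, Set<String> userGroups) {
    Set<PosixFilePermission> perms = PosixFilePermissions.fromString(mode);
    if (user.equals(dirOwner)) {
      return perms.contains(PosixFilePermission.OWNER_WRITE)
          && perms.contains(PosixFilePermission.OWNER_EXECUTE);
    }
    if (userGroups.contains(dirGroup)) {
      return perms.contains(PosixFilePermission.GROUP_WRITE)
          && perms.contains(PosixFilePermission.GROUP_EXECUTE);
    }
    return perms.contains(PosixFilePermission.OTHERS_WRITE)
        && perms.contains(PosixFilePermission.OTHERS_EXECUTE);
  }
}
```

Applied to the listings above, a member of "supergroup" can create a namespace under /cdap/namespaces (mode rwxrwxr-x), while a member of "accountgroup" who is not account1 cannot create anything inside the stream directory st1 (mode rwxr-xr-x).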
Explore Impersonation
For Explore impersonation we won't be using keytabs. A human user will log in using their credentials, and to run Explore queries they will have to provide a Kerberos username and a password. The system will authenticate with the KDC on behalf of the user and use the TGT to create a UGI for the user through the static method
    static UserGroupInformation getUGIFromTicketCache(java.lang.String ticketCache, java.lang.String user)
This UGI will then be used to impersonate the queries.
The RemoteUGIProvider provides methods that are called when a UGI is needed to impersonate a user. During the call to RemoteUGIProvider#createUGI, the Kerberos TGT can be obtained from the master through a REST API (/impersonation/credentials).
class ImpersonationInfo currently contains a principal and their keytab. This will change to include the path to the ticket cache for the user.
Workflows
UI:
The Explore window shows up when the user clicks on the Explore icon on any explorable entity. If Kerberos is enabled in the cluster, then a modal window will show up the first time the Explore icon is clicked. Through this window, the user can provide the Kerberos principal that the Explore query should run as and the TGT for that principal.
The UI forwards the principal and the TGT to the router which forwards it to CDAP master. Both these routes support SSL. Once master has the TGT it can be serialized to HDFS with permissions set to 600.
Explore container can then use the TGT on HDFS to create a UserGroupInformation object and use that to impersonate the principal for running the query. The UGI once created will be cached.
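The step of persisting the TGT with owner-only permissions might be sketched as below. This uses the local filesystem through java.nio as a stand-in for HDFS, so it only illustrates the "create with mode 600, then write" ordering; the class name and file naming are hypothetical, and it requires a POSIX filesystem.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.FileAttribute;
import java.nio.file.attribute.PosixFilePermission;
import java.nio.file.attribute.PosixFilePermissions;
import java.util.Set;

/** Hypothetical writer for serialized ticket data; local files stand in for HDFS. */
final class TicketCacheFiles {
  private TicketCacheFiles() { }

  /**
   * Creates the file with mode 600 before any bytes are written, so the
   * ticket is never readable by group or others, even briefly.
   * Fails if the file already exists.
   */
  static Path write(Path directory, String fileName, byte[] ticketBytes) throws IOException {
    Set<PosixFilePermission> ownerOnly = PosixFilePermissions.fromString("rw-------");
    FileAttribute<Set<PosixFilePermission>> mode = PosixFilePermissions.asFileAttribute(ownerOnly);
    Path target = directory.resolve(fileName);
    Files.createFile(target, mode);
    Files.write(target, ticketBytes);
    return target;
  }
}
```

Creating the file with restrictive permissions first, rather than writing and then tightening the mode, avoids a window in which the ticket is readable by other users.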
CLI:
The user would need to do a kinit before they would be able to launch an Explore query from the CLI. The CLI would then pick up the TGT and rest of the flow is the same as UI.
REST:
For running Explore queries through the REST APIs the user will need to provide the TGT and the principal along with the query.
Upgrade tool
None
Open Questions
- Currently, Hive impersonation does not work when the engine is set to Spark. Do we need to fix this in 4.1? (CDAP-7700)
Notes
- The principal configured for an application MUST have privileges to create tables in the (HBase) namespace it is deployed in. What happens if cdap is the entity creating this HBase namespace? How will the custom principal have CREATE privileges in that namespace?
- We will use AuthorizationHandler and PrivilegesManager for managing ACLs on the entities during and after creation.
- The specification for impersonation is at Secure Impersonation Specification
API changes
New Programmatic APIs
New internal APIs:
Impersonation Store: Stores the user keytab information
    public class ImpersonationStore {
      public void addImpersonationInfo(final ImpersonationInfo impersonationInfo) throws IOException { }
      public ImpersonationInfo getImpersonationInfo(final String principal)
        throws IOException, ImpersonationInfoNotFound { }
      // idempotent
      public void delete(final String principal) throws IOException { }
    }
Permission Store: Stores the entity ownership information.
    public class PermissionStore {
      public void addOwner(final EntityId entityId, final String principal) throws IOException { }
      public ImpersonationInfo getOwner(final EntityId entityId) throws IOException, NotFoundException { }
      // idempotent
      public void deleteOwner(final EntityId entityId) throws IOException { }
    }
    public final class ImpersonationInfo {
      private final String principal;
      private final String keytabURI;
    }
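The intended semantics of the ownership store (adding fails when an owner already exists, deletion is idempotent) can be illustrated with a small in-memory sketch. This is not the proposed implementation, which is backed by the owner.meta table; here entities are identified by plain string keys instead of EntityId, lookups return an Optional instead of throwing NotFoundException, and the class name is hypothetical.

```java
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the owner store; keys are plain strings. */
final class InMemoryOwnerStore {
  private final ConcurrentMap<String, String> owners = new ConcurrentHashMap<>();

  /** Records the owner, failing if the entity already has one (the 409 case in the REST API). */
  void addOwner(String entityKey, String principal) {
    String existing = owners.putIfAbsent(entityKey, principal);
    if (existing != null) {
      throw new IllegalStateException(
          "Owner already exists for " + entityKey + ": " + existing);
    }
  }

  /** Returns the owner principal, or empty if none is recorded. */
  Optional<String> getOwner(String entityKey) {
    return Optional.ofNullable(owners.get(entityKey));
  }

  /** Idempotent: deleting a missing entry is not an error. */
  void deleteOwner(String entityKey) {
    owners.remove(entityKey);
  }
}
```

Using `putIfAbsent` makes the existence check and the write a single atomic step, so two concurrent creators of the same entity cannot both succeed.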
Potential new external APIs (TBD):
Allowing group and permissions for FileSets/Streams/(other?)
New REST APIs
Entity Ownership:
Path | Method | Description | Response Code | Response |
---|---|---|---|---|
/v3/namespaces/<namespace>/apps/<app-id>/owner | GET | Gives the configured owner of the application | 200 - On success 404 - when the specified app does not exist 500 - Any internal errors | String:owner |
/v3/namespaces/<namespace>/stream/<stream-id>/owner | GET | Gives the configured owner of the stream | 200 - On success 404 - when the specified stream does not exist 500 - Any internal errors | String:owner |
/v3/namespaces/<namespace>/datasets/<dataset-id>/owner | GET | Gives the configured owner of the dataset | 200 - On success 404 - when the specified dataset does not exist 500 - Any internal errors | String:owner |
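A client call against the ownership endpoints in the table above could be prepared roughly as follows, using the JDK's java.net.http API (Java 11+). The sketch only builds the request and does not send it; the base URI, the class name, and the assumption that ids need no URL encoding are all illustrative.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for the "get owner" requests listed in the table. */
final class OwnerRequests {
  private OwnerRequests() { }

  /**
   * Builds GET /v3/namespaces/<namespace>/<entityType>/<entityId>/owner.
   *
   * @param entityType the path segment from the table, e.g. "apps" or "datasets"
   */
  static HttpRequest getOwner(URI base, String namespace, String entityType, String entityId) {
    // Assumes ids contain only characters that are safe in a URI path.
    URI uri = base.resolve(
        "/v3/namespaces/" + namespace + "/" + entityType + "/" + entityId + "/owner");
    return HttpRequest.newBuilder(uri).GET().build();
  }
}
```

A caller would pass the resulting request to `HttpClient.send` and, per the table, treat 200 as success with the owner string in the body, 404 as a missing entity, and 500 as an internal error.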
Please see: Secure Impersonation Specification#EntityOwnership
Remote Owner Service
We need a remote implementation of OwnerAdmin so that the program container or CDAP service container which performs requests under impersonation (as the namespace, app, dataset, or stream owner) can look up owner information internally if needed.
For example, an Explore query on a stream is handled by ExploreQueryExecutorHttpHandler. The handler impersonates the namespace owner. When the query actually runs, it might need to look up other CDAP resources (for example, the stream configuration). This call itself impersonates by doing a doAs for the resource involved (in this case, the stream). The Impersonator, which is responsible for providing the UGI to impersonate for this call, tries to look up owner information for the resource and fails, since it tries to access the owner.meta table, which is a system table and cannot be accessed under user impersonation.
This requires adding a remote implementation of OwnerAdmin which the program container and CDAP service container can use to get the owner information. We will also need to add a handler in cdap-app-fabric which will serve the requests from the remote client. Since this handler will reside inside CDAP master, it can query the owner store through the owner admin, since it will be running as the cdap user.
We will expose the following endpoints. (Note: Currently, we only support owner for namespace, app, artifact, stream, dataset.)
Path | Method | Request Body | Response Code | Response |
---|---|---|---|---|
Adding Owner | | | | |
/v1/owner/ | POST | | 200 - On success 409 - if owner information for entity already exists 500 - Any internal errors | |
/v1/owner/namespaces/{namespace-id}/artifacts/{artifact-name}/version/{artifact-version} | POST | String: principal | 200 - On success 409 - if owner information for entity already exists | |
/v1/owner/namespaces/{namespace-id}/streams/{stream-id} | POST | String: principal | 200 - On success 409 - if owner information for entity already exists | |
/v1/owner/namespaces/{namespace-id}/datasets/{dataset-id} | POST | String: principal | 200 - On success 409 - if owner information for entity already exists | |
Deleting Owner | | | | |
 | DELETE | | 200 - On success 500 - Any internal errors | |
Getting Owner | | | | |
/v1/owner/ | GET | | 200 - On success 500 - Any internal errors | |
Getting Impersonation Information | | | | |
/v1/owner/impinfo | GET | | 200 - On success 500 - Any internal errors | |
Entity Creation:
Create APIs for streams, datasets, and applications will take two additional JSON properties, one of which is the owner name as a string. Please see: Secure Impersonation Specification#EntityCreation
CLI Impact or Changes
- CDAP-8079 (Open): Provide a way for the user to specify Kerberos credentials while launching an Explore query through the CLI in an impersonated environment
- (optional) Create CLI for the above REST APIs
UI Impact or Changes
- CDAP-8078 (Open): Provide a way for the user to specify Kerberos credentials while launching an Explore query through the UI in an impersonated environment
- (optional) Create UI for the above REST APIs
Security Impact
We will need to implement authorization on the above REST APIs (which manage the impersonation metadata). Authorization will also need to be added when programmatically accessing this metadata (such as when launching the programs or performing dataset operations involving impersonation).
Impact on Infrastructure Outages
This will rely on HBase for storing metadata (similar to how we store other metadata for applications). Without HBase (and the dataset service), this will not work.
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
IMP100 | (default namespace) Deploy an application from an artifact, for principal X, and run a program. | The program should run as X. Datasets/streams should have their HDFS/HBase resources owned by X. |
IMP101 | (default namespace) Deploy another application from the same artifact, without specifying a principal, and run a program. | The program should run as the cdap system user. Datasets/streams should have their HDFS/HBase resources owned by the cdap system user. |
IMP102 | Run IMP100 and IMP101 in a custom namespace that doesn't have impersonation configured. | Expectation should be the same. |
IMP103 | Run IMP100 and IMP101 in a namespace that already has impersonation configured. | < Expected behavior TBD > |
IMP104 | ||
IMP105 | ||
IMP106 |
Releases
Release 4.1.0
Related Work
- Work #1
- Work #2
- Work #3
Future work
...