Introduction
Support configuring impersonation at the application level.
Goals
- Application Impersonation: As part of CDAP-6131 (Ability for CDAP to run programs as a particular user; In Progress), we implemented impersonation for programs and data operations, but it could only be configured at the namespace level. We need the ability to configure it at the application level so that we can run programs as different users without having to manage an additional namespace for each app.
- Entity ownership: Entities created by an application should be owned by the application owner. See CDAP-8065 (Entity Ownership: Entities in CDAP should have an owner and a group; Open).
- Explore Impersonation (Stretch Goal): As part of CDAP-6587 (Impersonate users when performing hive operations; Resolved), we implemented impersonation in Hive so that Explore queries impersonate the namespace user if one was provided. For better security, we would like to run Explore queries as the user who submits them.
Scenarios
Scenario 1: App Creation
- Louis should get all the privileges (READ/WRITE/EXECUTE) on all the entities created by the deployed application when CDAP authorization is enabled.
- Louis should own all streams/datasets created by the deployed app, i.e., he will own the HDFS files and HBase tables.
- All programs should run with Louis' credentials (e.g., Kerberos ticket), i.e., if another user Bob, who has sufficient privileges to run a program (EXECUTE on the program and READ on the namespace, if CDAP authorization is turned on), starts the program, then the program should run as Louis.
- Additionally, during app creation, Alice can also specify a group name. When the app is deployed, CDAP will change the group of the HDFS files and/or HBase/Hive tables so that users in the specified group have read access.
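The run-as rule in this scenario can be sketched as follows. This is only an illustrative model, not CDAP code: the `App` class, the `run_as_principal` function, and the fallback order (app owner, then namespace principal, then a system principal named `cdap`) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class App:
    """Hypothetical stand-in for a deployed application."""
    name: str
    owner_principal: Optional[str] = None  # e.g. the headless user Louis


def run_as_principal(app: App,
                     requesting_user: str,
                     namespace_principal: Optional[str] = None,
                     system_principal: str = "cdap") -> str:
    """Return the principal a program of `app` would run as.

    `requesting_user` is deliberately ignored: who starts the program
    matters for the authorization check (not modelled here), not for
    the identity the program runs with.
    Assumed fallback order: app owner, namespace principal, system principal.
    """
    return app.owner_principal or namespace_principal or system_principal
```

With this model, a program of an app owned by `louis` runs as `louis` whether Alice or Bob starts it.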
Scenario 2: Dataset Creation/Maintenance
- Alice is a human user. Alice would like to create a dataset without deploying an application, and during creation she wants to specify an owner who will own the dataset, i.e., the HDFS files/HBase tables. She specifies a headless user Louis, whose account she has access to, as the owner.
- Alice would like to perform dataset maintenance operations (truncate, delete, update) from the REST APIs, CLI, or UI, and she would like these operations to be performed as the dataset owner Louis.
- Another user Bob, who has sufficient privileges to administer the dataset, can perform the maintenance operations. All operations will be performed as the dataset owner Louis.
- Additionally, during dataset creation, Alice can also specify a group name. When the dataset is created, CDAP will change the group of the HDFS files and/or HBase/Hive tables so that users in the specified group have read access.
Scenario 3: Access Control
- Jules is a human user who does not have CDAP credentials and wants to run a Hive query outside of CDAP. Her access to the data can be controlled by group permissions.
- Mary is a headless user who owns a CDAP program that reads from a dataset owned by Louis. An admin adds Mary to the group for the dataset. The program owned by Mary can now read the dataset.
- (Stretch) Eve is a human user who has both LDAP and Kerberos credentials. She logs into the CDAP UI with her LDAP credentials and submits a query. While submitting the query, she provides her Kerberos principal and TGT. The query should be run as her Kerberos principal.
Design
- Impersonation is done using keytabs. All keytabs are accessible by the cdap user on all master nodes
- Users to be impersonated must be set up outside of CDAP
For user principal to keytab management we will use the following conventions:
- All keytabs are present on the local filesystem of the host on which CDAP Master is running.
- These keytabs are present under a path in one of the following formats, and cdap has read access to all of them:
- /<dir1>/<dir2>/${user.name}.keytab
- /<dir1>/<dir2>/${user.name}/${user.name}.keytab
The above path is provided to CDAP as a configuration parameter in cdap-security.xml:
    <property>
      <name>security.keytab.dir</name>
      <value>/<dir1>/<dir2>/${user.name}.keytab</value>
    </property>
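As a rough illustration of the path convention, the sketch below expands the `${user.name}` placeholder for a given principal. The function names are hypothetical, the `/etc/security/keytabs` directory in the usage note is only an example, and treating `${user.name}` as the short name of the principal (the part before any `/` or `@`) is an assumption, not something this design specifies.

```python
def short_user_name(principal: str) -> str:
    """Strip the realm and any instance part from a Kerberos principal.

    "louis/host.example.com@EXAMPLE.COM" becomes "louis".
    """
    return principal.split("@", 1)[0].split("/", 1)[0]


def resolve_keytab_path(template: str, principal: str) -> str:
    """Expand every ${user.name} placeholder in a keytab path template."""
    return template.replace("${user.name}", short_user_name(principal))
```

For example, `resolve_keytab_path("/etc/security/keytabs/${user.name}.keytab", "louis@EXAMPLE.COM")` yields `/etc/security/keytabs/louis.keytab`, and the same call works for the second format because both placeholders are replaced.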
- User principal to keytab mapping is managed separately
- Configuring an app for impersonation requires admin privileges on the CDAP instance when CDAP authorization is enabled
- Without CDAP authorization, any user will be able to impersonate any other user
- After an app is deployed, any user with sufficient privileges on the program can start/stop programs, see its status, see metrics, see logs, etc. All such actions will be impersonated as the owner of the app regardless of the user doing it.
- Explore will not be impersonated the same way. Explore queries will be run as the Kerberos principal provided by the user submitting the query.
- The user submitting the query will specify a Kerberos TGT. The TGT can be obtained by running kinit and providing the user's Kerberos credentials. By default, it is located in /tmp with the name "krb5cc_<uid of the user>". The location can be controlled by setting KRB5CCNAME.
- Audit log will show which logged-in user impersonated whom to run a query.
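The ticket cache lookup described above can be sketched as follows. The helper name `ticket_cache_path` is hypothetical, and the sketch handles only file-based caches: a bare path or a `FILE:` prefixed value in KRB5CCNAME. Other cache types are rejected rather than guessed at.

```python
import os
from typing import Mapping, Optional


def ticket_cache_path(env: Optional[Mapping[str, str]] = None,
                      uid: Optional[int] = None) -> str:
    """Return the file that is expected to hold the user's TGT.

    Uses KRB5CCNAME when set, otherwise the default /tmp/krb5cc_<uid>.
    """
    env = os.environ if env is None else env
    name = env.get("KRB5CCNAME")
    if not name:
        uid = os.getuid() if uid is None else uid
        return f"/tmp/krb5cc_{uid}"
    if ":" in name and not name.startswith("/"):
        # Value has the form TYPE:residual; only FILE caches are paths.
        cache_type, _, residual = name.partition(":")
        if cache_type != "FILE":
            raise ValueError(f"unsupported credential cache type: {cache_type}")
        return residual
    return name
```

Passing `env` and `uid` explicitly keeps the function easy to exercise without touching the real environment.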
For detailed design, please see Secure Impersonation - Security 4.1
API changes
New REST APIs
Entity Ownership:
Path | Method | Description | Response Code | Response
---|---|---|---|---
/v3/namespaces/<namespace>/apps/<app-id> | GET | Gives the application details, which will contain the owner principal as a field | 200 - On success; 404 - When the specified app does not exist; 500 - Any internal errors |
/v3/namespaces/<namespace-id>/streams/<stream-id> | GET | Gives the stream properties, which will contain the owner principal as a field | 200 - On success; 404 - When the specified stream does not exist; 500 - Any internal errors |
/v3/namespaces/<namespace-id>/datasets/<dataset-name>/properties | GET | Gives the dataset properties, which will contain the owner principal as a field | 200 - On success; 404 - When the specified dataset does not exist; 500 - Any internal errors |
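A client might build these lookup URLs and read the owner out of the response roughly as below. This is a sketch under assumptions: the helper names are made up for illustration, and the response field name `owner.principal` is a guess borrowed from the creation properties, since the table only says the owner principal appears "as a field".

```python
import json
from typing import Optional
from urllib.parse import quote

# Path templates taken from the table above.
_PATHS = {
    "app": "/v3/namespaces/{ns}/apps/{id}",
    "stream": "/v3/namespaces/{ns}/streams/{id}",
    "dataset": "/v3/namespaces/{ns}/datasets/{id}/properties",
}


def owner_lookup_url(base_url: str, entity_type: str,
                     namespace: str, entity_id: str) -> str:
    """Build the GET URL that returns the details of an entity."""
    path = _PATHS[entity_type].format(ns=quote(namespace, safe=""),
                                      id=quote(entity_id, safe=""))
    return base_url.rstrip("/") + path


def extract_owner(response_body: str,
                  field: str = "owner.principal") -> Optional[str]:
    """Read the owner from a JSON response; the field name is an assumption."""
    return json.loads(response_body).get(field)
```

The resulting URL can be fetched with any HTTP client, for example `urllib.request.urlopen`, and the body passed to `extract_owner`. A 404 would indicate that the entity does not exist.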
Entity Creation:
Create APIs for streams, datasets, and applications will take two additional JSON properties. The owner principal and the allowed group are specified as strings:
{
  "owner.principal": "user-principal",
  "allowed.group": "groupname"
}
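A caller could assemble these two properties as in the sketch below. The helper name `creation_properties` is hypothetical, and treating the group as optional follows the scenarios (the group "can also" be specified) rather than an explicit rule in this section.

```python
import json
from typing import Dict, Optional


def creation_properties(owner_principal: str,
                        allowed_group: Optional[str] = None) -> Dict[str, str]:
    """Build the extra ownership properties for a create request."""
    props = {"owner.principal": owner_principal}
    if allowed_group is not None:
        # The group is only included when the caller asked for one.
        props["allowed.group"] = allowed_group
    return props


# Serialized form, ready to merge into the body of a create request.
payload = json.dumps(creation_properties("louis", "analysts"))
```

How these properties are merged with the rest of each create request (application config, stream properties, dataset properties) is not specified here and is left out of the sketch.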
CLI Impact or Changes
- (optional) Create CLI for the above REST APIs
UI Impact or Changes
- Provide a way for the user to specify Kerberos credentials while launching an Explore query (CDAP-8078)
- (optional) Create UI for the above REST APIs
Security Impact
Authorization will need to be implemented on the new REST APIs (which manage the impersonation metadata and the users and their credentials). Authorization will also need to be enforced when this metadata is accessed programmatically (such as when launching programs or performing dataset operations involving impersonation).
Impact on Infrastructure Outages
This will rely on HBase for storing metadata (similar to how we store other application metadata). Without HBase (and the dataset service), this will not work.
Releases
Release 4.1.0
Future work
- Support ACLs for the HDFS files and HBase tables that are created when new CDAP entities are created.
- Push down ACLs to the storage providers.
- Support changing entity ownership