...

I am leaning towards option #1, because it keeps the configuration of the principal and keytab location separate from, and independent of, other user preferences (which are available as runtime arguments in programs).
Pending: I will add more details later on how the user will interact with app-, program-, and schedule-level configuration.

Resolution of principal
When a program is launched, the principal to be used will be determined from configuration at the following levels. The levels are checked in the order listed, and the first level at which a principal is configured will be used:

  1. Schedule
  2. Program
  3. Application
  4. Namespace

For example, if a schedule has an associated principal, and the application also has an associated principal, the schedule-level setting will be used.
If there is no schedule-level, program-level, or app-level setting, but there is a namespace-level setting, then the namespace-level setting will be used. 
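The lookup order described above can be sketched as follows. This is only an illustration of the precedence rule, not the actual CDAP implementation: the class name PrincipalResolver, the resolve method, and the convention that null or an empty string means "no principal configured at this level" are all assumptions made for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

/** Hypothetical helper: picks the principal from the most specific level that has one. */
final class PrincipalResolver {

  private PrincipalResolver() { }

  /**
   * Returns the first configured principal, checking schedule, program,
   * application, and namespace in that order. A null or empty value is
   * treated (by assumption) as "no principal configured at that level".
   */
  static Optional<String> resolve(String schedule, String program,
                                  String application, String namespace) {
    // Most specific level first, least specific last.
    List<String> byPrecedence = Arrays.asList(schedule, program, application, namespace);
    for (String principal : byPrecedence) {
      if (principal != null && !principal.isEmpty()) {
        return Optional.of(principal);
      }
    }
    // No level has a principal configured.
    return Optional.empty();
  }
}
```

With this sketch, a call that passes a schedule-level and an application-level principal returns the schedule-level one, and a call that passes only a namespace-level principal returns that one. Returning an empty Optional when nothing is configured leaves the caller free to decide what that case should mean.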

 

Implementation Design

User-launched programs

Hadoop's UserGroupInformation class has the following method:

// Log a user in from a keytab file. The new login does not affect the currently logged-in user.
public static UserGroupInformation loginUserFromKeytabAndReturnUGI(String user, String path) throws IOException;

...

When a flow program is launched for the first time, CDAP Master will create an HBase table in the user's namespace to track the pending events of queues (which events a particular flowlet has processed, and which are unprocessed). During execution, the flow's flowlets will read and update this table. Because of this, the HBase table should be created by the user that launches the flow, or at least be readable and writable by that user.
Design of the necessary implementation for this has not been fleshed out either, and will come later.

 

Explore Queries (TBD)

Explore queries are initiated by the CDAP user and operate on user data, even though they are launched from a system container. Because of that, impersonation will also need to be implemented for explore queries.

Design of the necessary implementation for this has not been fleshed out either, and will come later.

Pending Questions

  1. How will admins configure multiple keytabs (for the various configured principals)?
  2. Should we restrict updates to particular fields of the NamespaceConfig? Making it a 'final' configuration may simplify edge cases of the implementation, and will also reduce runtime failures. For instance, if a user changes the principal of a namespace, the user would have to ensure that the new principal has all the appropriate permissions.
  3. When launching jobs through Twill, the staging directory is always cdap/twill/...; do we need to change Twill to pass in the staging directory through prepareRun?

  4. If a user is logged into CDAP as 'ali', shouldn't we run the YARN app as user 'ali', instead of using the mapping configured on the namespace/app/etc.?

...