...

  •  User stories documented (Ali)
  •  User stories reviewed (Nitin)
  •  Design documented (Ali)
  •  Design reviewed (Andreas/Terence)
  •  Feature merged (Ali)
  •  Blog post 

...

Hadoop's UserGroupInformation class has the following method:

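// Log a user in from a keytab file and return the resulting UGI.
// This does not affect the currently logged-in user.
public static UserGroupInformation loginUserFromKeytabAndReturnUGI(String user, String path) throws IOException;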

...

Brief summary of overall changes

...

  1. At program runtime, CDAP Master will impersonate the configured user when launching the YARN application, so that CDAP programs run as their configured users.
    1. Because these users will not have access to system tables, programs will write to system tables (run records, lineage, usage, workflow token) through CDAP system services.
  2. During namespace operations (create/delete), the dataset service will create or delete the underlying entities (HBase namespace, HDFS directories, Explore database) while impersonating the configured user.
  3. During dataset admin operations (create/delete/truncate), the dataset op executor service will perform the operations while impersonating the configured user.
  4. (to be finalized) Stream admin operations and stream writes will have to run while impersonating the configured user.
  5. (to be finalized) Explore queries will have to run while impersonating the configured user.
  6. (to be finalized) Artifact deployment will also need to impersonate the user when deploying an artifact in the user scope.

Note: whenever a system service needs to impersonate a user, it must look up the configured principal and keytab, localize the keytab from the distributed file system, and create a UGI from that keytab. A caching mechanism for these UGIs would be useful, since it would avoid repeating these steps on every operation.
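
One possible shape for such a cache is sketched below. It is only an illustration, not the design chosen for CDAP: the class name UgiCache, the time-to-live policy, and the injected clock are all assumptions. The cache is generic so that the sketch compiles without Hadoop; in practice the value type would be UserGroupInformation, and the loader would do the principal/keytab lookup, keytab localization, and keytab login described above.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

/**
 * Hypothetical per-principal cache with a fixed time-to-live.
 * V stands in for UserGroupInformation in this sketch.
 */
final class UgiCache<V> {

  /** A cached value together with the time at which it was loaded. */
  private static final class Entry<V> {
    final V value;
    final long loadedAtMillis;

    Entry(V value, long loadedAtMillis) {
      this.value = value;
      this.loadedAtMillis = loadedAtMillis;
    }
  }

  private final Map<String, Entry<V>> entries = new ConcurrentHashMap<>();
  private final Function<String, V> loader; // performs the expensive lookup + login
  private final long ttlMillis;             // assumed expiry policy
  private final LongSupplier clock;         // injectable so expiry can be tested

  UgiCache(Function<String, V> loader, long ttlMillis, LongSupplier clock) {
    this.loader = loader;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  /** Returns the cached value for the principal, reloading it once the TTL has elapsed. */
  V get(String principal) {
    final long now = clock.getAsLong();
    // compute() runs atomically per key, so concurrent callers load at most once.
    Entry<V> entry = entries.compute(principal, (key, existing) -> {
      if (existing != null && now - existing.loadedAtMillis < ttlMillis) {
        return existing;
      }
      return new Entry<V>(loader.apply(key), now);
    });
    return entry.value;
  }
}
```

A system service could construct one such cache with System::currentTimeMillis as the clock and call get(principal) before each impersonated operation. The TTL would need to be shorter than the Kerberos ticket lifetime; whether expiry, explicit invalidation, or relogin is the right policy is left open here.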

 

Problems Encountered

User applications writing to CDAP System tables

One aspect of impersonation that we did not consider is that the YARN application corresponding to a CDAP program will no longer run with the permissions of the 'cdap' system user. For instance, if the program is configured to be launched as user 'joe', it is not guaranteed that 'joe' has access to the 'cdap_system' HBase namespace or to system tables. However, the YARN application currently still writes to system tables.
Here are examples of when a user program writes to system tables:

...