
Table of Contents
 

...

  •  User stories documented (Bhooshan)
  •  User stories reviewed (Nitin)
  •  Design documented (Bhooshan)
  •  Design reviewed (Andreas/Terence)
  •  Feature merged (Bhooshan)
  •  Documentation (Bhooshan)
  •  Blog post 

...

The typical pattern in Sentry is to whitelist a set of users from whom the Sentry service accepts requests. The property that controls this is sentry.service.allow.connect, whose description states: "List of users allowed to connect to the Sentry Server. These are usually service users such as hive and impala, and the list does not usually need to include end users." As a result, the pattern in CDAP 3.4 was to whitelist the cdap user. That was sufficient because all authorization requests to Sentry originated from the CDAP Master. In 3.5, however, CDAP also makes requests to Sentry for authorization enforcement from program containers. In addition, programs run as the user that starts them, and in 3.5 that user is configured at the namespace level. For example:

  1. A user creates a namespace 'myspace' and assigns the principal 'myuser' to it.
  2. The user deploys an app in 'myspace' and starts a program.
  3. The program is spawned as 'myuser'.
  4. During program execution, requests need to be made to Sentry.

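The accept/reject behavior of the whitelist can be modelled with a short sketch. It is a hypothetical illustration, not Sentry's implementation: `parse_allow_connect` and `accepts` are invented names, and the comma-separated value format and the example list `hive,impala,cdap` are assumptions. It shows why, with only 'cdap' whitelisted as in 3.4, a request made directly as 'myuser' in step 4 would be rejected, and why whitelisting a group name does not admit the group's members.

```python
from __future__ import annotations


def parse_allow_connect(value: str) -> frozenset[str]:
    """Parse a hypothetical sentry.service.allow.connect value such as "hive,impala,cdap"."""
    # Assumption: the property holds a comma-separated list of user names.
    return frozenset(user.strip() for user in value.split(",") if user.strip())


def accepts(allowed: frozenset[str], user: str, groups: frozenset[str] = frozenset()) -> bool:
    """Hypothetical model of the whitelist check for a connection from `user`."""
    # Only the user name is consulted. `groups` is accepted but deliberately
    # ignored, mirroring the observation that whitelisting a group does not
    # admit its members. The requested operation is not considered at all.
    return user in allowed


allowed = parse_allow_connect("hive,impala,cdap")
print(accepts(allowed, "cdap"))    # True: requests from the CDAP Master
print(accepts(allowed, "myuser"))  # False: a program calling Sentry as itself

# Listing a group name does not help a member of that group.
with_group = parse_allow_connect("cdap,cdapprogramrunners")
print(accepts(with_group, "myuser", frozenset({"cdapprogramrunners"})))  # False
```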
For step 4 above, there are two options:

  1. Send the request as the 'cdap' user. This communication has been tested and will always work, as long as the 'cdap' user is whitelisted in the Sentry Service using the property mentioned earlier. However, it requires an extra hop: from the program container, an RPC request is made to another container (which also performs other operations, such as recording lineage, the usage registry, run records and workflow tokens). This other container holds the cdap user's delegation token and makes the request to Sentry.
  2. Send the request as the user running the program. This avoids the extra hop of option 1. However, it has the following disadvantages:
    1. Every user who will ever run a CDAP program must be whitelisted in the Sentry Service. An alternative approach, whitelisting a group such as 'cdapprogramrunners' and adding every user who runs programs to that group, does not work. The whitelist property description implies as much, and an experiment confirmed it.
    2. Once a user is whitelisted, it is whitelisted for all operations in the Sentry Service. The property only decides whether a request is accepted or rejected based on the requesting user; it makes no distinction based on the operation being performed. Other parameters govern what an accepted request may do (for example, admin groups, since only admin groups can list all roles, create a role, etc.; granting and revoking privileges is also governed by a policy in CDAP, which ensures that only a user with ADMIN rights on an entity can grant or revoke privileges on it). The whitelist does not influence any of these operations.
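The trade-off between the two options can be sketched as a small model. This is hypothetical Python, not CDAP code: `sentry_caller`, `required_allow_connect` and the namespace-to-principal mapping are invented for illustration, and the sketch assumes the CDAP Master runs as 'cdap' and that each namespace has a distinct principal. It computes which users must be listed in sentry.service.allow.connect under each option.

```python
from __future__ import annotations

SERVICE_USER = "cdap"  # assumption: the CDAP Master and its system containers run as 'cdap'


def sentry_caller(option: int, program_user: str) -> str:
    """Return the user Sentry sees for a request made on behalf of a program."""
    if option == 1:
        # Option 1: the program container makes an RPC to a CDAP-side container
        # that holds the cdap user's delegation token; that container calls Sentry.
        return SERVICE_USER
    if option == 2:
        # Option 2: the program container calls Sentry directly as its own user.
        return program_user
    raise ValueError(f"unknown option: {option}")


def required_allow_connect(option: int, namespace_principals: dict[str, str]) -> set[str]:
    """Users that must appear in sentry.service.allow.connect under each option."""
    users = {SERVICE_USER}  # the CDAP Master always talks to Sentry as 'cdap'
    for principal in namespace_principals.values():
        users.add(sentry_caller(option, principal))
    return users


# Hypothetical namespaces, each configured with its own principal.
namespaces = {"myspace": "myuser", "otherspace": "otheruser"}
print(sorted(required_allow_connect(1, namespaces)))  # ['cdap']
print(sorted(required_allow_connect(2, namespaces)))  # ['cdap', 'myuser', 'otheruser']
```

Under this model, option 1 keeps the whitelist at a single entry no matter how many namespaces exist, at the cost of the extra RPC hop; option 2 removes the hop but adds one whitelist entry per distinct namespace principal.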

Taking all of the above into consideration, the first approach (an extra RPC hop, with the request to Sentry made as 'cdap') appears to be the better choice for communicating with Sentry. The second approach would only be viable if users are willing to go against both the Sentry norm and the property description by whitelisting every user who runs programs (in 3.5, this number is effectively equal to the number of namespaces in CDAP).

Dependencies

Ability to distinguish between read and write operations in datasets

...