...

  1. Minimal refresh time
    1. The refresh operation should be fast. In particular, a single refresh must complete in less time than the refresh interval, so that successive refreshes never overlap.
    2. It should make minimal RPC calls. If there is a way to load the entire snapshot of ACLs in a single RPC call, that should be preferred.
    3. It should transfer only necessary data.
  2. Configurable refresh interval
    1. The refresh operation should run at a configurable interval so that users can tune it to their requirements.
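
The two requirements above can be sketched as follows. This is a minimal illustration, not CDAP's implementation: the class name `PolicyCacheRefresher`, the `snapshotLoader` supplier (standing in for a single RPC that returns the full ACL snapshot), and the string encoding of privileges are all assumptions made for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical periodic refresher for a policy cache (illustrative only). */
class PolicyCacheRefresher implements AutoCloseable {
  // principal -> privileges, e.g. "alice" -> {"READ:Dataset1"}.
  // The whole map is replaced on each refresh, so readers never see a partial update.
  private volatile Map<String, Set<String>> snapshot = Collections.emptyMap();

  // Assumption: one call loads the entire ACL snapshot (ideally a single RPC).
  private final Supplier<Map<String, Set<String>>> snapshotLoader;

  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "policy-cache-refresher");
        t.setDaemon(true);
        return t;
      });

  PolicyCacheRefresher(Supplier<Map<String, Set<String>>> snapshotLoader) {
    this.snapshotLoader = snapshotLoader;
  }

  /** Starts refreshing at the configured interval. */
  void start(long interval, TimeUnit unit) {
    // Fixed delay (not fixed rate): a slow refresh postpones the next one
    // instead of letting refreshes pile up.
    executor.scheduleWithFixedDelay(this::refreshNow, 0, interval, unit);
  }

  /** Loads a fresh snapshot; on failure the previous snapshot keeps being served. */
  void refreshNow() {
    try {
      snapshot = Collections.unmodifiableMap(new HashMap<>(snapshotLoader.get()));
    } catch (RuntimeException e) {
      // Swallowed so that the scheduled task is not cancelled by one failed refresh.
    }
  }

  boolean isAuthorized(String principal, String privilege) {
    return snapshot.getOrDefault(principal, Collections.emptySet()).contains(privilege);
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

A caller would construct the refresher with a loader for the configured backend and call `start(interval, unit)` with the interval read from configuration. Between refreshes, `isAuthorized` answers from the last snapshot, which is the source of the staleness discussed under Cache Freshness.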

...

Open question: Authorization providers may have their own caching mechanisms (e.g. Sentry has PrivilegeCache). Is any integration with these possible?

Cache Freshness

As mentioned above, the policy cache in CDAP can be made consistent with authorization providers at regularly scheduled intervals. However, this leaves the following race: Suppose Alice and Bob have been given READ access to Dataset1, and this state is consistent in both the external system (e.g. Apache Sentry) and the cache. Now, ACLs are updated to remove Alice's permissions. Until the refresh thread mentioned above next runs, the cache will be inconsistent with the external system, and CDAP will still think that both Alice and Bob have READ access to Dataset1. The severity of this may vary depending on the situation, but it is a security loophole nonetheless. There are two possible ways in which this situation may arise:

  1. User uses CDAP (CLI/REST APIs) to update ACLs: In this scenario, the revoke APIs in CDAP can invoke a callback that also updates the cache. As long as the store and the cache are updated transactionally (open question), there would be no inconsistency between the external system and the CDAP cache.
  2. User uses an external interface (e.g. Hue, Apache Ranger UI) to update ACLs: In this scenario, we may have to depend upon the external system providing a callback mechanism. Even if such a mechanism is provided, the interface through which the cache is updated (e.g. a message queue) will have to be built in CDAP. The external system can then add events to such an interface, and the cache can keep itself up-to-date by consuming from it. In the first release, however, there may be an inconsistency if this method is chosen to update ACLs.
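
The two update paths above can be sketched as follows. This is only an illustration under stated assumptions: `PrivilegeStore` and `WriteThroughPolicyCache` are hypothetical names, privileges are encoded as plain strings, and the `synchronized` methods are a stand-in for whatever transactional mechanism is eventually chosen (they do not provide a real transaction across CDAP and the external system).

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the external authorization backend. */
interface PrivilegeStore {
  void revoke(String principal, String privilege);
}

/** Hypothetical cache that is updated on revokes as well as on scheduled refreshes. */
class WriteThroughPolicyCache {
  private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
  private final PrivilegeStore store;

  WriteThroughPolicyCache(PrivilegeStore store) {
    this.store = store;
  }

  /** Records a privilege in the cache (e.g. during a refresh). */
  void put(String principal, String privilege) {
    cache.computeIfAbsent(principal, p -> ConcurrentHashMap.newKeySet()).add(privilege);
  }

  /**
   * Scenario 1: revoke issued through CDAP. The store is updated first;
   * if that call throws, the cache is left untouched and still matches the store.
   */
  synchronized void revoke(String principal, String privilege) {
    store.revoke(principal, privilege);
    evict(principal, privilege);
  }

  /**
   * Scenario 2: revoke made in an external UI. Assumes some consumer
   * (e.g. of a message queue) calls this for every change event.
   */
  synchronized void onExternalRevoke(String principal, String privilege) {
    evict(principal, privilege);
  }

  boolean isAuthorized(String principal, String privilege) {
    return cache.getOrDefault(principal, Collections.emptySet()).contains(privilege);
  }

  private void evict(String principal, String privilege) {
    Set<String> privileges = cache.get(principal);
    if (privileges != null) {
      privileges.remove(privilege);
    }
  }
}
```

Updating the store before the cache means a failed revoke leaves the cache granting exactly what the store still grants. The reverse order would briefly deny access that the store still allows, which is the safer failure mode from a security standpoint; which order is preferable is a design choice this sketch does not settle.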

...

Apache Sentry has active work under way to enable client-side caching as part of SENTRY-1229. It will likely suffer from the same cache-freshness drawbacks described above, and it is not yet complete. There is a case for CDAP re-using this (and other such) caching from authorization providers where it is available. However, we will implement a cache in CDAP independently, for the following reasons:

  1. We would like a cache in CDAP that works independently of authorization providers. For example, we would like the same caching mechanism to be available irrespective of the configured authorization backend (Apache Sentry, the Dataset-backed Authorization backend or, in the future, Apache Ranger).
  2. This is active work in progress in Apache Sentry, and there are no timelines yet as to when this change will make it to a CDH distro (currently marked for Apache Sentry 1.8.0).

Authorizing list operations

...