...

The above cache would be re-populated asynchronously from the configured authorization provider (Apache Sentry, Apache Ranger, etc.) at a configurable time interval, using an AbstractScheduledService. Instead of querying the external authorization provider every time an authorization check is required, the various CDAP sub-components would query this cache.
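The periodic refresh described above could look roughly like the following. This is a minimal sketch, not CDAP code: `PolicyCache`, its methods, and the `"entity:action"` privilege encoding are hypothetical names chosen for illustration, and a plain `ScheduledExecutorService` stands in for Guava's `AbstractScheduledService` so that the example depends only on the JDK.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch of a periodically refreshed policy cache.
class PolicyCache {
  // principal -> privileges, each encoded as "entity:action" (an assumed encoding)
  private volatile Map<String, Set<String>> snapshot = Collections.emptyMap();
  // Stands in for a call to the external provider (e.g. Sentry or Ranger)
  private final Supplier<Map<String, Set<String>>> provider;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "policy-cache-refresh");
        t.setDaemon(true);
        return t;
      });

  PolicyCache(Supplier<Map<String, Set<String>>> provider) {
    this.provider = provider;
  }

  // Fetches all privileges and swaps in an immutable copy in one volatile write.
  void refresh() {
    Map<String, Set<String>> copy = new HashMap<>();
    for (Map.Entry<String, Set<String>> e : provider.get().entrySet()) {
      copy.put(e.getKey(), Collections.unmodifiableSet(new HashSet<>(e.getValue())));
    }
    snapshot = Collections.unmodifiableMap(copy);
  }

  // Loads the cache once, then refreshes it at the configured interval.
  void start(long intervalMillis) {
    refresh();
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        refresh();
      } catch (RuntimeException e) {
        // Keep serving the previous snapshot; an uncaught exception
        // would cancel all future runs of the scheduled task.
      }
    }, intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
  }

  void stop() {
    scheduler.shutdownNow();
  }

  // Answers from the cached snapshot only; never calls the provider.
  boolean isAuthorized(String principal, String entity, String action) {
    Set<String> privileges = snapshot.get(principal);
    return privileges != null && privileges.contains(entity + ":" + action);
  }
}
```

Because `isAuthorized` reads only the snapshot, a change made in the provider becomes visible only after the next `refresh()` call. The refresh interval therefore bounds how long the cache can stay stale.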

Open question: External authorization providers may have their own caching mechanisms; for example, Sentry has PrivilegeCache. Is any integration with these possible?

...

As mentioned above, the policy cache in CDAP can be made consistent with the external authorization provider at regularly scheduled intervals. However, this introduces the following race: Suppose Alice and Bob have been given READ access to Dataset1, and this state is consistent in both the external system (e.g. Apache Sentry) and the cache. Now, the ACLs are updated to remove Alice's permissions. Until the refresh thread mentioned above runs again, the cache will be inconsistent with the external system, and CDAP will still think that both Alice and Bob have READ access to Dataset1. The severity of this may vary depending on the situation, but it is a security loophole nonetheless. There are two ways in which this situation may arise:

  1. User uses CDAP to update ACLs: In this scenario, the revoke APIs in CDAP can invoke a callback that also updates the cache. As long as the store and the cache are updated transactionally (open question), there would be no inconsistency between the external system and the CDAP cache.
  2. User uses an external interface (e.g. Hue, Apache Ranger UI) to update ACLs: In this scenario, we may have to depend upon the external system providing a callback mechanism. Even if such a mechanism is provided, the interface through which the cache is updated (e.g. a message queue) will have to be built in CDAP. The external system can then add events to such an interface, and the cache can keep itself up-to-date. In the first release, however, the cache is likely to be inconsistent until the next refresh if ACLs are updated this way.
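The two update paths above can be sketched as follows. All names here (`ExternalPolicyStore`, `WriteThroughPolicyCache`, `onExternalRevoke`) are hypothetical and are not part of CDAP, Sentry, or Ranger. The sketch also does not provide a real transaction: it only serializes updates under one lock and touches the cache after the external store has accepted the change.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the external authorization provider.
interface ExternalPolicyStore {
  void revoke(String principal, String privilege);
}

// Hypothetical cache that is updated on both revoke paths.
class WriteThroughPolicyCache {
  private final Map<String, Set<String>> cache = new HashMap<>();
  private final ExternalPolicyStore store;

  WriteThroughPolicyCache(ExternalPolicyStore store) {
    this.store = store;
  }

  // Seeds the cache, e.g. from a periodic refresh.
  synchronized void put(String principal, String privilege) {
    cache.computeIfAbsent(principal, k -> new HashSet<>()).add(privilege);
  }

  // Scenario 1: revoke issued through CDAP. The store is updated first;
  // if it throws, the cache is left untouched and still matches the store.
  synchronized void revoke(String principal, String privilege) {
    store.revoke(principal, privilege);
    removeFromCache(principal, privilege);
  }

  // Scenario 2: revoke made in an external UI and delivered to CDAP as an
  // event (e.g. read from a message queue). Only the cache needs updating.
  synchronized void onExternalRevoke(String principal, String privilege) {
    removeFromCache(principal, privilege);
  }

  synchronized boolean isAuthorized(String principal, String privilege) {
    Set<String> privileges = cache.get(principal);
    return privileges != null && privileges.contains(privilege);
  }

  private void removeFromCache(String principal, String privilege) {
    Set<String> privileges = cache.get(principal);
    if (privileges != null) {
      privileges.remove(privilege);
    }
  }
}
```

Updating the store before the cache means a failed revoke leaves both sides agreeing on the old state, so the caller can retry. A real implementation would still need to handle a crash between the two steps, which is where the periodic refresh acts as a backstop.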

Caching in Apache Sentry

Apache Sentry has active work in progress to enable client-side caching as part of SENTRY-1229. It will likely suffer from the same cache-freshness drawbacks described above, but CDAP should reuse such caching mechanisms where authorization providers make them available. Another drawback of this approach is that the work is still ongoing, and there is no timeline yet for when this change will make it into a CDH distribution.

Authorizing list operations

...