...

The above cache would be repopulated asynchronously from the configured authorization provider (Apache Sentry, Apache Ranger, etc.) at a configurable time interval, using an AbstractScheduledService. Rather than querying the authorization provider every time an authorization check is required, the various CDAP sub-components will query this cache.
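A minimal sketch of such a periodically refreshed cache follows. It is illustrative only: `PolicyCache`, the `Supplier`-based provider, and the `"ACTION:entity"` privilege encoding are hypothetical names and not CDAP APIs, and a plain `ScheduledExecutorService` from the JDK stands in for the AbstractScheduledService mentioned above so that the example has no external dependencies.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are CDAP APIs.
final class PolicyCache {
  // Assumed provider shape: principal -> set of "ACTION:entity" privileges.
  private final Supplier<Map<String, Set<String>>> provider;
  // JDK scheduler used here in place of an AbstractScheduledService.
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor(r -> {
        Thread t = new Thread(r, "policy-cache-refresh");
        t.setDaemon(true);
        return t;
      });
  // Readers always see a complete, immutable snapshot.
  private volatile Map<String, Set<String>> snapshot = Map.of();

  PolicyCache(Supplier<Map<String, Set<String>>> provider) {
    this.provider = provider;
  }

  // Fetches all privileges from the provider and swaps in a new snapshot.
  void refresh() {
    Map<String, Set<String>> copy = new HashMap<>();
    provider.get().forEach(
        (principal, privileges) -> copy.put(principal, Set.copyOf(privileges)));
    snapshot = Map.copyOf(copy);
  }

  // Schedules refresh() at a configurable interval.
  void start(long interval, TimeUnit unit) {
    scheduler.scheduleWithFixedDelay(() -> {
      try {
        refresh();
      } catch (RuntimeException e) {
        // Keep the previous snapshot; an uncaught exception would cancel the schedule.
      }
    }, 0, interval, unit);
  }

  void stop() {
    scheduler.shutdownNow();
  }

  // Authorization checks read only the cached snapshot, never the provider.
  boolean isAuthorized(String principal, String action, String entity) {
    return snapshot.getOrDefault(principal, Set.of()).contains(action + ":" + entity);
  }
}
```

Swapping a whole immutable snapshot on each refresh keeps reads lock-free, at the cost of re-fetching every privilege on every cycle.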

Open question: Authorization providers may have their own caching mechanisms; for example, Sentry has PrivilegeCache. Is any integration with these possible?

Cache Freshness

As mentioned above, the policy cache in CDAP can be made consistent with the authorization providers at regularly scheduled intervals. However, this leaves the following race: Suppose Alice and Bob have been given READ access to Dataset1, and this state is consistent in both the external system (e.g. Apache Sentry) and the cache. Now, ACLs are updated to remove Alice's permissions. Until the refresh thread mentioned above runs again, the cache will be inconsistent with the external system, and CDAP will still treat both Alice and Bob as having READ access to Dataset1. The severity of this may vary depending on the situation, but it is a security loophole nonetheless. There are two possible ways in which this situation may arise:

  1. User uses CDAP (CLI/REST APIs) to update ACLs: In this scenario, the revoke APIs in CDAP can include a callback that also updates the cache. As long as the store and the cache are updated transactionally (open question), there would be no inconsistency between the external system and the CDAP cache.
  2. User uses an external interface (e.g. Hue, Apache Ranger UI) to update ACLs: In this scenario, we may have to depend on the external system providing a callback mechanism. Even if such a mechanism is provided, the interface through which the cache is updated (e.g. a message queue) will have to be built in CDAP. The external system can then add events to this interface, and the cache can keep itself up to date by consuming from it. In the first release, however, the cache is likely to be temporarily inconsistent whenever ACLs are updated this way.
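The first scenario can be sketched as a write-through revoke: the backend is updated first, and the cache entry is removed only if that call succeeds. This is a sketch under stated assumptions, not a design decision: `PrivilegeBackend` and `WriteThroughRevoker` are hypothetical names, and a local lock is not a real transaction across the external store and the cache, so the open question above still stands.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the external authorization provider.
interface PrivilegeBackend {
  void revoke(String principal, String privilege);
}

// Hypothetical sketch of a revoke path that also updates the local cache.
final class WriteThroughRevoker {
  private final PrivilegeBackend backend;
  private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
  private final Object lock = new Object();

  WriteThroughRevoker(PrivilegeBackend backend) {
    this.backend = backend;
  }

  // Seeds the cache; in practice this would come from a refresh.
  void cachePrivilege(String principal, String privilege) {
    cache.computeIfAbsent(principal, p -> ConcurrentHashMap.newKeySet()).add(privilege);
  }

  void revoke(String principal, String privilege) {
    synchronized (lock) {
      // If the backend call throws, the cache is left untouched, so both
      // sides still agree that the privilege exists.
      backend.revoke(principal, privilege);
      cache.computeIfPresent(principal, (p, privileges) -> {
        privileges.remove(privilege);
        return privileges.isEmpty() ? null : privileges;
      });
    }
  }

  boolean isAuthorized(String principal, String privilege) {
    Set<String> privileges = cache.get(principal);
    return privileges != null && privileges.contains(privilege);
  }
}
```

Updating the backend before the cache means a failure can only leave the cache as permissive as the backend itself, never more permissive.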

Handling Cache Refresh Failures

Since the sub-components of CDAP will now use only the authorization policy cache to check ACLs, there would be a problem if the cache refresh keeps failing (for example, because the authorization backend is down). If such failures persist, the cache could remain stale for a long time. This could lead to serious security loopholes, so there should be a way to invalidate the cache when such persistent failures occur. This could be done with a configurable retry limit for failures. When this limit is reached, the cache would be cleared, and until the next successful refresh, any operation in CDAP will result in an authorization failure. Although this would render CDAP unusable, it reduces the chances of a security breach. In such a case, admins will have to fix the communication between CDAP and the authorization backend before CDAP can be used again.
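The retry limit described above can be sketched as follows. The class name `FailClosedPolicyCache`, the `maxConsecutiveFailures` parameter, and the `"ACTION:entity"` privilege encoding are assumptions made for illustration; the actual configuration key and data model are not specified here.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch; names and data shapes are illustrative, not CDAP APIs.
final class FailClosedPolicyCache {
  private final Supplier<Map<String, Set<String>>> provider;
  private final int maxConsecutiveFailures; // assumed to be configurable
  private int consecutiveFailures;
  private volatile Map<String, Set<String>> snapshot = Map.of();

  FailClosedPolicyCache(Supplier<Map<String, Set<String>>> provider,
                        int maxConsecutiveFailures) {
    this.provider = provider;
    this.maxConsecutiveFailures = maxConsecutiveFailures;
  }

  // Called by the periodic refresh task.
  synchronized void refresh() {
    try {
      // Shallow copy; the provider is assumed to return sets it no longer mutates.
      snapshot = Map.copyOf(provider.get());
      consecutiveFailures = 0; // any success resets the counter
    } catch (RuntimeException e) {
      consecutiveFailures++;
      if (consecutiveFailures >= maxConsecutiveFailures) {
        // Fail closed: an empty cache denies every request until the
        // next successful refresh.
        snapshot = Map.of();
      }
    }
  }

  boolean isAuthorized(String principal, String action, String entity) {
    return snapshot.getOrDefault(principal, Set.of()).contains(action + ":" + entity);
  }
}
```

Below the limit, checks are still answered from the stale snapshot, so the limit bounds how long a revoked privilege can survive a backend outage.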

Caching in Apache Sentry

Apache Sentry has active work under way to enable client-side caching as part of SENTRY-1229. It will likely suffer from the same cache freshness drawbacks mentioned above. There is a case for reusing this (and other such) caching from authorization providers in CDAP. However, we will choose to implement a cache in CDAP independently, for the following reasons:

...