...

  1. The major drawback of this approach is that it could slow down the most common access pattern, because it requires a call to the authorization provider every time a privilege (a combination of a principal, an entity and an action) is not found in the cache. Since most of these combinations are unlikely to be in the cache at any given point in time, this approach is likely to cause many cache misses. As a result, operations in the normal flow are likely to be slow because they have to call the authorization provider, whereas in the earlier approach the slowness occurs only while the cache is being updated.

Hybrid Approach

Since both of the approaches above have definite drawbacks, we could use a hybrid approach. In this approach, the cache would be keyed by principal. When there is a cache miss for a principal, the requested ACL for the principal will be fetched from the authorization provider and the cache will be updated. Along with this, a background thread will update the cache with all the ACLs for the requested principal, so that any further requests for this principal can be served from the cache. Each entry in the cache will have a configurable expiry, which bounds how stale an entry can become. This approach still does not guarantee the absence of security loopholes, since a privilege could be updated before the cache is refreshed, but it seems like a good middle ground. Guaranteeing security would need a more sophisticated mechanism in which the authorization provider publishes a message to a queue that the cache listens to, but that could be future work.
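The hybrid approach could be sketched roughly as follows. This is only an illustration under stated assumptions, not CDAP code: the AuthorizationProvider interface, the PrincipalAclCache class, the "entity:action" string encoding of a privilege, and the injected Executor and clock are all hypothetical names chosen for the example.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executor;
import java.util.function.LongSupplier;

/** Hypothetical stand-in for an authorization backend; not a CDAP interface. */
interface AuthorizationProvider {
  boolean isAuthorized(String principal, String entity, String action);

  /** Returns every privilege of the principal, each encoded as "entity:action". */
  Set<String> listPrivileges(String principal);
}

/** Sketch of a cache keyed by principal, with a per-entry expiry. */
final class PrincipalAclCache {

  private static final class Entry {
    final Set<String> granted = ConcurrentHashMap.newKeySet();
    final long expiresAtMillis;
    volatile boolean complete; // true once the background load has finished

    Entry(long expiresAtMillis) {
      this.expiresAtMillis = expiresAtMillis;
    }
  }

  private final Map<String, Entry> cache = new ConcurrentHashMap<>();
  private final AuthorizationProvider provider;
  private final Executor background;
  private final long ttlMillis;
  private final LongSupplier clock;

  PrincipalAclCache(AuthorizationProvider provider, Executor background,
                    long ttlMillis, LongSupplier clock) {
    this.provider = provider;
    this.background = background;
    this.ttlMillis = ttlMillis;
    this.clock = clock;
  }

  boolean isAuthorized(String principal, String entity, String action) {
    String key = entity + ":" + action;
    long now = clock.getAsLong();
    Entry entry = cache.get(principal);
    boolean miss = entry == null || now >= entry.expiresAtMillis;
    if (miss) {
      // Missing or expired: start a fresh entry for this principal.
      entry = new Entry(now + ttlMillis);
      cache.put(principal, entry);
    } else if (entry.granted.contains(key)) {
      return true;  // served from the cache
    } else if (entry.complete) {
      return false; // the full ACL is cached and does not contain this privilege
    }
    // Fetch only the requested privilege synchronously.
    boolean allowed = provider.isAuthorized(principal, entity, action);
    if (allowed) {
      entry.granted.add(key);
    }
    if (miss) {
      // Load the remaining ACLs for this principal off the request path.
      Entry target = entry;
      background.execute(() -> {
        target.granted.addAll(provider.listPrivileges(principal));
        target.complete = true;
      });
    }
    return allowed;
  }
}
```

In this sketch a revoked privilege can still be honored until the entry expires, which is the freshness trade-off described above. Two concurrent misses for the same principal may each create an entry; the last one written wins, which costs an extra provider call but does not affect correctness.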

Caching in Apache Sentry

Apache Sentry has active work in progress to enable client-side caching as part of SENTRY-1229. It will likely suffer from the same drawbacks regarding cache freshness mentioned above. There is a case for re-using this (and other such) caching from authorization providers in CDAP. However, we will choose to implement a cache in CDAP independently for the following reasons:

  1. We would like a cache in CDAP that works independently of authorization providers. For example, we would like the same caching mechanism to be available irrespective of the configured authorization backend (Apache Sentry, the Dataset-backed Authorization backend, or Apache Ranger in the future).
  2. This is active work in progress in Apache Sentry, and there are no timelines yet as to when this change will make it to a CDH distro (currently marked for Apache Sentry 1.8.0).

Turning caching off

For certain use cases where caching of security policies may not be acceptable, even at the cost of a significant performance hit, a configuration knob should be provided to turn caching off. By default, though, caching will be enabled.
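A minimal sketch of such a knob, assuming a boolean property that defaults to true. The property name example.authorization.cache.enabled and the AuthorizationCheckers class are hypothetical and chosen only for illustration; they are not CDAP configuration keys or classes. Expiry is omitted to keep the example short.

```java
import java.util.Map;
import java.util.Properties;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Predicate;

final class AuthorizationCheckers {
  // Hypothetical property name, used only for this example.
  static final String CACHE_ENABLED_KEY = "example.authorization.cache.enabled";

  private AuthorizationCheckers() { }

  /**
   * Wraps the backend check in a cache unless caching is turned off.
   * The argument of the predicate is a privilege encoded as a string.
   */
  static Predicate<String> create(Properties conf, Predicate<String> backend) {
    // Caching is enabled unless the property is explicitly set to false.
    boolean enabled = Boolean.parseBoolean(conf.getProperty(CACHE_ENABLED_KEY, "true"));
    if (!enabled) {
      return backend; // every check goes straight to the authorization provider
    }
    Map<String, Boolean> cache = new ConcurrentHashMap<>();
    return privilege -> cache.computeIfAbsent(privilege, backend::test);
  }
}
```

With caching off, callers pay the cost of a provider call on every check but never act on a stale policy.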

Authorizing list operations

...