Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

  1. Minimal refresh timeĀ 
    1. The refresh operation should be fast. The time taken for the operation should certainly be less than the refresh interval.
    2. It should make minimal RPC calls. If there is a way to load the entire snapshot of ACLs in a single RPC call, that should be preferred.
    3. It should transfer only necessary data.
  2. Configurable refresh interval
    1. The refresh operation should happen at configurable time intervals so users can tune it per their requirement.

Approach 1:

To satisfy these goals, the data structure that should be cached can be defined as follows:

...

Since the sub-components of CDAP will now just use the authorization policy cache to check for ACLs, there would be a problem if the cache refresh continually keeps failing (let's say perhaps because the authorization backend is down). If such failures are continual and consistent over a period of time, it could result in the cache being stale over a long time. This could lead to serious security loopholes, and hence there should be a way to invalidate the cache when such consistent failures occur. This could be done by having a configurable retry limit for failures. When this limit is reached, the cache would be cleared, and until the next successful refresh, any operation in CDAP will result in an authorization failure. Although this would render CDAP in an unusable state, it will reduce the chances of such a security breach. In such a case, admins will have to fix the communication between CDAP and the authorization backend before CDAP can be used again.

Alternative Caching Approach (Approach 2)

An alternative caching approach would be for the CDAP sub-components to query the cache for a privilege, and the cache to return if there is a hit, and go back to the authorization provider if there is a miss.

...

  1. The major drawback of this approach seems like it could make the majority access pattern potentially slow, because it requires a call to the authorization provider every time an privilege (a combination of a principal, an entity and an action) is not found in the cache. Since a majority of these combinations are unlikely to be in the cache at a given point in time, this approach is likely to cause a lot of cache misses. It is likely that in the normal flow, an operation is slow because it has to make a call to the authorization provider, whereas in the earlier approach, the slowness only happens when the cache is being updated.

Hybrid Approach (Approach 3)

Since both the approaches above have definite drawbacks, we could use a hybrid approach. In this approach, the cache would be keyed by a principal. When there is a cache miss for a principal, the requested ACL for the principal will be fetched from the authorization provider and the cache would be updated. Along with this, a background thread will update the cache with all the ACLs for the requested principal, so any further requests for this principal can be fulfilled by the cache. Each entry in the cache will have a configurable expiry, thereby ensuring freshness, without needing a long refresh time. This approach still does not avoid security loopholes, since a privilege could be updated before the cache is refreshed, but it seems like a good median. Guaranteeing security would need a more sophisticated mechanism of the authorization provider publishing a message whenever an ACL is updated in a queue that the cache listens to, but that could be future work.

...