Adding a cache to a security extension

 

Introduction 

Starting with CDAP release 4.2, CDAP master will no longer cache authorization privileges. Caching is now the responsibility of the extension.

Goals

Cache authorization privileges to reduce the latency of authorization calls. 

Design

Since the extension runs as part of the master process, that is where the extension's cache will reside. As a result, the amount of memory consumed by the cache is an important consideration.

There are a couple of reasons for keeping the cache in the master:

  1. Some authorization providers need to whitelist the set of users from whom the service will accept requests. These providers encourage the use of service users rather than end users, so only the user running the CDAP master process needs to be in the whitelist.
  2. When an application is started, multiple containers could try to fetch the privileges for the same principal. Fetching the data from the authorization provider for each of these calls would be costly and greatly increase the program startup time.

 

An example of a security extension can be found in the Sentry security extension. The cache key is composed of the Principal, the Entity, and the Action, and the value is a boolean stating whether that Principal has the privilege to perform the Action on the Entity. For example:

 

AuthorizationPrivilege{principal=Principal{name='alice', type=USER, kerberosPrincipal=null}, entityId=namespace:alicens, action=READ} = true
AuthorizationPrivilege{principal=Principal{name='alice', type=USER, kerberosPrincipal=null}, entityId=namespace:alicens, action=ADMIN} = false

Here we are caching both the allowed and the disallowed privileges.
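The point about caching denials can be illustrated with a minimal sketch. It is not the extension's actual key class: `PrivilegeKey` and `PrivilegeKeyDemo` are hypothetical stand-ins that reduce the principal, entity, and action to plain strings, and a `HashMap` stands in for the real cache. What it shows is that a denied privilege is stored as a hit with the value false, which is different from a key that has never been looked up.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical stand-in for the cache key; the three components are simplified to strings.
final class PrivilegeKey {
  final String principal;
  final String entity;
  final String action;

  PrivilegeKey(String principal, String entity, String action) {
    this.principal = principal;
    this.entity = entity;
    this.action = action;
  }

  // Keys are equal only when all three components match, so each
  // (principal, entity, action) triple gets its own cache entry.
  @Override
  public boolean equals(Object o) {
    if (this == o) {
      return true;
    }
    if (!(o instanceof PrivilegeKey)) {
      return false;
    }
    PrivilegeKey other = (PrivilegeKey) o;
    return principal.equals(other.principal)
      && entity.equals(other.entity)
      && action.equals(other.action);
  }

  @Override
  public int hashCode() {
    return Objects.hash(principal, entity, action);
  }
}

final class PrivilegeKeyDemo {
  // Builds a map holding the two example entries shown above.
  static Map<PrivilegeKey, Boolean> exampleEntries() {
    Map<PrivilegeKey, Boolean> cache = new HashMap<>();
    cache.put(new PrivilegeKey("alice", "namespace:alicens", "READ"), true);
    cache.put(new PrivilegeKey("alice", "namespace:alicens", "ADMIN"), false);
    return cache;
  }

  public static void main(String[] args) {
    Map<PrivilegeKey, Boolean> cache = exampleEntries();
    // Cached denial: prints "false" without needing to ask the provider again.
    System.out.println(cache.get(new PrivilegeKey("alice", "namespace:alicens", "ADMIN")));
    // Never looked up: prints "null", which is what a cache miss looks like here.
    System.out.println(cache.get(new PrivilegeKey("alice", "namespace:alicens", "WRITE")));
  }
}
```

Without the denied entries, every repeated check for a privilege the principal lacks would go back to the authorization provider.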

 

A Guava loading cache can be used for this purpose. The loading cache needs a method that is called on a cache miss. This method contacts the authorization provider and returns the result, which is then loaded into the cache.

The cache can be initialized as follows:

Initialize the cache
LoadingCache<AuthorizationPrivilege, Boolean> authPolicyCache = CacheBuilder.newBuilder()
  .expireAfterWrite(AUTHORIZATION_CACHE_TIMEOUT, TimeUnit.SECONDS)
  .maximumSize(AUTHORIZATION_CACHE_MAX_ENTRIES)
  .build(new CacheLoader<AuthorizationPrivilege, Boolean>() {
    @Override
    public Boolean load(AuthorizationPrivilege authorizationPrivilege) throws Exception {
      return enforce(authorizationPrivilege);
    }
  });

 

In this example, we are creating a loading cache that maps an AuthorizationPrivilege, which is composed of Principal, Entity, and Action, to a boolean specifying whether the principal has that privilege.

We set the 'expireAfterWrite' property to a configurable timeout. We prefer expireAfterWrite over expireAfterAccess because, with expireAfterAccess, an entry that keeps getting hit never expires, so any changes made to it on the provider are never reflected. Using expireAfterWrite forces a reload once the timeout has elapsed and hence picks up any changes made. The timeout is a trade-off: it needs to be large enough that we don't hit the provider too frequently, but it is also the maximum delay between making a change in the backend and that change taking effect.
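The effect of write-based expiry can be sketched in plain Java with a controllable clock. This is a simplified illustration of the behavior, not Guava's implementation: `WriteExpiringCache` and `WriteExpiryDemo` are hypothetical names, the cache is not thread-safe, and string keys stand in for AuthorizationPrivilege.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Minimal write-expiring cache for illustration only (not thread-safe).
final class WriteExpiringCache<K, V> {
  private static final class Entry<T> {
    final T value;
    final long writtenAtMillis;

    Entry(T value, long writtenAtMillis) {
      this.value = value;
      this.writtenAtMillis = writtenAtMillis;
    }
  }

  private final Map<K, Entry<V>> entries = new HashMap<>();
  private final long ttlMillis;
  private final LongSupplier clock;
  private final Function<K, V> loader;

  WriteExpiringCache(long ttlMillis, LongSupplier clock, Function<K, V> loader) {
    this.ttlMillis = ttlMillis;
    this.clock = clock;
    this.loader = loader;
  }

  V get(K key) {
    long now = clock.getAsLong();
    Entry<V> entry = entries.get(key);
    // Reads never update writtenAtMillis, so even a frequently read entry expires.
    if (entry == null || now - entry.writtenAtMillis >= ttlMillis) {
      entry = new Entry<>(loader.apply(key), now);
      entries.put(key, entry);
    }
    return entry.value;
  }
}

final class WriteExpiryDemo {
  // Simulates a privilege being revoked on the provider while it is cached.
  static boolean[] run() {
    long[] now = {0};
    boolean[] providerAnswer = {true};
    WriteExpiringCache<String, Boolean> cache =
      new WriteExpiringCache<>(1000, () -> now[0], key -> providerAnswer[0]);

    boolean first = cache.get("alice:READ");   // miss at t=0, loads true
    providerAnswer[0] = false;                 // privilege revoked on the provider
    now[0] = 500;
    boolean second = cache.get("alice:READ");  // within the timeout, stale true
    now[0] = 1000;
    boolean third = cache.get("alice:READ");   // timeout elapsed, reloads false
    return new boolean[] {first, second, third};
  }

  public static void main(String[] args) {
    boolean[] results = run();
    // Prints "true true false"
    System.out.println(results[0] + " " + results[1] + " " + results[2]);
  }
}
```

The read at t=500 does not extend the entry's lifetime, which is the property that distinguishes write-based from access-based expiry.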

Next, we set the maximum size for the cache. We need to set this value because an unbounded cache could grow very large and cause the master to run out of memory. The loading cache lets us set the maximum number of entries that the cache will hold. The size of each entry depends on the principal and entity names, so the maximum number of entries should be calculated accordingly.
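One rough way to arrive at a maximum entry count is to divide a memory budget by an estimated per-entry footprint. The sketch below shows only that arithmetic. The `CacheSizing` class is hypothetical, and the 64 MiB budget and 512 bytes per entry are assumed figures for illustration, not measurements from CDAP; the real footprint should be estimated from typical principal and entity name lengths.

```java
final class CacheSizing {
  // Upper bound on the number of entries that fit in the given budget,
  // based on an estimated average footprint per cache entry.
  static long maxEntries(long budgetBytes, long estimatedBytesPerEntry) {
    if (budgetBytes < 0 || estimatedBytesPerEntry <= 0) {
      throw new IllegalArgumentException("budget must be >= 0 and entry size must be > 0");
    }
    return budgetBytes / estimatedBytesPerEntry;
  }

  public static void main(String[] args) {
    long budgetBytes = 64L * 1024 * 1024;  // assumed budget: 64 MiB
    long bytesPerEntry = 512;              // assumed average entry footprint
    // Prints 131072
    System.out.println(maxEntries(budgetBytes, bytesPerEntry));
  }
}
```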

Then we come to the load method. This is the method that is called when there is a cache miss. It takes a single argument of the cache's key type and returns the value to be cached. In our example caching scheme, this method takes an AuthorizationPrivilege and returns a Boolean.
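The load-on-miss behavior can be sketched without Guava by using `ConcurrentHashMap.computeIfAbsent`. This is an analogy, not the extension's code: `LoadOnMissCache` is a hypothetical class, string keys stand in for AuthorizationPrivilege, the provider is a plain function, and there is no expiry or size limit. The counter shows that repeated checks for the same privilege reach the provider only once.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

final class LoadOnMissCache {
  private final Map<String, Boolean> cache = new ConcurrentHashMap<>();
  private final Function<String, Boolean> provider;
  // Counts how many times the (simulated) authorization provider was contacted.
  final AtomicInteger providerCalls = new AtomicInteger();

  LoadOnMissCache(Function<String, Boolean> provider) {
    this.provider = provider;
  }

  boolean isAllowed(String privilegeKey) {
    // The lambda plays the role of the load method: it runs only on a miss,
    // and its result is stored so later lookups are served from the cache.
    return cache.computeIfAbsent(privilegeKey, key -> {
      providerCalls.incrementAndGet();
      return provider.apply(key);
    });
  }

  public static void main(String[] args) {
    // Simulated provider: grants READ and denies everything else.
    LoadOnMissCache cache = new LoadOnMissCache(key -> key.endsWith("READ"));
    cache.isAllowed("alice|namespace:alicens|READ");
    cache.isAllowed("alice|namespace:alicens|READ");
    cache.isAllowed("alice|namespace:alicens|READ");
    // Prints 1
    System.out.println(cache.providerCalls.get());
  }
}
```

With the Guava cache from the example above, the equivalent lookup is `authPolicyCache.get(privilege)`, which declares the checked `ExecutionException` for failures in the load method, or `getUnchecked(privilege)` when the loader throws no checked exceptions.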