...
As mentioned above, the policy cache in CDAP can be made consistent with authorization providers at regularly scheduled intervals. However, this approach has the following race: Suppose Alice and Bob have been given READ access to Dataset1, and this state is consistent in both the external system (e.g. Apache Sentry) and the cache. Now, the ACLs are updated to remove Alice's permissions. Until the refresh thread mentioned above runs again, the cache will be inconsistent with the external system, and CDAP will still think that both Alice and Bob have READ access to Dataset1. The severity of this may vary depending on the situation, but it is a security loophole nonetheless. There are two possible ways in which this situation may arise:
- User uses CDAP (CLI/REST APIs) to update ACLs: In this scenario, we can add a callback to the revoke APIs in CDAP that also updates the cache. As long as the store and the cache are updated transactionally, there would not be an inconsistency between the external system and the CDAP cache.
- User uses an external interface (e.g. Hue, Apache Ranger UI) to update ACLs: In this scenario, we may have to depend on the external system providing a callback mechanism. Even if such a mechanism is provided, the interface through which the cache is updated (e.g. a message queue) will have to be built in CDAP. The external system can then add events to such an interface, and the cache can keep itself up-to-date. In the first release, however, an inconsistency is likely if ACLs are updated through an external interface.
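The two update paths above can be illustrated with a minimal sketch. It is not CDAP's actual implementation: the names ExternalAuthorizer, InMemoryAuthorizer, PolicyCache, and onExternalRevoke are hypothetical, the external system is replaced by an in-memory fake, and a simple lock stands in for a real transaction between the store and the cache.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the external authorization system (e.g. Apache Sentry).
interface ExternalAuthorizer {
    void revoke(String principal, String entity, String action);
    Set<String> listGrants(); // snapshot of all grants, encoded as "principal|entity|action"
}

// In-memory fake used only for illustration; a real provider would call the external system.
final class InMemoryAuthorizer implements ExternalAuthorizer {
    private final Set<String> grants = ConcurrentHashMap.newKeySet();

    void grant(String principal, String entity, String action) {
        grants.add(PolicyCache.key(principal, entity, action));
    }

    @Override
    public void revoke(String principal, String entity, String action) {
        grants.remove(PolicyCache.key(principal, entity, action));
    }

    @Override
    public Set<String> listGrants() {
        return new HashSet<>(grants);
    }
}

// Hypothetical policy cache: refreshed periodically, and updated eagerly on revoke.
final class PolicyCache {
    private final ExternalAuthorizer authorizer;
    private final Set<String> cached = ConcurrentHashMap.newKeySet();

    PolicyCache(ExternalAuthorizer authorizer) {
        this.authorizer = authorizer;
    }

    static String key(String principal, String entity, String action) {
        return principal + "|" + entity + "|" + action;
    }

    // What the scheduled refresh thread would call: make the cache match the external system.
    synchronized void refresh() {
        Set<String> latest = authorizer.listGrants();
        cached.retainAll(latest); // drop grants that were revoked externally
        cached.addAll(latest);    // pick up grants that were added externally
    }

    // Path 1: revoke issued through CDAP. The external store is updated first; the cache
    // entry is removed under the same lock. If the external call throws, the cache is
    // left untouched and still matches the store.
    synchronized void revoke(String principal, String entity, String action) {
        authorizer.revoke(principal, entity, action);
        cached.remove(key(principal, entity, action));
    }

    // Path 2: hypothetical hook that a consumer of external revoke events
    // (e.g. reading from a message queue) would call to update the cache only.
    synchronized void onExternalRevoke(String principal, String entity, String action) {
        cached.remove(key(principal, entity, action));
    }

    boolean isAuthorized(String principal, String entity, String action) {
        return cached.contains(key(principal, entity, action));
    }
}
```

In this sketch, calling revoke on the InMemoryAuthorizer directly models an ACL change made through an external UI: isAuthorized keeps returning true for Alice until refresh or onExternalRevoke runs, which is the window described above. Calling PolicyCache.revoke models the CDAP path, where the cache reflects the change immediately.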
...