...

  •  User stories documented (Bhooshan)
  •  User stories reviewed (Nitin)
  •  Design documented (Bhooshan)
  •  Design reviewed (Andreas/Terence)
  •  Feature merged (Bhooshan)
  •  Documentation (Bhooshan)
  •  Blog post 

User Stories

  1. As a CDAP security admin, I want all operations on datasets/streams to be governed by my configured authorization system.
  2. As a CDAP security admin, I want list operations for all CDAP entities to only return entities that the logged-in user is authorized to view.
  3. As a CDAP security admin, I want view operations for a CDAP entity to only succeed if the logged-in user is authorized to view that entity.

...

The above cache would be re-populated asynchronously from the configured Authorization Provider (e.g. Apache Sentry or Apache Ranger) at a configurable time interval, using an AbstractScheduledService. CDAP sub-components would then query this cache instead of querying the external system every time an authorization check is required.
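The periodic refresh described above could be sketched roughly as follows. This is a minimal illustration, not the CDAP implementation: it uses the JDK's ScheduledExecutorService as a stand-in for Guava's AbstractScheduledService so that it runs without extra dependencies, and PolicyCache, the Supplier-based provider, and the "entity:ACTION" privilege strings are all hypothetical names chosen for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical policy cache; names are illustrative, not CDAP APIs. */
final class PolicyCache implements AutoCloseable {
  // Immutable snapshot of principal -> privileges such as "Dataset1:READ".
  private volatile Map<String, Set<String>> snapshot = Map.of();
  // Assumed hook that fetches all privileges from the external system.
  private final Supplier<Map<String, Set<String>>> provider;
  private final ScheduledExecutorService scheduler =
      Executors.newSingleThreadScheduledExecutor();

  PolicyCache(Supplier<Map<String, Set<String>>> provider) {
    this.provider = provider;
  }

  /** Loads the cache once, then re-populates it at the given interval. */
  void start(long interval, TimeUnit unit) {
    refresh();
    scheduler.scheduleWithFixedDelay(this::refreshQuietly, interval, interval, unit);
  }

  /** Pulls all privileges from the provider and swaps in a fresh snapshot. */
  void refresh() {
    Map<String, Set<String>> copy = new HashMap<>();
    provider.get().forEach((principal, privs) -> copy.put(principal, Set.copyOf(privs)));
    snapshot = Map.copyOf(copy);
  }

  // An uncaught exception would cancel all future runs, so keep the old snapshot.
  private void refreshQuietly() {
    try {
      refresh();
    } catch (RuntimeException e) {
      System.err.println("Policy refresh failed, keeping previous snapshot: " + e);
    }
  }

  /** Answers from the cached snapshot only; never calls the external system. */
  boolean isAuthorized(String principal, String entity, String action) {
    return snapshot.getOrDefault(principal, Set.of()).contains(entity + ":" + action);
  }

  @Override
  public void close() {
    scheduler.shutdownNow();
  }
}
```

Swapping in a whole immutable snapshot keeps reads lock-free and means a failed refresh leaves the previous state intact. The cost is that any change made in the external system stays invisible until the next refresh completes.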

TODO(question): External systems may have their own caching mechanisms; e.g. Sentry has PrivilegeCache. We should make this cache pluggable, so that the APIs exposed by these external systems can be re-used. Is any integration with these possible?

Cache Freshness

As mentioned above, the policy cache in CDAP can be made consistent with external systems at regular scheduled intervals. However, this introduces the following race condition: Suppose Alice and Bob have been given READ access to Dataset1, and this state is consistent in both the external system (e.g. Apache Sentry) and the cache. Now, ACLs are updated to remove Alice's permissions. Until the refresh thread mentioned above runs again, the cache will be inconsistent with the external system, and CDAP will still think that both Alice and Bob have READ access to Dataset1. The severity of this may vary depending on the situation, but it is a security loophole nonetheless. There are two possible ways in which this situation may arise:

  1. User uses CDAP to update ACLs: In this scenario, the revoke APIs in CDAP can invoke a callback that also updates the cache. As long as the store and the cache are updated transactionally (question), there would not be an inconsistency between the external system and the CDAP cache.
  2. User uses an external interface (e.g. Hue, Apache Ranger UI) to update ACLs: In this scenario, we may have to depend upon the external system providing a callback mechanism. Even if such a mechanism is provided, the interface through which the cache is updated (e.g. a message queue) will have to be built in CDAP. The external system can then add events to such an interface, and the cache can keep itself up-to-date. In the first release, however, an inconsistency is likely whenever ACLs are updated this way.
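The two update paths above could look roughly like the sketch below. It is only an illustration under stated assumptions: PrivilegeStore, InMemoryStore, WriteThroughAuthorizer, and onExternalRevoke are hypothetical names, and a single JVM lock stands in for the transactional update, which the design still lists as an open question.

```java
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical stand-in for the external authorization system. */
interface PrivilegeStore {
  void grant(String principal, String privilege);
  void revoke(String principal, String privilege);
}

/** In-memory store, present only to make the sketch runnable. */
final class InMemoryStore implements PrivilegeStore {
  private final Map<String, Set<String>> data = new ConcurrentHashMap<>();

  @Override
  public void grant(String principal, String privilege) {
    data.computeIfAbsent(principal, k -> ConcurrentHashMap.newKeySet()).add(privilege);
  }

  @Override
  public void revoke(String principal, String privilege) {
    Set<String> privs = data.get(principal);
    if (privs != null) {
      privs.remove(privilege);
    }
  }
}

/** Updates the store and the cache together when ACLs change through CDAP. */
final class WriteThroughAuthorizer {
  private final PrivilegeStore store;
  private final Map<String, Set<String>> cache = new ConcurrentHashMap<>();
  // One lock approximates "transactional"; it is not a distributed transaction.
  private final Object lock = new Object();

  WriteThroughAuthorizer(PrivilegeStore store) {
    this.store = store;
  }

  void grant(String principal, String privilege) {
    synchronized (lock) {
      store.grant(principal, privilege);
      cache.computeIfAbsent(principal, k -> ConcurrentHashMap.newKeySet()).add(privilege);
    }
  }

  /** Case 1: revoke issued through CDAP. If the store call throws, the cache is untouched. */
  void revoke(String principal, String privilege) {
    synchronized (lock) {
      store.revoke(principal, privilege);
      evict(principal, privilege);
    }
  }

  /** Case 2: assumed event hook, e.g. fed from a message queue by the external system. */
  void onExternalRevoke(String principal, String privilege) {
    synchronized (lock) {
      evict(principal, privilege);
    }
  }

  boolean isAuthorized(String principal, String privilege) {
    Set<String> privs = cache.get(principal);
    return privs != null && privs.contains(privilege);
  }

  private void evict(String principal, String privilege) {
    Set<String> privs = cache.get(principal);
    if (privs != null) {
      privs.remove(privilege);
    }
  }
}
```

With this shape, a revoke made through CDAP takes effect in the cache immediately. A revoke made directly against the external system remains invisible until either an event reaches onExternalRevoke or the scheduled refresh runs.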

Authorizing list operations

...