Goals

  1. Performance improvements (caching authorization policies)
  2. Authorization of dataset and stream access
  3. Authorization for listing and viewing entities

Checklist

  • User stories documented (Bhooshan)
  • User stories reviewed (Nitin)
  • Design documented (Bhooshan)
  • Design reviewed (Andreas/Terence)
  • Feature merged (Bhooshan)
  • Blog post 

User Stories

  1. As a CDAP security admin, I want all operations on datasets/streams to be governed by my configured authorization system.
  2. As a CDAP security admin, I want list operations for all CDAP entities to only return entities that the logged-in user is authorized to view.
  3. As a CDAP security admin, I want view operations for a CDAP entity to only succeed if the logged-in user is authorized to view that entity.

Scenarios

Scenario #1

Derek is an IT Operations Extraordinaire at a corporation that uses CDAP to manage datasets with varying degrees of sensitivity. He would like to implement authorization policies for all data stored in CDAP, across datasets and streams, so that only authorized users can access it. He would like to control both read and write access.

Scenario #2

Derek would like to be able to use external authorization systems like Apache Sentry to manage authorization policies. Given that Apache Sentry could be installed in a different environment from CDAP, he would like to minimize the performance impact of authorization checks during data access. Derek expects that any performance improvement does not compromise security. For example, if authorization policies are cached in CDAP, Derek expects them to be refreshed regularly at a configurable time interval.

Scenario #3

In this organization, CDAP is used to store data belonging to various business units. These business units may be completely disparate and do not share information. Some of their data or applications may be extremely sensitive. As a security measure, Derek would also like to enforce authorization for operations that list CDAP entities, so that a user can see only the entities that they are authorized to read or write.
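
The filtering described in this scenario could look like the following sketch. It is only an illustration under stated assumptions: `ListFilter` is a hypothetical helper, entities and actions are plain strings rather than CDAP types, and an entity is treated as visible when the user holds READ or WRITE on it, which is one possible reading of "authorized to read or write".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical helper that hides entities on which the user holds neither READ nor WRITE. */
final class ListFilter {
  private ListFilter() { }

  /**
   * @param entities   all entities returned by the unfiltered list operation
   * @param privileges entity -> actions granted to the logged-in user
   * @return the entities the user may see, in their original order
   */
  static List<String> visibleTo(List<String> entities, Map<String, Set<String>> privileges) {
    List<String> visible = new ArrayList<>();
    for (String entity : entities) {
      Set<String> actions = privileges.getOrDefault(entity, Collections.emptySet());
      // Assumption: either READ or WRITE is enough to make the entity visible.
      if (actions.contains("READ") || actions.contains("WRITE")) {
        visible.add(entity);
      }
    }
    return visible;
  }
}
```

Under these assumptions, an entity on which the user holds only ADMIN, or no privilege at all, is left out of the listing.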

Design

Authorizing Dataset and Stream operations

The most critical requirement to address in 3.5 is to authorize dataset and stream operations. These operations fall into two categories: data access (read/write) and admin (create, update, delete). Admin operations can be presumed to occur less often than data access operations, and they are not in the data path. Performance therefore matters for admin operations, but it is less critical than for data access operations. For data access operations, it is not practical to communicate with an external authorization system like Apache Sentry on every single operation, since that would lead to major performance degradation. Authorization policies therefore need to be cached in CDAP, potentially for all operations, but especially for data access operations.

A major concern with caching is freshness, or invalidation. It is especially important in a security/authorization context, because a stale cache can lead to security breaches. For example, suppose all authorization policies are cached. An update in the external authorization system, especially a revocation of privileges, should be reflected in the cache as quickly as possible; until the next refresh takes place, a user whose privileges were revoked could still be authorized from the stale cache.

For such an authorization policy cache, the major design goals are:

  1. Minimal refresh time
    1. The refresh operation should be fast.
    2. It should make as few RPC calls as possible.
    3. It should transfer only the necessary data.
  2. Configurable refresh interval
    1. The refresh operation should happen at a configurable time interval, so users can tune it to their requirements.

To satisfy these goals, the data structure that should be cached can be defined as follows:

PrivilegeCache
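// TODO: Explore using Guava Cache
import com.google.common.collect.HashBasedTable;
import com.google.common.collect.Table;

import java.util.HashSet;
import java.util.NoSuchElementException;
import java.util.Set;

// Principal, EntityId and Action are CDAP types.
class PrivilegeCache {
  // Row: principal, column: entity, value: actions granted to the principal on the entity.
  // HashBasedTable is not thread-safe, so all access is synchronized.
  private final Table<Principal, EntityId, Set<Action>> privileges = HashBasedTable.create();

  public synchronized void addPrivileges(Principal principal, EntityId entityId, Set<Action> actionsToAdd) {
    Set<Action> actions = privileges.get(principal, entityId);
    if (actions == null) {
      // First grant for this principal/entity pair: create the cell.
      actions = new HashSet<>();
      privileges.put(principal, entityId, actions);
    }
    actions.addAll(actionsToAdd);
  }

  public synchronized void revokePrivileges(Principal principal, EntityId entityId, Set<Action> actionsToRemove) {
    Set<Action> actions = privileges.get(principal, entityId);
    if (actions == null) {
      throw new NoSuchElementException("No cached privileges for " + principal + " on " + entityId);
    }
    actions.removeAll(actionsToRemove);
    if (actions.isEmpty()) {
      // Drop the cell so that pairs with no remaining privileges do not accumulate.
      privileges.remove(principal, entityId);
    }
  }
}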

The above cache would be re-populated asynchronously from the configured Authorization Provider (Apache Sentry, Apache Ranger, etc.) at a configurable time interval, using an AbstractScheduledService. Rather than querying these external systems every time an authorization check is required, the various CDAP sub-components will query this cache.
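
A minimal sketch of such a periodic refresh, under several assumptions. It uses the JDK's ScheduledExecutorService in place of Guava's AbstractScheduledService so that it runs without dependencies. Principals, entities and actions are plain strings rather than CDAP's Principal, EntityId and Action. `PrivilegeFetcher` and `RefreshingPrivilegeCache` are hypothetical names, with the fetcher standing in for the call to the authorization provider. Each refresh builds a complete snapshot and swaps it in atomically, so readers never see a half-populated cache and a revocation takes effect at the next refresh.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical stand-in for a call to the external authorization provider. */
interface PrivilegeFetcher {
  // Key: "principal|entity"; value: actions granted to that principal on that entity.
  Map<String, Set<String>> fetchAll() throws Exception;
}

class RefreshingPrivilegeCache implements AutoCloseable {
  private final AtomicReference<Map<String, Set<String>>> snapshot =
      new AtomicReference<>(Collections.<String, Set<String>>emptyMap());
  private final ScheduledExecutorService executor =
      Executors.newSingleThreadScheduledExecutor();
  private final PrivilegeFetcher fetcher;

  RefreshingPrivilegeCache(PrivilegeFetcher fetcher) {
    this.fetcher = fetcher;
  }

  /** Starts periodic refreshes; the interval would come from configuration. */
  void start(long interval, TimeUnit unit) {
    executor.scheduleWithFixedDelay(this::refresh, 0, interval, unit);
  }

  /** Fetches a full snapshot and swaps it in atomically. */
  void refresh() {
    try {
      Map<String, Set<String>> fresh = new HashMap<>(fetcher.fetchAll());
      snapshot.set(Collections.unmodifiableMap(fresh));
    } catch (Exception e) {
      // Assumption: on failure the previous snapshot is kept. A stricter policy
      // could clear the cache so that nothing is authorized from stale data.
    }
  }

  /** Answers an authorization check from the cached snapshot only; no RPC is made. */
  boolean isAuthorized(String principal, String entity, String action) {
    Set<String> actions = snapshot.get().get(principal + "|" + entity);
    return actions != null && actions.contains(action);
  }

  @Override
  public void close() {
    executor.shutdownNow();
  }
}
```

In this sketch the refresh interval bounds how long a revoked privilege can remain usable, which is the trade-off between freshness and load on the external system discussed above.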

Dependencies

Ability to distinguish between read and write operations in datasets

Entities, Operations and Privileges

| Entity      | Operation                               | Required Privileges | Resultant Privileges |
|-------------|-----------------------------------------|---------------------|----------------------|
| Namespace   | create                                  | ADMIN (Instance)    | ADMIN (Namespace)    |
|             | update                                  | ADMIN (Namespace)   |                      |
|             | list                                    | READ (Instance)     |                      |
|             | get                                     | READ (Namespace)    |                      |
|             | delete                                  | ADMIN (Namespace)   |                      |
|             | set preference                          | WRITE (Namespace)   |                      |
|             | get preference                          | READ (Namespace)    |                      |
|             | search                                  | READ (Namespace)    |                      |
| Artifact    | add                                     | WRITE (Namespace)   | ADMIN (Artifact)     |
|             | delete                                  | ADMIN (Artifact)    |                      |
|             | get                                     | READ (Artifact)     |                      |
|             | list                                    | READ (Namespace)    |                      |
|             | write property                          | ADMIN (Artifact)    |                      |
|             | delete property                         | ADMIN (Artifact)    |                      |
|             | get property                            | READ (Artifact)     |                      |
|             | refresh                                 | WRITE (Instance)    |                      |
|             | write metadata                          | ADMIN (Artifact)    |                      |
|             | read metadata                           | READ (Artifact)     |                      |
| Application | deploy                                  | WRITE (Namespace)   | ADMIN (Application)  |
|             | get                                     | READ (Application)  |                      |
|             | list                                    | READ (Namespace)    |                      |
|             | update                                  | ADMIN (Application) |                      |
|             | delete                                  | ADMIN (Application) |                      |
|             | set preference                          | WRITE (Application) |                      |
|             | get preference                          | READ (Application)  |                      |
|             | add metadata                            | ADMIN (Application) |                      |
|             | get metadata                            | READ (Application)  |                      |
| Programs    | start/stop/debug                        | EXECUTE (Program)   |                      |
|             | set instances                           | ADMIN (Program)     |                      |
|             | list                                    | READ (Namespace)    |                      |
|             | set runtime args                        | EXECUTE (Program)   |                      |
|             | get runtime args                        | READ (Program)      |                      |
|             | get instances                           | READ (Program)      |                      |
|             | set preference                          | ADMIN (Program)     |                      |
|             | get preference                          | READ (Program)      |                      |
|             | get status                              | READ (Program)      |                      |
|             | get history                             | READ (Program)      |                      |
|             | add metadata                            | ADMIN (Program)     |                      |
|             | get metadata                            | READ (Program)      |                      |
|             | emit logs                               | WRITE (?) (Program) |                      |
|             | view logs                               | READ (Program)      |                      |
|             | emit metrics                            | WRITE (?) (Program) |                      |
|             | view metrics                            | READ (Program)      |                      |
| Streams     | create                                  | WRITE (Namespace)   | ADMIN (Stream)       |
|             | update properties                       | ADMIN (Stream)      |                      |
|             | delete                                  | ADMIN (Stream)      |                      |
|             | truncate                                | ADMIN (Stream)      |                      |
|             | enqueue, asyncEnqueue, batch            | WRITE (Stream)      |                      |
|             | get                                     | READ (Stream)       |                      |
|             | list                                    | READ (Namespace)    |                      |
|             | read events                             | READ (Stream)       |                      |
|             | set preferences                         | ADMIN (Stream)      |                      |
|             | get preferences                         | READ (Stream)       |                      |
|             | add metadata                            | ADMIN (Stream)      |                      |
|             | get metadata                            | READ (Stream)       |                      |
|             | view lineage                            | READ (Stream)       |                      |
|             | emit metrics                            | WRITE (?) (Stream)  |                      |
|             | view metrics                            | READ (Stream)       |                      |
| Datasets    | list                                    | READ (Namespace)    |                      |
|             | get                                     | READ (Dataset)      |                      |
|             | create                                  | WRITE (Namespace)   | ADMIN (Dataset)      |
|             | update                                  | ADMIN (Dataset)     |                      |
|             | drop                                    | ADMIN (Dataset)     |                      |
|             | executeAdmin (exists/truncate/upgrade)  | ADMIN (Dataset)     |                      |
|             | add metadata                            | ADMIN (Dataset)     |                      |
|             | get metadata                            | READ (Dataset)      |                      |
|             | view lineage                            | READ (Dataset)      |                      |
|             | emit metrics                            | WRITE (?) (Dataset) |                      |
|             | view metrics                            | READ (Dataset)      |                      |

NOTE: Cells marked green were done in 3.4. Cells marked in yellow are in scope for 3.5.
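
The table can also be read as a lookup from an entity type and operation to the privilege that must be checked. The sketch below encodes a few of the stream and dataset rows this way. The names `EntityType`, `RequiredPrivilege` and `PrivilegeTable`, the local `Action` enum, and the use of plain strings for operation names are all assumptions made for illustration; none of this is CDAP's API.

```java
import java.util.HashMap;
import java.util.Map;

enum Action { READ, WRITE, EXECUTE, ADMIN }

enum EntityType { INSTANCE, NAMESPACE, ARTIFACT, APPLICATION, PROGRAM, STREAM, DATASET }

/** Hypothetical pair: the action required, and the entity type it is checked on. */
final class RequiredPrivilege {
  final Action action;
  final EntityType on;

  RequiredPrivilege(Action action, EntityType on) {
    this.action = action;
    this.on = on;
  }

  @Override
  public String toString() {
    return action + " (" + on + ")";
  }
}

/** Hypothetical lookup holding a few rows of the privileges table. */
final class PrivilegeTable {
  private static final Map<String, RequiredPrivilege> ROWS = new HashMap<>();

  static {
    put(EntityType.STREAM, "create", Action.WRITE, EntityType.NAMESPACE);
    put(EntityType.STREAM, "enqueue", Action.WRITE, EntityType.STREAM);
    put(EntityType.STREAM, "read events", Action.READ, EntityType.STREAM);
    put(EntityType.STREAM, "truncate", Action.ADMIN, EntityType.STREAM);
    put(EntityType.DATASET, "list", Action.READ, EntityType.NAMESPACE);
    put(EntityType.DATASET, "create", Action.WRITE, EntityType.NAMESPACE);
    put(EntityType.DATASET, "drop", Action.ADMIN, EntityType.DATASET);
  }

  private PrivilegeTable() { }

  private static void put(EntityType entity, String op, Action action, EntityType on) {
    ROWS.put(entity + ":" + op, new RequiredPrivilege(action, on));
  }

  /** Returns the required privilege, or null if the operation is not part of this sketch. */
  static RequiredPrivilege required(EntityType entity, String operation) {
    return ROWS.get(entity + ":" + operation);
  }
}
```

Keeping the mapping in one place like this would let each handler ask for the required privilege by operation name instead of hard-coding it, though whether that fits the actual enforcement points is a design decision the table itself does not settle.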

Out-of-scope User Stories (4.0 and beyond)

  1. As a CDAP admin, I should be able to authorize metadata changes to CDAP entities.
  2. As a CDAP system, I should be able to push down ACLs to storage providers.
  3. As a CDAP admin, I should be able to see an audit log of all authorization-related changes in CDAP.
  4. As a CDAP admin, I should be able to authorize all Thrift-based traffic, so that transaction management is also authorized.
