...
NOTE: This document also assumes that the Authorizer extension is Apache Sentry, and therefore calls out Thrift as the communication mechanism.
Program Runtime
Access datasets, streams and secure keys
During program runtimes, users can access datasets, streams and secure keys through program APIs (MapReduce/Spark/Flows) or through Dataset APIs (getDataset).
Administer datasets, streams and secure keys
During program runtimes, users can administer datasets, streams and secure keys via the Admin APIs.
Update system metadata
During program runtimes, CDAP performs various system operations for:
- Recording Audit
- Recording Lineage
- Recording Usage
- Recording Run Records
- Namespace Lookup
- Authorization Enforcement
Explore
Access datasets and streams
Users can execute Hive SELECT (for BatchReadable datasets) and INSERT (for BatchWritable datasets) queries via Explore to access data in datasets and streams.
Administer datasets and streams
Create operations on datasets and streams can create tables in Hive if Explore is enabled. Similarly, delete operations can drop tables, and truncate operations can truncate them.
REST APIs
Publicly routed REST APIs in AppFabric Service
Application Deployment
Applications with non-existing dataset
- Client --> Router HTTP:
deployApp(artifact, appConfig)
- Router --> AppFabric HTTP:
deployApp(artifact, appConfig, SecurityRequestContext.userId)
- AppFabric --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- AppFabric --> AppFabric:
doAs(namespace, deploy(jar, config))
- AppFabric --> DatasetServiceClient:
createDataset()
- DatasetServiceClient --> DatasetService HTTP:
createDataset(ds, Header(CDAP-UserId=SecurityRequestContext.userId))
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> Authorizer Thrift:
revoke(ds); grant(ds, SecurityRequestContext.userId, ALL)
- DatasetService --> DatasetOpExecutor HTTP:
success = doAs(namespace, createDataset(ds))
- DatasetService --> Authorizer Thrift:
!success ? revoke(ds)
- DatasetService --> AppFabric --> Router --> Client HTTP:
result
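The recurring enforcement step in this flow (`!authorized(SecurityRequestContext.userId) ? UnauthorizedException`) can be sketched as a simple guard. The sketch below is illustrative only; `AuthEnforcer`, its in-memory privilege store, and the entity/action names are hypothetical stand-ins, while the real CDAP enforcer delegates to the Authorizer extension (here, Sentry over Thrift):

```python
class UnauthorizedException(Exception):
    """Raised when the requesting user lacks the required privilege."""

class AuthEnforcer:
    """Illustrative in-memory enforcer; the real one calls the Authorizer extension."""

    def __init__(self):
        # (user, entity) -> set of granted actions
        self._privileges = {}

    def grant(self, entity, user, actions):
        self._privileges.setdefault((user, entity), set()).update(actions)

    def enforce(self, entity, user, action):
        # mirrors: !authorized(SecurityRequestContext.userId) ? UnauthorizedException
        if action not in self._privileges.get((user, entity), set()):
            raise UnauthorizedException(f"{user} lacks {action} on {entity}")

enforcer = AuthEnforcer()
enforcer.grant("ns1.app1", "alice", {"ADMIN"})
enforcer.enforce("ns1.app1", "alice", "ADMIN")  # passes silently
```

The key point the flow relies on is that enforcement is a pass/raise decision made before any state is mutated, keyed on the user identity propagated from the Router via SecurityRequestContext.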
Applications with existing dataset
- Client --> Router HTTP:
deployApp(artifact, appConfig)
- Router --> AppFabric HTTP:
deployApp(artifact, appConfig, SecurityRequestContext.userId)
- AppFabric --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- AppFabric --> AppFabric:
doAs(namespace, deploy(jar, config))
- AppFabric --> DatasetServiceClient:
!compatibleUpdate ? IncompatibleException
- DatasetServiceClient --> DatasetService HTTP:
update(ds, Header(CDAP-UserId=SecurityRequestContext.userId))
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> DatasetService:
success = update(ds)
- DatasetService --> AppFabric --> Router --> Client HTTP:
result
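The flow above first rejects incompatible updates (`!compatibleUpdate ? IncompatibleException`) before any authorization or update happens. The sketch below models that ordering; the compatibility rule shown (type must be unchanged) is an assumption made purely for illustration, and the function and dict shapes are hypothetical, not CDAP's actual API:

```python
class IncompatibleException(Exception):
    """Raised when a deployed app's dataset spec conflicts with the existing dataset."""

def update_dataset(existing, new_type, new_properties):
    # Assumed compatibility rule (illustrative): the dataset type may not change.
    if existing["type"] != new_type:
        raise IncompatibleException(
            f"cannot change type {existing['type']} -> {new_type}")
    # Compatible update: apply the new properties in place.
    existing["properties"] = new_properties
    return existing

ds = {"type": "Table", "properties": {"ttl": "3600"}}
updated = update_dataset(ds, "Table", {"ttl": "7200"})
```

Doing the compatibility check first means an incompatible deployment fails fast in AppFabric, without a round trip to DatasetService.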
Applications with non-existing streams
Applications with existing streams
Namespace Creation
Namespace Deletion
Publicly routed REST APIs in Dataset Service
Create
- Client --> Router HTTP:
createDataset(dataset, type, properties)
- Router --> DatasetService HTTP:
createDataset(dataset, type, properties, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> Authorizer Thrift:
revoke(dataset); grant(dataset, SecurityRequestContext.userId, ALL)
- DatasetService --> DatasetOpExecutor HTTP:
success = doAs(namespace, createDataset(dataset))
- DatasetService --> Authorizer Thrift:
!success ? revoke(dataset)
- DatasetService --> Router --> Client HTTP:
result
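The create flow grants the creating user ALL on the dataset before the DatasetOpExecutor performs the physical creation, and revokes the grant again if creation fails. A minimal sketch of that grant-first, revoke-on-failure ordering (all helper names are hypothetical; the real calls go over HTTP and Thrift):

```python
class DatasetCreationError(Exception):
    """Stands in for a failure inside the DatasetOpExecutor."""

class FakeAuthorizer:
    """Stands in for the Thrift calls to the Authorizer (Sentry) extension."""

    def __init__(self):
        self.privileges = {}  # dataset -> {user: actions}

    def grant(self, dataset, user, actions):
        self.privileges.setdefault(dataset, {})[user] = set(actions)

    def revoke(self, dataset):
        self.privileges.pop(dataset, None)

def create_dataset(dataset, user, authorizer, do_create):
    """Mirrors the flow above: revoke stale grants, grant ALL, create, roll back."""
    authorizer.revoke(dataset)                # revoke(dataset): clear stale privileges
    authorizer.grant(dataset, user, {"ALL"})  # grant(dataset, userId, ALL)
    try:
        do_create(dataset)                    # DatasetOpExecutor: doAs(namespace, createDataset)
    except DatasetCreationError:
        authorizer.revoke(dataset)            # !success ? revoke(dataset)
        raise

auth = FakeAuthorizer()
create_dataset("ns1.purchases", "alice", auth, lambda ds: None)
```

Granting before creating ensures there is no window in which the dataset exists but its creator holds no privileges on it; the compensating revoke keeps the privilege store consistent when creation fails.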
List
- Client --> Router HTTP:
listDatasets(namespace)
- Router --> DatasetService HTTP:
listDatasets(namespace, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
result = filter(datasetsInNamespace, SecurityRequestContext.userId)
- DatasetService --> Router --> Client HTTP:
result
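Unlike the other operations, List does not fail on missing privileges; the enforcer instead filters the namespace's datasets down to those the user holds some privilege on. A sketch of that filter step, using an illustrative privilege map rather than CDAP's actual structures:

```python
def filter_visible(datasets, user, privileges):
    """Return only the datasets on which `user` holds at least one privilege."""
    return [ds for ds in datasets if privileges.get((user, ds))]

# Hypothetical privilege store: (user, dataset) -> set of actions
privileges = {
    ("alice", "ns1.purchases"): {"ALL"},
    ("alice", "ns1.history"): {"READ"},
}
datasets_in_namespace = ["ns1.purchases", "ns1.history", "ns1.secrets"]
visible = filter_visible(datasets_in_namespace, "alice", privileges)
```

Filtering rather than rejecting means a user with partial access still gets a useful listing, and datasets they cannot access are simply absent from the result.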
Get
- Client --> Router HTTP:
getDataset(dataset)
- Router --> DatasetService HTTP:
dataset = getDataset(dataset, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
result = filter(dataset, SecurityRequestContext.userId)
- DatasetService --> Router --> Client HTTP:
result.isEmpty ? UnauthorizedException
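Get reuses the same filtering as List, applied to a single dataset: an empty filter result is surfaced as UnauthorizedException (`result.isEmpty ? UnauthorizedException`), presumably so an unauthorized caller cannot distinguish "no access" from "does not exist". A sketch with hypothetical names:

```python
class UnauthorizedException(Exception):
    """Raised when the filtered result is empty."""

def get_dataset(dataset, user, privileges):
    """Filter the single dataset against the user's privileges; empty means unauthorized."""
    visible = [dataset] if privileges.get((user, dataset)) else []
    if not visible:  # result.isEmpty ? UnauthorizedException
        raise UnauthorizedException(f"{user} is not authorized on {dataset}")
    return visible[0]

privileges = {("alice", "ns1.purchases"): {"READ"}}
got = get_dataset("ns1.purchases", "alice", privileges)
```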
Update
- Client --> Router HTTP:
updateDataset(dataset, type, properties)
- Router --> DatasetService HTTP:
updateDataset(dataset, type, properties, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> DatasetService:
result = update(dataset, type, properties)
- DatasetService --> Router --> Client HTTP:
result
Truncate
- Client --> Router HTTP:
truncate(dataset)
- Router --> DatasetService HTTP:
truncate(dataset, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> DatasetOpExecutor HTTP:
result = doAs(namespace, truncate(dataset))
- DatasetService --> Router --> Client HTTP:
result
Drop
- Client --> Router HTTP:
drop(dataset)
- Router --> DatasetService HTTP:
drop(dataset, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> DatasetOpExecutor HTTP:
result = doAs(namespace, drop(dataset))
- DatasetService --> Authorizer Thrift:
revoke(dataset)
- DatasetService --> Router --> Client HTTP:
result
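Drop is the mirror image of Create: in the sequence above, the Authorizer revoke happens only after the DatasetOpExecutor's physical drop, so a drop that fails partway does not strand a still-existing dataset without an owner. A sketch of that ordering (helper names are illustrative):

```python
class FakeAuthorizer:
    """Stands in for the Thrift revoke call to the Authorizer extension."""

    def __init__(self, datasets):
        self.privileged = set(datasets)  # datasets that still have grants

    def revoke(self, dataset):
        self.privileged.discard(dataset)

def drop_dataset(dataset, authorizer, do_drop):
    do_drop(dataset)            # DatasetOpExecutor: doAs(namespace, drop(dataset))
    authorizer.revoke(dataset)  # revoke(dataset) only once the drop has gone through

auth = FakeAuthorizer({"ns1.purchases", "ns1.history"})
drop_dataset("ns1.purchases", auth, lambda ds: None)
```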
Upgrade
- Client --> Router HTTP:
upgrade(dataset)
- Router --> DatasetService HTTP:
upgrade(dataset, SecurityRequestContext.userId)
- DatasetService --> AuthEnforcer:
!authorized(SecurityRequestContext.userId) ? UnauthorizedException
- DatasetService --> DatasetOpExecutor HTTP:
result = doAs(namespace, upgrade(dataset))
- DatasetService --> Router --> Client HTTP:
result
Publicly routed REST APIs in Stream Service
Scratch Pad
a) Authorization
...