...
Use cases and user stories are documented at Cask Tracker (formerly Cask Finder).
Design Choices
We chose Kafka as the system to which CDAP publishes audit information. Other tools can subscribe to the Kafka feed to receive the audit information, and using Kafka this way could make it easier to integrate external tools with CDAP.
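For example, a subscriber outside CDAP could be as simple as a standard kafka-clients consumer. This is only a sketch; the broker address, group id, and topic name below are placeholders for illustration, not settings defined by this design:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

/** Minimal external subscriber that prints every audit message it receives. */
public class AuditFeedConsumer {
  public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("group.id", "external-audit-tool");
    props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
    props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

    try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
      // "audit" is an assumed topic name for illustration only.
      consumer.subscribe(Collections.singletonList("audit"));
      while (true) {
        ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
        for (ConsumerRecord<String, String> record : records) {
          // Each record value is expected to be one JSON audit message.
          System.out.println(record.value());
        }
      }
    }
  }
}
```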
...
- CREATE
- UPDATE
- TRUNCATE
- DELETE
- ACCESS (sub-types: READ, WRITE, UNKNOWN)
- METADATA_CHANGE
Sample audit messages:

```
[
  /** Stream access operation from a flow **/
  {
    "time": 1456956659468,
    "entityId": {
      "namespace": "ns1",
      "stream": "stream1",
      "entity": "STREAM"
    },
    "user": "user1",
    "type": "ACCESS",
    "payload": {
      "accessType": "WRITE",
      "accessor": {
        "namespace": "ns1",
        "application": "app1",
        "type": "Flow",
        "program": "flow1",
        "run": "run1",
        "entity": "PROGRAM_RUN"
      }
    }
  },
  /** Explore stream access **/
  {
    "time": 1456956659469,
    "entityId": {
      "namespace": "ns1",
      "stream": "stream1",
      "entity": "STREAM"
    },
    "user": "user1",
    "type": "ACCESS",
    "payload": {
      "accessType": "UNKNOWN",
      "accessor": {
        "service": "explore",
        "entity": "SYSTEM_SERVICE"
      }
    }
  },
  /** Metadata change **/
  {
    "time": 1456956659470,
    "entityId": {
      "namespace": "ns1",
      "application": "app1",
      "entity": "APPLICATION"
    },
    "user": "user1",
    "type": "METADATA_CHANGE",
    "payload": {
      "previous": {
        "USER": {
          "properties": {
            "uk": "uv",
            "uk1": "uv2"
          },
          "tags": [ "ut1", "ut2" ]
        },
        "SYSTEM": {
          "properties": {
            "sk": "sv"
          },
          "tags": []
        }
      },
      "additions": {
        "SYSTEM": {
          "properties": {
            "sk": "sv"
          },
          "tags": [ "t1", "t2" ]
        }
      },
      "deletions": {
        "USER": {
          "properties": {
            "uk": "uv"
          },
          "tags": [ "ut1" ]
        }
      }
    }
  },
  /** Dataset admin operation **/
  {
    "time": 1456956659471,
    "entityId": {
      "namespace": "ns1",
      "dataset": "ds1",
      "entity": "DATASET"
    },
    "user": "user1",
    "type": "CREATE",
    "payload": {}
  }
]
```
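A subscriber will typically dispatch on the `type` field of each message. Below is a minimal sketch using Jackson, assuming the field names shown in the sample messages above:

```java
import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;

/** Routes one audit message (a single JSON object from the feed) by its type. */
public class AuditMessageDispatcher {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  public static void dispatch(String json) throws Exception {
    JsonNode message = MAPPER.readTree(json);
    String type = message.get("type").asText();
    JsonNode entityId = message.get("entityId");
    switch (type) {
      case "ACCESS":
        // Sub-type (READ, WRITE, UNKNOWN) and the accessor are carried in the payload.
        String accessType = message.get("payload").get("accessType").asText();
        System.out.println("Access (" + accessType + ") on " + entityId);
        break;
      case "METADATA_CHANGE":
        // previous/additions/deletions in the payload describe the metadata delta.
        System.out.println("Metadata change on " + entityId);
        break;
      default:
        // CREATE, UPDATE, TRUNCATE, DELETE are admin operations on the entity itself.
        System.out.println(type + " on " + entityId);
    }
  }
}
```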
Implementation
Audit log information will be published to the CDAP Kafka server when the `audit.publish.enabled` configuration parameter is set to true.
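As a minimal sketch of what a publisher gated on this flag could look like, assuming the standard kafka-clients producer and a plain boolean passed in from configuration (the broker address and topic name are placeholders, not the actual CDAP settings):

```java
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

/** Publishes JSON audit messages to Kafka, but only when auditing is enabled. */
public class KafkaAuditPublisher {
  private final boolean enabled;
  private final KafkaProducer<String, String> producer;

  public KafkaAuditPublisher(boolean enabled) {
    // 'enabled' carries the value of the audit.publish.enabled parameter.
    this.enabled = enabled;
    Properties props = new Properties();
    // Broker address is a placeholder; CDAP would supply its own Kafka settings.
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    this.producer = enabled ? new KafkaProducer<>(props) : null;
  }

  /** Sends one audit message, or drops it silently when publishing is disabled. */
  public void publish(String auditMessageJson) {
    if (!enabled) {
      return;
    }
    // The topic name "audit" is an assumption for illustration.
    producer.send(new ProducerRecord<>("audit", auditMessageJson));
  }
}
```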
- Dataset admin operations can be published by the DatasetOpExecutor service (a sketch of such a call site follows this list).
- Stream admin operations can be published by the StreamAdmin class.
- Dataset and stream access information can be published by LineageWriterDatasetFramework (which can be renamed to AuditingDatasetFramework), piggybacking on the lineage-capturing code.
- Metadata changes can be published by the DefaultMetadataStore class.
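To illustrate what a call site in one of the classes above might look like, here is a hypothetical helper that builds the dataset CREATE message from the earlier example with Jackson and hands it to a publisher such as the `KafkaAuditPublisher` sketched above; the helper and its field layout mirror the sample JSON and are not final API:

```java
import com.fasterxml.jackson.databind.ObjectMapper;
import com.fasterxml.jackson.databind.node.ObjectNode;

/** Hypothetical helper for building audit messages in the JSON format shown above. */
public final class AuditMessages {
  private static final ObjectMapper MAPPER = new ObjectMapper();

  private AuditMessages() { }

  /** Builds a dataset admin (CREATE) audit message. */
  public static String datasetCreated(String namespace, String dataset, String user) {
    ObjectNode message = MAPPER.createObjectNode();
    message.put("time", System.currentTimeMillis());
    ObjectNode entityId = message.putObject("entityId");
    entityId.put("namespace", namespace);
    entityId.put("dataset", dataset);
    entityId.put("entity", "DATASET");
    message.put("user", user);
    message.put("type", "CREATE");
    message.putObject("payload");   // admin operations carry an empty payload
    return message.toString();
  }
}
```

A dataset creation in DatasetOpExecutor could then reduce to something like `publisher.publish(AuditMessages.datasetCreated("ns1", "ds1", "user1"));`.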
...