Objective
Publish all audit logs for changes made to CDAP entities so that other apps/tools, such as Cask Tracker, MDM, etc., can use this as a source of audit information.
For 3.4 release, we'll limit the scope to publishing changes for Datasets and Streams.
Use Cases
Use cases and user stories are documented at Cask Tracker (formerly Cask Finder).
Design Choices
We chose Kafka as the system to which CDAP publishes audit information. Other tools can subscribe to the Kafka feed to receive it. Using Kafka could make it easier for external tools to integrate with CDAP for audit logs.
However, publishing to Kafka has certain limitations today that will need to be addressed later -
- Kafka publish does not happen in a transaction, so there is a chance that the audit log feed from Kafka may be inconsistent with what actually happened. CDAP-5109 has more discussion on this.
- There is no access control on who can publish audit information to Kafka (CDAP-5130).
- Messages in Kafka are transient. They will be deleted after a few days in most setups. The subscribers will have to consume the messages before they are deleted.
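Because the messages are transient, a subscriber needs to keep a consumer running and deserialize each message as it arrives. The following is a rough sketch of that subscriber side in Python, assuming the kafka-python client and a hypothetical topic name "audit" (CDAP does not prescribe either here):

```python
import json

def parse_audit_message(raw_bytes):
    """Deserialize one audit message from the feed and sanity-check it."""
    message = json.loads(raw_bytes.decode("utf-8"))
    # Minimal check on the required top-level fields of an audit message.
    for field in ("time", "entityId", "user", "type"):
        if field not in message:
            raise ValueError("missing field: " + field)
    return message

def consume_audit_feed(bootstrap_servers, topic="audit"):
    """Sketch of a subscriber loop; not invoked here since it needs a
    running broker. The topic name 'audit' is an assumption."""
    from kafka import KafkaConsumer  # assumes the kafka-python package
    consumer = KafkaConsumer(topic, bootstrap_servers=bootstrap_servers)
    for record in consumer:
        yield parse_audit_message(record.value)
```

The consumer would typically commit offsets as it goes, so that a restart resumes before the retention window deletes unread messages.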
Audit Message Format
The audit feed will be a stream of audit messages, as defined below.
...
- CREATE
- UPDATE
- TRUNCATE
- DELETE
- ACCESS (sub types: READ, WRITE, BOTH, UNKNOWN)
- METADATA_CHANGE
[
  /** Stream access by a program **/
  {
    "time": 1456956659468,
    "entityId": {
      "namespace": "ns1",
      "stream": "stream1",
      "entity": "STREAM"
    },
    "user": "user1",
    "type": "ACCESS",
    "payload": {
      "accessType": "WRITE",
      "accessor": {
        "namespace": "ns1",
        "application": "app1",
        "type": "Flow",
        "program": "flow1",
        "run": "run1",
        "entity": "PROGRAM_RUN"
      }
    }
  },
  /** Explore stream access **/
  {
    "time": 1456956659469,
    "entityId": {
      "namespace": "ns1",
      "stream": "stream1",
      "entity": "STREAM"
    },
    "user": "user1",
    "type": "ACCESS",
    "payload": {
      "accessType": "UNKNOWN",
      "accessor": {
        "service": "explore",
        "entity": "SYSTEM_SERVICE"
      }
    }
  },
  /** Metadata change **/
  {
    "time": 1456956659470,
    "entityId": {
      "namespace": "ns1",
      "application": "app1",
      "entity": "APPLICATION"
    },
    "user": "user1",
    "type": "METADATA_CHANGE",
    "payload": {
      "previous": {
        "USER": {
          "properties": { "uk": "uv", "uk1": "uv2" },
          "tags": [ "ut1", "ut2" ]
        },
        "SYSTEM": {
          "properties": { "sk": "sv" },
          "tags": []
        }
      },
      "additions": {
        "SYSTEM": {
          "properties": { "sk": "sv" },
          "tags": [ "t1", "t2" ]
        }
      },
      "deletions": {
        "USER": {
          "properties": { "uk": "uv" },
          "tags": [ "ut1" ]
        }
      }
    }
  },
  /** Dataset admin operation **/
  {
    "time": 1456956659471,
    "entityId": {
      "namespace": "ns1",
      "dataset": "ds1",
      "entity": "DATASET"
    },
    "user": "user1",
    "type": "CREATE",
    "payload": {}
  }
]
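To make the format concrete, here is a small illustrative sketch (Python; the helper name and its shape are not part of the design) that assembles an audit message with the fields shown above and rejects undocumented type values:

```python
import json
import time

# The audit message types listed above.
AUDIT_TYPES = {"CREATE", "UPDATE", "TRUNCATE", "DELETE", "ACCESS", "METADATA_CHANGE"}

def build_audit_message(entity_id, user, msg_type, payload=None):
    """Assemble one audit message matching the format above."""
    if msg_type not in AUDIT_TYPES:
        raise ValueError("unknown audit type: " + msg_type)
    return {
        "time": int(time.time() * 1000),  # epoch millis, as in the examples
        "entityId": entity_id,
        "user": user,
        "type": msg_type,
        "payload": payload if payload is not None else {},
    }

# Example: the dataset admin operation from the samples above.
msg = build_audit_message(
    {"namespace": "ns1", "dataset": "ds1", "entity": "DATASET"},
    "user1",
    "CREATE",
)
print(json.dumps(msg))
```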
Implementation
Audit log information will be published to the CDAP Kafka server when the `audit.publish.enabled` config parameter is set to true.
- Dataset admin operations can be published by the DatasetOpExecutor service.
- Stream admin operations can be published by the StreamAdmin class.
- Dataset and stream access information can be published by LineageWriterDatasetFramework (which can be renamed to AuditingDatasetFramework), piggybacking on the lineage-capturing code.
- Metadata changes can be published by the DefaultMetadataStore class.
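The publishing path gated by `audit.publish.enabled` can be sketched as follows (illustrative only; the real implementation lives in the Java classes above, and the send function is injected here so the sketch stays broker-free):

```python
import json

def publish_audit(config, message, send_to_kafka):
    """Publish one audit message if auditing is enabled.

    config        -- dict of CDAP config parameters
    message       -- audit message dict (format defined above)
    send_to_kafka -- callable taking serialized bytes, e.g. a Kafka producer send
    """
    if str(config.get("audit.publish.enabled", "false")).lower() != "true":
        return False  # auditing disabled; nothing published
    send_to_kafka(json.dumps(message).encode("utf-8"))
    return True

# Usage with a stubbed send function.
sent = []
publish_audit({"audit.publish.enabled": "true"},
              {"time": 1456956659471, "type": "CREATE"},
              sent.append)
```

Note that, per the limitations above, this send is not transactional: if the surrounding operation aborts after the send, the feed may still carry the message.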
...