Objective

Publish all changes made to entities so that other applications and tools, such as Cask Tracker and MDM, can use the feed as a source of audit information.

Use Cases

Use cases and user stories are documented at Cask Tracker (formerly Cask Finder).

Design Choices

We chose Kafka as the system to which CDAP publishes audit information. Other tools can subscribe to the Kafka feed to receive it.

However, publishing to Kafka currently has drawbacks that will need to be addressed later:

  • Publishing to Kafka does not happen in a transaction, so the audit feed from Kafka may be inconsistent with what actually happened. CDAP-5109 has more discussion.
  • There is no access control on who can publish audit information to Kafka (CDAP-5130).
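The first drawback can be illustrated with a small simulation. This is only a sketch, not CDAP code: `FlakyFeed` and `create_dataset` are hypothetical names, a Python set stands in for the dataset store, and the assumption that a failed publish is swallowed rather than rolling back the change is made purely for illustration.

```python
class FlakyFeed:
    """Hypothetical stand-in for a Kafka topic; fails on demand."""

    def __init__(self):
        self.messages = []
        self.available = True

    def publish(self, message):
        if not self.available:
            raise ConnectionError("feed unavailable")
        self.messages.append(message)


def create_dataset(store, feed, name):
    """Commit the change first, then publish the audit message best-effort."""
    store.add(name)  # the change itself is committed here
    try:
        feed.publish({"dataset": name, "type": "CREATE"})
    except ConnectionError:
        # Assumption: the commit is not rolled back, so the feed lags reality.
        pass


store, feed = set(), FlakyFeed()
create_dataset(store, feed, "ds1")
feed.available = False  # simulate an outage between commit and publish
create_dataset(store, feed, "ds2")

audited = {m["dataset"] for m in feed.messages}
missing = store - audited
print(sorted(missing))  # datasets that exist but never appeared in the feed
```

Because the commit and the publish are separate steps, a subscriber that relies on the feed alone would never learn about `ds2` in this scenario.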

Audit Message Format

The audit feed is a stream of audit messages in the format defined below.

Types of Audit Message

The following types of audit messages are published for an entity:

  • CREATE
  • UPDATE
  • TRUNCATE
  • DELETE
  • ACCESS (sub types: READ, WRITE, BOTH, UNKNOWN)
  • METADATA_CHANGE

 

[
  /** Metadata change **/
  {
    "time": 1456956659469,
    "entityId": {
      "namespace": "ns1",
      "dataset": "ds1",
      "entity": "DATASET"
    },
    "user": "cdap",
    "type": "METADATA_CHANGE",
    "change": {
      "additions": [
        {
          "scope": "USER",
          "properties": {
            "key1": "value1"
          },
          "tags": [
            "tag1"
          ]
        }
      ],
      "deletions": [
        {
          "scope": "SYSTEM",
          "properties": {},
          "tags": [
            "tag2"
          ]
        }
      ]
    }
  },
  
  /** Dataset admin operation **/ 
  {
    "time": 1456956659470,
    "entityId": {
      "namespace": "ns1",
      "dataset": "ds1",
      "entity": "DATASET"
    },
    "user": "cdap",
    "type": "CREATE"
  },
  
  /** Dataset access **/
  {
    "time": 1456956659471,
    "entityId": {
      "namespace": "ns1",
      "dataset": "ds1",
      "entity": "DATASET"
    },
    "user": "cdap",
    "type": "ACCESS",
    "access": {
      "type": "READ",
      "entityId": {
        "namespace": "ns1",
        "application": "app1",
        "type": "Flow",
        "program": "flow1",
        "entity": "PROGRAM"
      }
    }
  }
]
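As a rough illustration of how a subscriber might read these messages, the sketch below parses two of the sample messages and summarizes them. It uses only the Python standard library; `FEED`, `describe`, and the summary wording are hypothetical, and the comments from the sample are left out because standard JSON parsers reject them. How the messages are actually fetched from Kafka is outside the scope of this sketch.

```python
import json
from collections import Counter

# Two messages copied from the sample, without the /** ... **/ comments.
FEED = """
[
  {"time": 1456956659470,
   "entityId": {"namespace": "ns1", "dataset": "ds1", "entity": "DATASET"},
   "user": "cdap",
   "type": "CREATE"},
  {"time": 1456956659471,
   "entityId": {"namespace": "ns1", "dataset": "ds1", "entity": "DATASET"},
   "user": "cdap",
   "type": "ACCESS",
   "access": {"type": "READ",
              "entityId": {"namespace": "ns1", "application": "app1",
                           "type": "Flow", "program": "flow1",
                           "entity": "PROGRAM"}}}
]
"""


def describe(message):
    """Build a one-line summary of a single audit message."""
    entity = message["entityId"]
    summary = (f'{message["type"]} {entity["entity"]} '
               f'in {entity["namespace"]} by {message["user"]}')
    if message["type"] == "ACCESS":
        # ACCESS messages carry the sub type and the accessing entity.
        access = message["access"]
        summary += f' ({access["type"]} from {access["entityId"]["entity"]})'
    return summary


messages = json.loads(FEED)
summaries = [describe(m) for m in messages]
counts = Counter(m["type"] for m in messages)

for line in summaries:
    print(line)
```

The same pattern extends to METADATA_CHANGE messages by reading the `additions` and `deletions` lists under `change`.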

Implementation

Audit information is published to the CDAP Kafka server when the `audit.publish.enabled` configuration parameter is set to true.

  • Dataset admin operations can be published by DatasetOpExecutor service.
  • Stream admin operations can be published by StreamAdmin class.
  • Dataset and stream access information can be published by LineageWriterDatasetFramework (can be renamed to AuditingDatasetFramework).
  • Metadata changes can be published by DefaultMetadataStore class.
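The gating on `audit.publish.enabled` could look roughly like the sketch below. It is written in Python for brevity and does not reflect the actual CDAP classes: `make_audit_publisher` and `send` are hypothetical, the configuration is modeled as a plain dictionary, and treating a missing setting as disabled is an assumption.

```python
import json
import time


def make_audit_publisher(config, send):
    """Return a publish function that is a no-op unless auditing is enabled.

    `config` is a plain dict standing in for the CDAP configuration and
    `send` is any callable that accepts the serialized message (for
    example, a Kafka producer wrapper). Both are hypothetical.
    """
    # Assumption: a missing setting means publishing is disabled.
    flag = str(config.get("audit.publish.enabled", "false")).lower()
    enabled = flag == "true"

    def publish(entity_id, user, audit_type, payload=None):
        if not enabled:
            return False
        message = {
            "time": int(time.time() * 1000),  # epoch milliseconds
            "entityId": entity_id,
            "user": user,
            "type": audit_type,
        }
        if payload:
            message.update(payload)  # e.g. {"access": {...}} or {"change": {...}}
        send(json.dumps(message))
        return True

    return publish


sent = []
publish = make_audit_publisher({"audit.publish.enabled": "true"}, sent.append)
dataset = {"namespace": "ns1", "dataset": "ds1", "entity": "DATASET"}
published = publish(dataset, "cdap", "CREATE")

disabled = make_audit_publisher({}, sent.append)
skipped = disabled(dataset, "cdap", "DELETE")
```

Each of the components listed above would call such a function at the point where it performs the operation, so the flag is checked in one place.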

 

  • Note: Metadata updates published to Kafka for Navigator integration (introduced by CDAP-3518) use a different format and will continue to be published when enabled. Consolidating audit publishing and metadata publishing would be worthwhile in the future, but there is not enough time to do that for 3.4.

 
