Audit Log
Use-cases
Case #1
- Rishab is a data scientist/engineer at a company that implements a Data Lake. He is analyzing the effectiveness of the recommendation engine on the company's e-commerce site. For this investigation, he wants to analyze a dataset that includes the click log for the last year. He is looking for clean, up-to-date click log data. He wants to use part of the data to build a model and the rest to score the model and validate its predictions.
- Before he can conduct an analysis, Rishab needs to confirm the dataset is available in the Data Lake.
- To do so, he wishes to find all entities that include “click log”.
- He arrives at the Finder home screen (from nav, search results, other entry points?). For this analysis, Rishab is most concerned with the recency, accuracy, and integrity of the data.
- He enters “click log” in the Search Box and clicks Search.
- He arrives at the Results Page.
- Results returned
- By default, they are sorted by creation time
- Each Result includes:
- Snippet of the metadata that matches his query in context.
- Important to help him evaluate the relevance of the results.
- Date Created
- To know how recent/new it is.
- He clicks the result and arrives at the Entity Detail Page where he can view all of the metadata associated with an entity.
- Rishab wishes to verify the validity of the sources of this dataset. To do so, he clicks the Lineage Tab to trace the creation of this dataset to its source.
- Finder displays the lineage for this dataset as a diagram. The selected dataset displays in the center; to the left is the entity that precedes it and to the right is the one it precedes.
- Rishab discovers that it has been created from two separate sources.
- He then clicks one of the sources which takes him to the Entity Page of that dataset.
- He clicks on a program to see what has been done to the dataset.
- Rishab clicks the Audit Logs Tab to see how active this dataset has been: when it was last updated, and who is using it, writing to it, or reading from it.
- Rishab clicks the appropriate action to make this dataset a new source for his existing Click Log processing pipeline.
- This takes him to the Hydrator Studio where he can edit the Master Click Log pipeline.
Storing Audit Log
- Goal: Read AuditLog messages from Kafka and write them to a Table dataset.
- Reusing the MetadataConsumer flowlet from the Navigator App to handle reading messages from Kafka
- Because of this, the app requires a Kafka config in order to be installed:
{ "config": { "metadataKafkaConfig": { "brokerString": "<host>:<port>", "topic" : "audit" } } }
- New Flowlet (AuditLogPublisher) for writing Kafka messages to Dataset
- Dataset is a Table class
- Dataset key format: <namespace>-<type>-<name>-<messageTimeInMilliSecondsLong>-<UUID>
- Dataset Columns:
- timestamp - Long - timestamp of the message generated
- entityId - EntityId - the entity id that the message refers to. Only entity types with a namespace are supported.
- user - String - the name of the user that generated the message. If the user is blank, a default value of "unknown" is inserted.
- actionType - String - The type of action that was taken. For more details, see: Audit information publishing
- entityType - String - The EntityType from the id, lowercase
- entityName - String - The name of the Entity
- metadata - AuditPayload - The change that was made, either a metadata change or an access. For all other types, the payload is empty
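The key format and column list above can be sketched in Python as follows. This is a minimal illustration, not the AuditLogPublisher implementation: the function names `build_row_key` and `build_row` and the plain-dict row are hypothetical, and the entity type and name are passed in explicitly because this page does not specify how they are derived from the EntityId.

```python
import uuid
from typing import Any, Dict, Optional


def build_row_key(namespace: str, entity_type: str, name: str,
                  time_ms: int, unique: Optional[uuid.UUID] = None) -> str:
    """Build <namespace>-<type>-<name>-<messageTimeInMilliSeconds>-<UUID>."""
    # A random UUID keeps keys distinct when two messages for the same
    # entity carry the same millisecond timestamp.
    unique = unique if unique is not None else uuid.uuid4()
    return "-".join([namespace, entity_type, name, str(time_ms), str(unique)])


def build_row(time_ms: int, entity_id: Dict[str, Any], user: Optional[str],
              action_type: str, entity_type: str, entity_name: str,
              payload: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Map one audit message onto the dataset columns (hypothetical dict form)."""
    return {
        "timestamp": time_ms,
        "entityId": entity_id,
        # A blank or missing user is stored as "unknown".
        "user": user if user else "unknown",
        "actionType": action_type,
        # The entityType column is stored in lowercase.
        "entityType": entity_type.lower(),
        "entityName": entity_name,
        # The payload is empty for types other than metadata changes and accesses.
        "metadata": payload if payload is not None else {},
    }
```

Because the timestamp comes before the UUID in the key, all rows for one entity share the `<namespace>-<type>-<name>-` prefix; how the stored keys are actually scanned is not covered on this page.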
Reading Audit Log
- Goal: Expose the AuditLog dataset as a REST API for consumption by the UI
- Fields returned
- totalResults - the total number of results for the query
- offset - The starting offset of the first result
- results - An array of result records with a max length of pageSize (the limit request parameter)
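As a rough sketch of how these three fields relate, the snippet below pages over an in-memory list. The `paginate` helper is hypothetical and stands in for whatever scan the service performs against the dataset; the defaults of 0 and 10 mirror the offset and limit request parameters.

```python
from typing import Any, Dict, List


def paginate(records: List[Any], offset: int = 0, limit: int = 10) -> Dict[str, Any]:
    """Return one page of records in the totalResults/results/offset shape."""
    if offset < 0 or limit < 0:
        raise ValueError("offset and limit must be non-negative")
    return {
        # Count of all matches for the query, not just those on this page.
        "totalResults": len(records),
        # At most `limit` records, starting at `offset`.
        "results": records[offset:offset + limit],
        "offset": offset,
    }
```

An offset past the end of the matches yields an empty results array while totalResults still reports the full count.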
REST API Design
HTTP Request Type: GET
Endpoint: /namespaces/{namespace-id}/apps/Tracker/services/AuditLog/methods/auditlog
Request Params:
- type - required - The type of the entity to search for, e.g. dataset or stream. Any namespaced entity can be searched for. Possible values: application, artifact, dataset, dataset_module, dataset_type, flowlet, flowlet_queue, notification_feed, program, program_run, schedule, stream, stream_view
- name - required - The name of the entity to search for
- startTime - optional, default 0 - The start time to search for. Accepts "now - 1d" syntax. Seconds granularity.
- endTime - optional, default now - The end time to search for. Accepts "now - 1d" syntax. Seconds granularity.
- offset - optional, default 0 - The offset to start the results at for paging
- limit - optional, default 10 - The max number of results to return in the results
Response Status:
- 200 - returns the audit log entries requested
- 500 - error while searching
Response Body:
{
  totalResults: 1,
  results: [
    {
      time: 1457467029557,
      entityId: {
        namespace: "default",
        application: "testCubeAdapter",
        type: "Workflow",
        program: "ETLWorkflow",
        entity: "PROGRAM"
      },
      user: "unknown",
      type: "METADATA_CHANGE",
      payload: {
        previous: { SYSTEM: { properties: { }, tags: [ ] } },
        additions: { SYSTEM: { properties: { }, tags: [ "ETLMapReduce", "Batch", "Workflow", "ETLWorkflow" ] } },
        deletions: { SYSTEM: { properties: { }, tags: [ ] } }
      }
    }
  ],
  offset: 0
}
Example of no results being found.
{ totalResults: 0, results: [ ], offset: 0 }
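For illustration, a client might assemble the request URL for this endpoint along the following lines. The `audit_log_url` helper and its `base_url` argument are hypothetical: this page gives only the path and query parameters, so the host, port, and any API version prefix must come from the actual deployment.

```python
from typing import Optional, Union
from urllib.parse import quote, urlencode


def audit_log_url(base_url: str, namespace: str, entity_type: str, name: str,
                  start_time: Optional[Union[int, str]] = None,
                  end_time: Optional[Union[int, str]] = None,
                  offset: Optional[int] = None,
                  limit: Optional[int] = None) -> str:
    """Build the GET URL for the AuditLog service (base_url is an assumption)."""
    path = ("/namespaces/" + quote(namespace, safe="")
            + "/apps/Tracker/services/AuditLog/methods/auditlog")
    # type and name are required; the remaining parameters are sent only when
    # set, so the server-side defaults (0, now, 0, 10) apply otherwise.
    params = {"type": entity_type, "name": name}
    optional = {"startTime": start_time, "endTime": end_time,
                "offset": offset, "limit": limit}
    params.update({k: v for k, v in optional.items() if v is not None})
    return base_url.rstrip("/") + path + "?" + urlencode(params)
```

Relative times such as "now - 1d" are passed through as strings and URL-encoded; interpreting them is left to the service.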