Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

 
Table of Contents

Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

Support Pluggable Log Saver Plugins in Log Saver for users to provide custom plugins to configure and store log messages and also improve log saver for other functionalities like versioning, impersonation, resilience, etc. 

Existing Problems

Impersonation

Current implementation of LogSaver uses namespace directory to store logs which makes LogSaver dependent on impersonation. Impersonation could be avoided for logsaver if log files storage is not namespacedFor example, use a common root directory outside any CDAP namespace to store log files and use namespace-version-id to differentiate between namespaces. This would prohibit the users to look at their log files directly in HDFS as they are owned by "cdap". We need to consider if we have to provide options to make the "user" flexible for log files written by log saver plugins.  

Entity Deletion and Recreation

LogSaver currently does not handle the case where a given entity is deleted and recreated with the same name. This is a common scenario on development clusters where developers can delete and recreate entity with the same name. There are couple of open issues present <link>

Pluggable Log Appenders

User might want to configure LogSaver to collect logs at different level and also write log messages at different destination. Having capability to add custom LogAppender will allow users to implement custom logic for logs in addition to existing CDAP LogSaver. For example, user might want to collect debug logs per application and store them on HDFS at application-dir/{debug.log} location.

DataStructures

TODO: Look into improving messageTable used for sorting log messages.

Resilience

Making LogSaver resilient to failure scenarios.

TODO: Add more details

Goals

  • Provides a unified framework for processing log events read off from the transport (Kafka currently).
  • Allows arbitrary number of processors to process the log events.
  • Provides implementations for common processors (e.g write to HDFS, rotation, expiration... etc).

User Stories 

  • For each application, user wants to collect the application's logs into multiple logs files based on log level 
  • For each application, user wants to configure a location in HDFS to be used to store the collected logs.
  • For each application, User wants the application log files stored in text format.
  • For each application, User would wants to configure the RotationPolicy of log files.
  • For each application, user wants to configure different layout for formatting log messages for the different log files generated.
  • User wants to group log messages at application level and write multiple separate log files for each application. Example: application-dir/{audit.log, metrics.log, debug.log}
  • User wants to write these log files to a configurable path in HDFS.
  • User wants to configure log saver extension using logback configuration.
  • User also wants to be able to configure rolling policy for these log files similar to log-back. 

Design

Principles

  • Establish a clear role separation between the framework and individual processor
    • Framework responsibilities
      • Reading log events from the log transport
      • Transport offset management
      • Grouping, buffering and sorting of log events
      • Configuration of log processors
      • Defines contract and services between the framework and log processors
        • Lifecycle management (instantiate, process, flush, destroy)
        • Impersonation
        • Exception handling policies for exceptions raised by log processors
    • Processor responsibilities
      • Actual processing of events (mostly writing to somewhere)
      • Files and metadata management
  • Scalability
    • Memory usage should be relatively constant regardless of the log events rate
      • It can varies depending on individual event size (message length, stacktrace, etc.), but it shouldn't vary too much
    • Memory usage can be proportional to number of log processors
    • Memory usage can be proportional to number of program/application logs being processed by the same log process
    • Scaling to multiple JVM processes should only be along with the number of concurrent programs/applications running
  • Transport neutral
    • In distributed mode, Kafka is being used. We may switch to TMS in future.
    • In SDK / unit-test mode, can be just an in memory queue, or using local TMS.

Log Processing Framework

As mentioned above, the log processing framework (the framework) is responsible for dispatching log events to individual appenders to process, as well as making CDAP services available for appenders.

  • Transport state management
    • Interim failure in writing the state to state store shouldn't fail the call.
      • Dataset service / HBase / TX service can have service interruption
      • Under normal operation, once those services become available, the latest states will be persisted.
    • The worst case scenario is having some duplicated log events in log file, but it's acceptable.
      • E.g. the log processing JVM died before successfully persisting the latest state

Log Processor

Given that the Logback library is a widely used logging library for SLF4J and is also the logging library used by CDAP, we can simply have log processors implemented through Logback Appender API, as well as configured through usual logback.xml configuration. Here are the highlights for such design:

  • Any number of log appenders can be configured through a logging framework specific logback.xml file (not the one used by CDAP nor for containers).
  • All appender configurations supported by logback.xml should be supported (e.g. appender-ref for delegation/wrapping).
  • Any existing Logback Appender implementations should be usable without any modification, as long as being properly configured in the runtime environment.
  • Logback Appender already implements a ContextAware interface such that Appender gets an instance of ContextAware. In order to provide CDAP services to Appender, we can have a CDAP implementation of ContextAware and let the Appender implementation do the casting when needed.
  • Can optionally implements the Flushable interface. The framework will call the flush method before preserving the transport offset to make sure there is no data loss.

CDAP Log Appender

There will be a log appender implementation for CDAP, replacing the current log-saver. The CDAP log appender is responsible for:

  • Write log events into Avro files and maintaining metadata to those files.
    • The metadata and the files generated are the only contract between the CDAP log appender and the log query handler.
  • Policy based rolling of files / metadata
    • Removing files that passed TTL, enforcing max total size/number of files.
    • See Logback rolling policy for examples.
  • Implements the Flushable interface to perform hsync for HDFS files

Other Log Appenders

  • RollingHDFSAppender
    • Similar to RollingFileAppender provided by Logback library
    • Write to files through the Hadoop HDFS API

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

PathMethodDescriptionResponse CodeResponse
/v3/apps/<app-id>GETReturns the application spec for a given application

200 - On success

404 - When application is not available

500 - Any internal errors

 

     

Deprecated REST API

PathMethodDescription
/v3/apps/<app-id>GETReturns the application spec for a given application

 

Namespace Deletion at log saver

- close currently open log files for apps,programs,etc in that namespace - should be done for both CDAP log appender and log processors
- CDAP log appender should either delete or make the files from old namespace unavailable through meta data changes. storage structure has to change from existing format to support this.
- what should be done about the files written by log processors ?
- unprocessed kafka events for the deleted namespace should be dropped.when to stop dropping the events ? if namespace is recreated, how to ensure the new logs from this new namespace are not dropped?
   - one option is to have a flag in log saver, when namespace, app,etc is deleted, this flag can be set to drop events in those context
   - when these app, namespace are recreated, the flag can be reset to avoid skipping events.

Storage

 

CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What's the impact on Authorization and how does the design take care of this aspect

Impact on Infrastructure Outages 

System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

Test Scenarios

Test IDTest DescriptionExpected Results
   
   
   
   

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work