Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

 
Table of Contents

Checklist

  •  User Stories Documented
  •  User Stories Reviewed
  •  Design Reviewed
  •  APIs reviewed
  •  Release priorities assigned
  •  Test cases reviewed
  •  Blog post

Introduction 

Support Pluggable Log Saver Plugins in Log Saver for users to provide custom plugins to configure and store log messages and also improve log saver for other functionalities like versioning, impersonation, resilience, etc. 

Existing Problems

Impersonation

Current implementation of LogSaver uses namespace directory to store logs which makes LogSaver dependent on impersonation. Impersonation could be avoided for logsaver if log files storage is not namespacedFor example, use a common root directory outside any CDAP namespace to store log files and use namespace-version-id to differentiate between namespaces. This would prohibit the users to look at their log files directly in HDFS as they are owned by "cdap". We need to consider if we have to provide options to make the "user" flexible for log files written by log saver plugins.  

Entity Deletion and Recreation

LogSaver currently does not handle the case where a given entity is deleted and recreated with the same name. This is a common scenario on development clusters where developers can delete and recreate entity with the same name. There are couple of open issues present <link>

Pluggable Log Appenders

User might want to configure LogSaver to collect logs at different level and also write log messages at different destination. Having capability to add custom LogAppender will allow users to implement custom logic for logs in addition to existing CDAP LogSaver. For example, user might want to collect debug logs per application and store them on HDFS at application-dir/{debug.log} location.

DataStructures

TODO: Look into improving messageTable used for sorting log messages.

Resilience

Making LogSaver resilient to failure scenarios.

TODO: Add more details

Goals

  • Provides a unified framework for processing log events read off from the transport (Kafka currently).
  • Allows arbitrary number of processors to process the log events.
  • Provides implementations for common processors (e.g write to HDFS, rotation, expiration... etc).

User Stories 

  • For each application, user wants to collect the application's logs into multiple logs files based on log level 
  • For each application, user wants to configure a location in HDFS to be used to store the collected logs.
  • For each application, User wants the application log files stored in text format.
  • For each application, User would wants to configure the RotationPolicy of log files.
  • For each application, user wants to configure different layout for formatting log messages for the different log files generated.
  • User wants to group log messages at application level and write multiple separate log files for each application. Example: application-dir/{audit.log, metrics.log, debug.log}
  • User wants to write these log files to a configurable path in HDFS.
  • User wants to configure log saver extension using logback configuration.
  • User also wants to be able to configure rolling policy for these log files similar to log-back. 

Design

Principles

  • Establish a clear role separation between the framework and individual processor
    • Framework responsibilities
      • Reading log events from the log transport
      • Transport offset management
      • Grouping, buffering and sorting of log events
      • Configuration of log processors
      • Defines contract and services between the framework and log processors
        • Lifecycle management (instantiate, process, destroy)
        • Batch events processing
        • Impersonation
        • Exception handling policies for exceptions raised by log processors
    • Processor responsibilities
      • Actual processing of events (mostly writing to somewhere)
      • Files and metadata management
  • Scalability
    • Memory usage should be relatively constant regardless of the log events rate
      • It can varies depending on individual event size (message length, stacktrace, etc.), but it shouldn't vary too much
    • Memory usage can be proportional to number of log processors
    • Memory usage can be proportional to number of program/application logs being processed by the same log process
    • Scaling to multiple JVM processes should only be along with the number of concurrent programs/applications running
  • Transport neutral
    • In distributed mode, Kafka is being used. We may switch to TMS in future.
    • In SDK / unit-test mode, can be just an in memory queue, or using local TMS.

Log Processing Framework

As mentioned above, the log processing framework (the framework) is responsible for dispatching log events to individual appenders to process, as well as making CDAP services available for appenders.

TODO

Log Processor

Given that the Logback library is a widely used logging library for SLF4J and is also the logging library used by CDAP, we can simply have log processors implemented through Logback Appender API, as well as configured through usual logback.xml configuration. Here are the highlights for such design:

  • Any number of log appenders can be configured through a logging framework specific logback.xml file (not the one used by CDAP nor for containers).
  • All appender configurations supported by logback.xml should be supported (e.g. appender-ref for delegation/wrapping).
  • Any existing Logback Appender implementations should be usable without any modification, as long as being properly configured in the runtime environment.
  • For Appender implementation that needs to gain access to CDAP services (e.g. dataset, transaction, metrics, impersonation... etc), a CDAP AppenderContext will be provided. There will be new CDAP interface for the Appender to optionally implemented in order to get hold of the AppenderContext instance.
  • An optional interface for Appender to implement to get notified for beginning and end of a batch of log events.

CDAP Log Appender

There will be a log appender implementation for CDAP, replacing the current log-saver. The CDAP log appender is responsible for:

  • Write log events into Avro files and maintaining metadata to those files.
    • The metadata and the files generated are the only contract between the CDAP log appender and the log query handler.
  • Policy based rolling of files / metadata
    • Removing files that passed TTL, enforcing max total size/number of files.
    • See Logback rolling policy for examples.
  • Should implements the optional interfaces to get access to AppenderContext and log events batch start/end notifications
    • Performs file sync (hsync for HDFS files) on batch end
    • Also persist processing state into metadata store. The processing state contains the transport offsets (Kafka start/end offsets or TMS messageIds) provided via the batch start/end methods. The appender may store other state information generated by itself.
    • Interim failure in writing the state to metadata store shouldn't fail the call.
      • Dataset service / HBase / TX service can have service interruption
      • Under normal operation, once those services become available, the latest states will be persisted.
      • The worst case scenario is having some duplicated log events in log file, but it's acceptable.
        • E.g. the log processing JVM died before successfully persisting the latest state

 

Aspects to consider:

  • Kafka Partitioning
  • Sorting and Bucketing
  • Checkpointing
  • Resilience 
  • Classloader Isolation

    Assumptions:

    • Appenders are synchronous

    Approach

    Approach #1

    Approach #2

    Other Log Appenders

    • RollingHDFSAppender
      • Similar to RollingFileAppender provided by Logback library
      • Write to files through the Hadoop HDFS API

    API changes

    New Programmatic APIs

    New Java APIs introduced (both user facing and internal)

    Deprecated Programmatic APIs

    New REST APIs

    PathMethodDescriptionResponse CodeResponse
    /v3/apps/<app-id>GETReturns the application spec for a given application

    200 - On success

    404 - When application is not available

    500 - Any internal errors

     

         

    Deprecated REST API

    PathMethodDescription
    /v3/apps/<app-id>GETReturns the application spec for a given application

    CLI Impact or Changes

    • Impact #1
    • Impact #2
    • Impact #3

    UI Impact or Changes

    • Impact #1
    • Impact #2
    • Impact #3

    Security Impact 

    What's the impact on Authorization and how does the design take care of this aspect

    Impact on Infrastructure Outages 

    System behavior (if applicable - document impact on downstream [ YARN, HBase etc ] component failures) and how does the design take care of these aspect

    Test Scenarios

    Test IDTest DescriptionExpected Results
       
       
       
       

    Releases

    Release X.Y.Z

    Release X.Y.Z

    Related Work

    • Work #1
    • Work #2
    • Work #3

     

    Future work