Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

This document covers the design for User Friendly Logs for CDAP. Today it is very cumbersome for a user to debug failures in Pipelines using the current logs.

There are several reasons for this:

  1. The logs are filled with a large number of messages that the user is not interested in. These include debug output from CDAP platform code and other underlying dependencies.
  2. With so many messages shown, the information the User cares about is often lost, rendering the logs unusable.
  3. Moreover, the interesting information is not clearly communicated in the logs. The logs are targeted towards the developer rather than the User.
  4. Errors are wrapped multiple times and communicate neither the root cause of the problem nor how the User can recover from it.

This document covers the items required to address these shortcomings in the CDAP logs. It mainly covers the work items for Release 4.2.

Goals

  1. CDAP Pipeline and Program Logs must help the User assess the progress of a program when it succeeds and debug it when it fails.
  2. Provide Guidelines for logs targeted at Users and the log level that should be used for them.

User Stories 

  • As a CDAP User I want to see crisp and concise logs clearly showing the progress of my Pipeline, Application or Program.

  • As a CDAP User I want to see error messages very clearly conveyed in the logs in case of any failures.

  • As a CDAP User I want the error messages to help me recover from the problem reported.

Design

high level design

Approach

There are four major work items involved:

  1. Error Handling: Errors must be reported very clearly to the User in a way that helps them recover from the problem. Work items:
    1. Errors from the AM (Application Master) are sent to stderr/stdout, probably because the SLF4J bridge is not set up correctly. These errors must come through the logback framework so that they can be logged at the correct level with appropriate details. Ideally no errors should go to stdout/stderr. This involves making sure the SLF4J bridge jars are included in the job jars.
    2. The same happens for two other packages: Jetty and the Kafka producer.
    3. Exceptions are wrapped several times over multiple layers. Logging these exceptions creates very long stack trace output. Work items:
      1. CDAP will log only the root cause exception as an error.
      2. This is easier for logs produced by CDAP itself, e.g. in Standalone. Error logs generated by the Hadoop system sometimes stringify the stack trace, and there is not much we can do about that in a clean way.
  2. Context Based Logging: Logs today are tagged with a logging context that contains program run id details. In addition to these tags, more MDC context tags will be added, which the UI can use to filter the logs a user is interested in:
    1. Lifecycle: Logs that represent the lifecycle of a program or a pipeline.
    2. Error: In case of failures, the most interesting errors for a user must be tagged.
    3. Other interesting information can be tagged using specific tags.
  3. Program Logs Cleanup: This involves an overall cleanup of Program Logs. Today the logs are flooded with developer debug logs, which are of the least interest to the User. There are also several errors/warnings from the Hadoop system, often caused by missing or incorrect user configuration. These need to be cleaned up, and the logs have to be written so that they target the User and not the developer.
  4. Guidelines for Future Development: Formulate a set of guidelines for logs that will be followed for future development.
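
The "log only the root cause" item under Error Handling can be illustrated with a minimal sketch. This is not CDAP code: the class RootCauseLogging, its methods, and the sample file path are hypothetical names chosen for illustration, and the sketch assumes that the innermost cause in the chain is the one most useful to the User.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

// Hypothetical helper, not part of CDAP: reduces a wrapped exception to its root cause.
final class RootCauseLogging {

  private RootCauseLogging() { }

  /** Walks the cause chain and returns the innermost Throwable. */
  static Throwable rootCause(Throwable t) {
    // Track visited throwables by identity so a cyclic cause chain cannot loop forever.
    Set<Throwable> seen = Collections.newSetFromMap(new IdentityHashMap<>());
    Throwable current = t;
    while (current.getCause() != null && seen.add(current)) {
      current = current.getCause();
    }
    return current;
  }

  /** Builds a one-line message from the root cause only, without the wrapping layers. */
  static String userMessage(Throwable t) {
    Throwable root = rootCause(t);
    String message = root.getMessage();
    return root.getClass().getSimpleName() + (message == null ? "" : ": " + message);
  }

  public static void main(String[] args) {
    // Simulate an error that has been wrapped twice on its way up the stack.
    Exception root = new IOException("Permission denied: /data/input");
    Exception wrapped =
        new RuntimeException("Stage failed", new IllegalStateException("Task failed", root));
    // Prints: ERROR IOException: Permission denied: /data/input
    System.err.println("ERROR " + userMessage(wrapped));
  }
}
```

In a real logger call the full wrapped exception could still be logged at debug level, so that developers keep the complete stack trace while the User sees only the one-line root cause.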

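The Context Based Logging item can be sketched in the same spirit. The example below models log events with MDC-style key/value tags in plain Java and filters them by category, roughly as the UI might. The tag key "log.category", the values "lifecycle" and "error", and the class TaggedLogFilter are all assumptions made for illustration; this document does not define the actual tag names. In real code the tag would typically be set through SLF4J's MDC.put(key, value) before logging.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch, not CDAP code: filters log events by an MDC-style tag.
final class TaggedLogFilter {

  // Assumed tag key and values; the real names are not defined in this document.
  static final String CATEGORY = "log.category";
  static final String LIFECYCLE = "lifecycle";
  static final String ERROR = "error";

  /** A log message together with its context tags. */
  static final class LogEvent {
    final String message;
    final Map<String, String> tags;

    LogEvent(String message, Map<String, String> tags) {
      this.message = message;
      this.tags = Collections.unmodifiableMap(new HashMap<>(tags));
    }
  }

  private TaggedLogFilter() { }

  /** Creates an event; a null category means the event carries no category tag. */
  static LogEvent tagged(String message, String category) {
    Map<String, String> tags = new HashMap<>();
    if (category != null) {
      tags.put(CATEGORY, category);
    }
    return new LogEvent(message, tags);
  }

  /** Returns the messages of all events tagged with the given category, in order. */
  static List<String> messagesWithCategory(List<LogEvent> events, String category) {
    List<String> result = new ArrayList<>();
    for (LogEvent event : events) {
      if (category.equals(event.tags.get(CATEGORY))) {
        result.add(event.message);
      }
    }
    return result;
  }

  public static void main(String[] args) {
    List<LogEvent> events = new ArrayList<>();
    events.add(tagged("Pipeline started", LIFECYCLE));
    events.add(tagged("Fetched container status", null)); // untagged developer detail
    events.add(tagged("Input path not found", ERROR));
    events.add(tagged("Pipeline failed", LIFECYCLE));
    // Prints: [Pipeline started, Pipeline failed]
    System.out.println(messagesWithCategory(events, LIFECYCLE));
  }
}
```

Keeping the category in a tag rather than in the message text means the filter does not depend on message wording, and untagged platform logs are excluded by default.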
 

Guidelines

The Guidelines for the logs and the log levels are discussed here

API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Response Code | Response
No new REST APIs are planned.

Deprecated REST API

Path | Method | Description
No deprecations are planned.

CLI Impact or Changes

  • No CLI impact

UI Impact or Changes

  • Several UI changes are involved to improve the readability of the logs. These changes are being addressed separately and are not covered in this document.

Security Impact 

There is no impact on Authorization or other security items.

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects.

Test Scenarios

Test ID | Test Description | Expected Results

Releases

Release 4.2.0

All of the above work items will be addressed as part of release 4.2.0.

Related Work

  • Work #1
  • Work #2
  • Work #3

 

Future work
