WIP

Use Case 

...

Checklist

  •  User stories documented (Shankar)
  •  User stories reviewed (Nitin)
  •  Design documented (Shankar/Vinisha)
  •  Design reviewed (Terence/Andreas)
  •  Feature merged ()
  •  Examples and guides ()
  •  Integration tests () 
  •  Documentation for feature ()
  •  Blog post

...

Usecase

  • User wants to group log messages at application level and write multiple separate log files for each application. Example: application-dir/{audit.log, metrics.log, debug.log}
  • User wants to write these log files to a configurable path in HDFS.
  • User also wants to be able to configure a rolling policy for these log files, similar to Logback.

User Stories

  1. For each application, user wants to collect the

...

  1. application's logs into multiple log files based on log level

  2. For each application, user wants to configure a location in HDFS to be used to store the collected logs. 
  3. For each application, User wants the application log files stored in text format. 
  4. For each application, user wants to configure the RotationPolicy of log files.
  5. For each application, user wants to configure different layout for formatting log messages for the different log files generated.

Design

Introduce Log Processor, FileWriter and RotationPolicy interfaces, pluggable in CDAP Log-saver.

Programmatic API

...

 

Code Block
public interface LogProcessor {

  /**
   * Called during initialize, passed properties for log processor.
   *
   * @param properties
   */
  void initialize(Properties properties);

  /**
   * Called with an iterator of log messages. Messages are received in sorted order,
   * sorted by timestamp. This method should not throw any exceptions. If any unchecked exceptions are thrown,
   * log.saver will log an error and the processor will not receive messages.
   * Will start receiving messages on log.saver startup.
   *
   * @param events iterator of {@link LogEvent}
   */
  void process(Iterator<LogEvent> events);

  /**
   * stop logprocessor
   */
  void destroy();
}
Code Block
class LogEvent {
  /**
   * Logging event
   */
  ILoggingEvent iLoggingEvent;
 
  /**
   * CDAP program entity-id
   */
  EntityId entityId;
}
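
As an illustration of how an extension might use this API to split one application's logs by level, here is a minimal in-memory sketch. It does not use the real CDAP or Logback types: DemoLogEvent and DemoLogProcessor are hypothetical stand-ins for LogEvent and LogProcessor, and the property name log.base.dir is an assumption, not an existing CDAP setting.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical stand-in for LogEvent; the real class wraps an ILoggingEvent and an EntityId.
final class DemoLogEvent {
  final String application; // stands in for the EntityId
  final String level;       // stands in for the level of the ILoggingEvent
  final String message;     // stands in for the formatted message

  DemoLogEvent(String application, String level, String message) {
    this.application = application;
    this.level = level;
    this.message = message;
  }
}

// Same three methods as LogProcessor, declared over the stand-in event type.
interface DemoLogProcessor {
  void initialize(Properties properties);
  void process(Iterator<DemoLogEvent> events);
  void destroy();
}

// Collects messages into one bucket per "<baseDir>/<application>/<level>.log" path.
final class LevelGroupingProcessor implements DemoLogProcessor {
  private final Map<String, List<String>> buckets = new LinkedHashMap<>();
  private String baseDir = "/logs"; // hypothetical default

  @Override
  public void initialize(Properties properties) {
    // "log.base.dir" is an assumed property name used only for this sketch.
    baseDir = properties.getProperty("log.base.dir", baseDir);
  }

  @Override
  public void process(Iterator<DemoLogEvent> events) {
    while (events.hasNext()) {
      DemoLogEvent event = events.next();
      String path = baseDir + "/" + event.application + "/"
          + event.level.toLowerCase(Locale.ROOT) + ".log";
      buckets.computeIfAbsent(path, key -> new ArrayList<>()).add(event.message);
    }
  }

  @Override
  public void destroy() {
    buckets.clear();
  }

  Map<String, List<String>> buckets() {
    return buckets;
  }
}
```

A real extension would hand each bucket to a file writer instead of keeping it in memory; the grouping key is the only part this sketch is meant to show.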

 

Step2: FileWriter

Currently, we only have AvroFileWriter in log.saver; we can create an interface so that users can configure the FileWriter if needed. This provides the option to abstract certain common logic (file rotation, maintaining created files, etc.) into an AbstractFileWriter in log.saver, while a custom file writer implements the other methods specific to its logic, for example writing to an HDFS text file.

Example: creating files in HDFS and tracking the size of the events processed is handled by the custom FileWriter extension.

Code Block
public interface FileWriter {
  void append(Iterator<ILoggingEvent> events);
  File rotateFile(File file, EntityId entityId, long timestamp);
  File getFile(EntityId entityId, long timestamp);
  void close(File file, long timestamp);
  void closeAndDelete(File file);
  void flush();
}

Code Block
public abstract class AbstractFileWriter implements FileWriter {

  public File rotateFile(File file, EntityId entityId, long timestamp) {
    // common logic for rotating files
  }

  public File getFile(EntityId entityId, long timestamp) {
    // common logic for getting previously created files
  }
}

 

Option-1

...

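Code Block
public interface MultiFileWriter {
  /**
   * Get the file manager for the log event. This file manager will be used to create, append events to,
   * flush and close the file for the logging events of the entityId (logging context).
   */
  FileManager getFileManager(LogEvent event);
}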
Code Block
interface FileManager {
  /**
   * Based on the logEvent, get entityId and use that information to create the file.
   **/
  File createFile(LogEvent logEvent); 

  /**
   * Append log events to the currently active file belonging to the entityId represented by these log events.
   * Logic: on the first append, we determine whether the file has to be rotated using
   * RotationPolicy#shouldRotateFile(File file, LogEvent logEvent). If it has to be rotated, we use
   * RotationPolicy#rotateFile(File file, LogEvent logEvent) to rotate the file (close the old file)
   * and append to the new file.
   **/
  void appendEvents(Iterator<LogEvent> logEvents);  
 
  /**
   * close the currently active file.
   **/
  void close();

  /**
   * flush the contents of the currently active file
   **/
  void flush();
}
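
The append flow described in the FileManager comments (check for rotation on the first append of a batch, then write to the active file) could look roughly like the sketch below. It is an assumption-laden simplification: it writes plain text lines to the local file system rather than HDFS, takes message strings instead of LogEvent objects, and uses a fixed size threshold in place of a RotationPolicy. The class name LocalTextFileManager and the file naming scheme app.log.N are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;

// Hypothetical local-filesystem sketch of the append flow; a real extension would target HDFS.
final class LocalTextFileManager {
  private final Path directory;
  private final long maxBytes; // size threshold standing in for a RotationPolicy
  private int sequence = 0;
  private Path activeFile;

  LocalTextFileManager(Path directory, long maxBytes) {
    this.directory = directory;
    this.maxBytes = maxBytes;
    this.activeFile = directory.resolve("app.log." + sequence);
  }

  // Checks for rotation once per batch, then appends every message as one text line.
  void appendEvents(Iterator<String> messages) {
    try {
      if (Files.exists(activeFile) && Files.size(activeFile) >= maxBytes) {
        // "Rotate": later writes go to the next file in the sequence.
        sequence++;
        activeFile = directory.resolve("app.log." + sequence);
      }
      while (messages.hasNext()) {
        byte[] line = (messages.next() + "\n").getBytes(StandardCharsets.UTF_8);
        Files.write(activeFile, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  Path activeFile() {
    return activeFile;
  }
}
```

Because the check happens only at the start of a batch, a single large batch can push a file past the threshold; whether that is acceptable is a design choice for the actual implementation.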

 

 

Code Block
public interface RotationPolicy {
  /**
   * For the logEvent, decide if we should rotate the current file corresponding to this event or not.
   */
  boolean shouldRotateFile(File file, LogEvent logEvent);
 
  /**
   * For the logEvent, rotate the log file based on rotation logic and return the newly created File.
   */
  File rotateFile(File file, LogEvent logEvent);
 
  /**
   * For the logEvent, get the currently active file used for appending the log events.
   */ 		
  File getActiveFile(LogEvent logEvent);
}
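
To make the RotationPolicy contract more concrete, here is a sketch of a size-based policy. PolicyEvent and SketchRotationPolicy are hypothetical stand-ins for LogEvent and RotationPolicy so the example compiles on its own; the per-application directory layout and the app.log.N naming are assumptions. Closing the old file is left to the caller in this sketch.

```java
import java.io.File;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for LogEvent: only the application name is kept.
final class PolicyEvent {
  final String application;

  PolicyEvent(String application) {
    this.application = application;
  }
}

// Same three methods as RotationPolicy, declared over the stand-in event type.
interface SketchRotationPolicy {
  boolean shouldRotateFile(File file, PolicyEvent logEvent);
  File rotateFile(File file, PolicyEvent logEvent);
  File getActiveFile(PolicyEvent logEvent);
}

// Rotates once the current file has reached maxBytes; tracks one sequence number per application.
final class SizeBasedRotationPolicy implements SketchRotationPolicy {
  private final File baseDir;
  private final long maxBytes;
  private final Map<String, Integer> sequenceByApp = new HashMap<>();

  SizeBasedRotationPolicy(File baseDir, long maxBytes) {
    this.baseDir = baseDir;
    this.maxBytes = maxBytes;
  }

  @Override
  public boolean shouldRotateFile(File file, PolicyEvent logEvent) {
    // File#length() returns 0 for a file that does not exist yet.
    return file.length() >= maxBytes;
  }

  @Override
  public File rotateFile(File file, PolicyEvent logEvent) {
    // Advance the sequence for this application and return the new active file.
    sequenceByApp.merge(logEvent.application, 1, Integer::sum);
    return getActiveFile(logEvent);
  }

  @Override
  public File getActiveFile(PolicyEvent logEvent) {
    int sequence = sequenceByApp.getOrDefault(logEvent.application, 0);
    return new File(new File(baseDir, logEvent.application), "app.log." + sequence);
  }
}
```

A time-based policy would follow the same shape, comparing the event timestamp against the time the active file was created instead of checking the file length.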

 

Approach

Option-1

Log Processor/File Writer Extensions run in the same container as log.saver. 

...

  • we can stop the plugin, or
  • we can log an error and continue, stopping the plugin once an error threshold is reached.
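
The second choice above (log, continue, and stop after a threshold) could be sketched as a small guard around the plugin call. This is only an illustration: ErrorThresholdGuard is a hypothetical name, the plugin is modelled as a plain Consumer, and errors go to System.err instead of the log.saver logger.

```java
import java.util.function.Consumer;

// Hypothetical guard: delivers batches to a plugin and stops it after maxErrors failures.
final class ErrorThresholdGuard<T> {
  private final Consumer<T> plugin;
  private final int maxErrors;
  private int errors = 0;
  private boolean stopped = false;

  ErrorThresholdGuard(Consumer<T> plugin, int maxErrors) {
    this.plugin = plugin;
    this.maxErrors = maxErrors;
  }

  // Returns true if the plugin is still active after this call.
  boolean deliver(T batch) {
    if (stopped) {
      return false; // a stopped plugin receives no more batches
    }
    try {
      plugin.accept(batch);
    } catch (RuntimeException e) {
      errors++;
      System.err.println("log processor failed (" + errors + "/" + maxErrors + "): " + e);
      if (errors >= maxErrors) {
        stopped = true;
      }
    }
    return !stopped;
  }

  boolean isStopped() {
    return stopped;
  }
}
```

Setting maxErrors to 1 gives the first choice (stop on the first error), so one mechanism can cover both behaviours.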

4) FileWriterExtension will be used for file system operations (create, append, close) and RotationPolicyExtension will be used for deciding when to rotate the file.

5) Stop the log processor when log.saver stops.

 

Class-Loading Isolation

1) Should the log processor plugins have separate class loaders, or can they share the same ClassLoader as the log.saver system?

     Isolation lets plugins depend on different libraries, but should we allow that?

2) If we use the same class loader as log.saver, the dependencies of extensions can be added to the classpath, and the classes already available in the log.saver system (hadoop, proto, etc.) can be filtered out from the extension, so that we use the classes provided by log.saver.

3) However, if there are multiple log.processor extensions, say one for writing to S3 and another for writing to Splunk, could the classes from their dependencies conflict with each other if we use the system class loader?

4) If we create a separate class loader for each extension to provide class loader isolation, we need to expose the following:

  • cdap-watchdog-api
  • cdap-proto
  • hadoop
  • logback-classic (we need ILoggingEvent)
  • Should we expose more classes? What if the user wants to write to a Kafka server or to third-party storage such as S3 in the log.processor logic? Having a separate class loader will help in these scenarios.
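
One way to picture the isolation option is a parent class loader that exposes only selected packages to the extension, with the extension's own class loader (loading its jars) layered on top. The sketch below is illustrative only: PrefixFilterClassLoader is a hypothetical name, not an existing CDAP class, and the list of exposed prefixes would have to be decided as part of this design.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter: only classes under the allowed package prefixes are visible from the parent.
final class PrefixFilterClassLoader extends ClassLoader {
  private final List<String> allowedPrefixes;

  PrefixFilterClassLoader(ClassLoader parent, List<String> allowedPrefixes) {
    super(parent);
    this.allowedPrefixes = new ArrayList<>(allowedPrefixes);
  }

  @Override
  protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
    if (name.startsWith("java.") || isAllowed(name)) {
      // Exposed classes are served by the parent (the log.saver side).
      return super.loadClass(name, resolve);
    }
    // Everything else is hidden, so the extension must bring its own copy.
    throw new ClassNotFoundException(name + " is not exposed to the extension");
  }

  private boolean isAllowed(String name) {
    for (String prefix : allowedPrefixes) {
      if (name.startsWith(prefix)) {
        return true;
      }
    }
    return false;
  }
}
```

With prefixes such as ch.qos.logback.classic. and org.apache.hadoop. exposed, an extension could still bundle its own client libraries for S3 or Kafka without conflicting with log.saver or with other extensions.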

Sample Custom Log Plugin Implementation 

1) Log Processor would want to process the ILoggingEvent, format it to a log message string (maybe using log-back layout classes) and write it to a destination.

...

  • as there can only be one logback.xml in a JVM and the logback is already configured for the log.saver container.
  • Logback doesn't have an existing implementation for writing to HDFS.

3) The properties required for extensions, such as the logging location (base directory in HDFS) and the logging class to use (SizeBasedRolling, etc.), could be provided through cdap-site.xml. These properties would be passed to the extension during initialize.

4) The log processor extension could provide implementations of the FileWriter interface (or an extension of AbstractFileWriter) and the RotationPolicy interface, for example HDFSFileWriter logic, for the events received from LogProcessor.

5) Future implementations of other policies have to be provided by the extensions and can be configured through cdap-site.xml.

...

1) As the number of extensions increases, and if a processor extension is slow, the performance of the CDAP log.saver could drop.

 

Option-2 (or)

...

Improvement 

 

Configure and Run a separate container for every log.processor plugin. 

...