Introduction 

...

  1. Logs will be pushed from the CDAP-controlled JVM on the Dataproc cluster to Stackdriver.

  2. Logs will be rendered to the CDAP UI by a CDAP service.

Approaches

...

Ingestion into Stackdriver

...

  1. Package the google-cloud-logging-logback jar file with the Dataproc runtime extension module in CDAP.

  2. Copy the jar to the Dataproc cluster and add it to the classpath of the JVM that we launch.

  3. Configure the logback.xml of the JVM that we launch to use the Stackdriver log appender. This can be done programmatically, similar to how it is done in LogAppenderInitializer.

  4. Implement a LoggingEnhancer to add labels to the logs that we emit. Labels may be useful when querying the logs, but may be unnecessary if Google Cloud log querying can filter on MDC values.
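As a rough illustration of step 4, the sketch below shows the kind of mapping a LoggingEnhancer could perform: selecting a few run-identifying entries from the logging context and turning them into labels. The class name `RunLabels` and the context keys (`namespace`, `application`, `program`, `runId`) are assumptions made for this example, not names taken from CDAP. The wiring into the Stackdriver appender is deliberately omitted.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical helper: picks run-identifying context entries to use as log labels. */
final class RunLabels {
  // Assumed context keys; the real MDC keys used by CDAP may differ.
  static final List<String> LABEL_KEYS =
      Arrays.asList("namespace", "application", "program", "runId");

  private RunLabels() { }

  /**
   * Returns the subset of the context that should be attached as labels,
   * in the order of LABEL_KEYS. Missing or null values are skipped.
   */
  static Map<String, String> fromContext(Map<String, String> context) {
    Map<String, String> labels = new LinkedHashMap<>();
    for (String key : LABEL_KEYS) {
      String value = context.get(key);
      if (value != null) {
        labels.put(key, value);
      }
    }
    return labels;
  }
}
```

Restricting labels to a small fixed set keeps queries predictable. Whether labels are needed at all still depends on the MDC filtering question raised in step 4.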

Viewing the logs

There are several approaches for viewing the logs:

Approach #1: Use Client Java Library

Use the Stackdriver Logging Client libraries to fetch the logs from Stackdriver from a CDAP service.


Approach #2: Use Stackdriver REST API

Use the Stackdriver REST API to fetch the logs from Stackdriver within a CDAP service.
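A minimal sketch of what the CDAP service would have to assemble for the REST approach is shown below. It builds a log filter for one program run and the JSON body for the `entries:list` method (a POST to `https://logging.googleapis.com/v2/entries:list`). The class `LogQuery`, the `runId` label name, and the idea of bounding the query by a start time are assumptions for illustration. Authentication, paging via `pageToken`, and the HTTP call itself are left out, and the string escaping handles only quotes and backslashes.

```java
/** Hypothetical helper that builds a Stackdriver filter and an entries:list request body. */
final class LogQuery {
  private LogQuery() { }

  /** Wraps a value in double quotes, escaping backslashes and quotes (simplified). */
  static String quote(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /**
   * Filter for a single run. Assumes each entry carries a "runId" label
   * and that the caller knows roughly when the run started (RFC 3339 timestamp).
   */
  static String filterForRun(String runId, String startTimeRfc3339) {
    return "labels.runId=" + quote(runId)
        + " AND timestamp>=" + quote(startTimeRfc3339);
  }

  /** JSON body for entries:list, oldest entries first. */
  static String listRequestBody(String projectId, String filter, int pageSize) {
    return "{"
        + "\"resourceNames\":[" + quote("projects/" + projectId) + "],"
        + "\"filter\":" + quote(filter) + ","
        + "\"orderBy\":\"timestamp asc\","
        + "\"pageSize\":" + pageSize
        + "}";
  }
}
```

For example, `LogQuery.filterForRun("abc", "2019-01-01T00:00:00Z")` yields `labels.runId="abc" AND timestamp>="2019-01-01T00:00:00Z"`. A real implementation would likely use a JSON library rather than string concatenation.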

...

  • More lines of code than using the Java library


Approach #3: Have Stackdriver export the logs to Cloud Storage, BigQuery, or Cloud Pub/Sub

Use Stackdriver’s Logs Export to have logs published to Cloud Storage, BigQuery, or Cloud Pub/Sub. In the case of Cloud Storage,

...

  • More control over retention of logs

Cons:

  • Responsibility of retention now belongs to CDAP

  • More expensive when logs are not viewed often, because of storage costs


Approach #4: View the logs from Stackdriver UI

Use the Stackdriver UI to view the logs directly.

...

  • Not natively integrated into the CDAP UI; the user would have to leave the CDAP UI to view the logs


Open Questions 

  1. How will the CDAP system map the CDAP program’s run ID to a Stackdriver query?
    1. Profiles can currently be deleted, whereas viewing logs for a program run should still work.
    2. If logs have expired (TTL) in Stackdriver, or if the profile has been deleted, what do we show in the UI? In the REST API?
  2. How will logs emitted by the provisioner be consolidated?
  3. Metrics about program logs, such as the number of errors, are currently emitted while the program logs are processed. With the Stackdriver integration, there is no longer a process emitting such metrics. How will we emit them? One possible way is to have a log appender that emits these metrics from each container. The performance impact of this needs to be considered.
  4. How can we keep the implementation generic enough to also support other logging integrations?
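To make the appender idea in question 3 more concrete, here is a small sketch of a per-container counter that tallies warnings and errors as log records pass through. It uses `java.util.logging.Handler` as a stand-in for a logback appender so that the example needs only the JDK. The class name `LevelCountingHandler` is hypothetical, and how the counts would be published to the CDAP metrics system is not shown.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;

/** Hypothetical handler that counts warning and error records instead of writing them. */
final class LevelCountingHandler extends Handler {
  private final AtomicLong warnings = new AtomicLong();
  private final AtomicLong errors = new AtomicLong();

  @Override
  public void publish(LogRecord record) {
    if (record == null) {
      return;
    }
    int level = record.getLevel().intValue();
    if (level >= Level.SEVERE.intValue()) {
      errors.incrementAndGet();      // SEVERE is treated as "error"
    } else if (level >= Level.WARNING.intValue()) {
      warnings.incrementAndGet();
    }
  }

  @Override
  public void flush() {
    // Nothing is buffered; counts are read on demand.
  }

  @Override
  public void close() {
    // No resources to release.
  }

  long errorCount() {
    return errors.get();
  }

  long warningCount() {
    return warnings.get();
  }
}
```

The per-record cost here is one level comparison and one atomic increment, which suggests the overhead could be small. The cost of periodically shipping the counts as metrics would still need to be measured.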


API changes

New Programmatic APIs

...

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Request Body | Response Code | Response






Deprecated REST API

Path | Method | Description



CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

...

System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects

Test Scenarios

Test ID | Test Description | Expected Results












Releases

Release X.Y.Z

Release X.Y.Z

...