Logging integration
Introduction
When running CDAP programs on Google Dataproc, anything logged by CDAP code on the Dataproc cluster is written only to files on that cluster. This makes the logs difficult to view: a user must set up SSH access to the node(s) of the Dataproc cluster, log in remotely, locate the log files on the file system, and only then view them.
In order to make it easier for users to view logs from the Dataproc cluster, we will leverage Google Stackdriver. There will be two aspects to the solution:
- Logs will be pushed from the CDAP-controlled JVM on the Dataproc cluster to Stackdriver.
- Logs will be rendered to the CDAP UI by a CDAP service.
Approaches
Ingestion into Stackdriver
We will use a Logback appender for Stackdriver. This will involve:
- Package the google-cloud-logging-logback jar file with the Dataproc runtime extension module in CDAP.
- Copy the jar to the Dataproc cluster and add it to the classpath of the JVM that we launch.
- Configure Logback in the JVM that we launch to use the Stackdriver log appender. This can be done programmatically, similar to how it is done in LogAppenderInitializer.
- Implement a LoggingEnhancer to add labels to the logs that we emit, which may be useful when querying the logs. This may not be necessary if Google Cloud log querying can filter on MDC values.
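The label step above can be sketched in plain Java. This is only an illustration of the mapping a LoggingEnhancer might perform, not the enhancer itself: the class name `MdcLabels`, the `cdap_` prefix, and the MDC keys used in `main` are assumptions made for the example, and wiring the result into the Stackdriver appender is left out.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: derives log-entry labels from logging-context (MDC) entries. */
final class MdcLabels {
  // Assumed prefix for illustration only; not an established CDAP or Stackdriver convention.
  static final String PREFIX = "cdap_";

  /**
   * Copies every non-empty MDC entry into a sorted label map. Each key is prefixed, and any
   * character other than a letter, digit, or underscore is replaced with an underscore so
   * that the label keys stay simple to reference in a log query.
   */
  static Map<String, String> toLabels(Map<String, String> mdc) {
    Map<String, String> labels = new TreeMap<>();
    for (Map.Entry<String, String> entry : mdc.entrySet()) {
      String value = entry.getValue();
      if (value == null || value.isEmpty()) {
        continue; // nothing useful to query on
      }
      String key = PREFIX + entry.getKey().replaceAll("[^A-Za-z0-9_]", "_");
      labels.put(key, value);
    }
    return labels;
  }

  public static void main(String[] args) {
    // Hypothetical MDC content for one program run.
    Map<String, String> mdc = new TreeMap<>();
    mdc.put("run.id", "run-123");
    mdc.put("namespace", "default");
    System.out.println(toLabels(mdc));
  }
}
```

Keeping the mapping in a small, side-effect-free function would make it easy to unit test independently of the appender configuration.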
Viewing the logs
There are a couple of approaches for viewing the logs:
Approach #1: Use Client Java Library
Use the Stackdriver Logging Client libraries to fetch the logs from Stackdriver from a CDAP service.
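Whichever client is used, fetching the logs of a single program run requires a filter that selects that run's entries. The following is a minimal sketch of building such a filter string, assuming the run ID was attached to each entry as a label at ingestion time. The class `RunLogFilter` and the label key `cdap_run_id` are hypothetical; the comparison and `AND` operators follow the Stackdriver Logging filter syntax.

```java
import java.time.Instant;

/** Hypothetical helper that turns a CDAP run ID and time range into a log filter. */
final class RunLogFilter {

  /** Wraps a value in double quotes, escaping backslashes and embedded quotes. */
  static String quote(String value) {
    return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /**
   * Builds a filter that matches entries carrying the given run ID label, within
   * [from, to), at or above the given severity (for example "WARNING" or "ERROR").
   */
  static String forRun(String labelKey, String runId, Instant from, Instant to,
                       String minSeverity) {
    return "labels." + labelKey + "=" + quote(runId)
        + " AND timestamp>=" + quote(from.toString())
        + " AND timestamp<" + quote(to.toString())
        + " AND severity>=" + minSeverity;
  }

  public static void main(String[] args) {
    // The label key is an assumption; it must match whatever the ingestion side writes.
    System.out.println(forRun(
        "cdap_run_id",
        "run-123",
        Instant.parse("2019-01-01T00:00:00Z"),
        Instant.parse("2019-01-02T00:00:00Z"),
        "WARNING"));
  }
}
```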
Approach #2: Use Stackdriver REST API
Use the Stackdriver REST API to fetch the logs from Stackdriver within a CDAP service.
Pros:
- More flexible than the Java library (the client library may be missing some functionality)
Cons:
- More lines of code than using the Java library
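To give a rough sense of the extra code the REST approach involves, the sketch below builds (but does not send) a request to the Logging API `entries:list` method using the JDK 11 HTTP client. The class `EntriesListRequest`, the project ID, the filter, and the token placeholder are assumptions for illustration; obtaining a real OAuth access token, paging, and response parsing are not shown.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Hypothetical builder for a Logging API v2 entries:list request. */
final class EntriesListRequest {
  static final URI ENDPOINT = URI.create("https://logging.googleapis.com/v2/entries:list");

  /** Minimal JSON string encoder: escapes backslashes and quotes only (no control chars). */
  static String jsonString(String s) {
    return "\"" + s.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
  }

  /** Builds the JSON request body, newest entries first. */
  static String body(String projectId, String filter, int pageSize) {
    return "{"
        + "\"resourceNames\":[" + jsonString("projects/" + projectId) + "],"
        + "\"filter\":" + jsonString(filter) + ","
        + "\"orderBy\":\"timestamp desc\","
        + "\"pageSize\":" + pageSize
        + "}";
  }

  /** Builds the POST request; the caller is responsible for supplying a valid token. */
  static HttpRequest build(String projectId, String filter, int pageSize, String accessToken) {
    return HttpRequest.newBuilder(ENDPOINT)
        .header("Authorization", "Bearer " + accessToken)
        .header("Content-Type", "application/json")
        .POST(HttpRequest.BodyPublishers.ofString(body(projectId, filter, pageSize)))
        .build();
  }

  public static void main(String[] args) {
    // Project ID, label key, and token are placeholders.
    String filter = "labels.cdap_run_id=\"run-123\" AND severity>=WARNING";
    HttpRequest request = build("my-project", filter, 100, "token-placeholder");
    System.out.println(request.method() + " " + request.uri());
    System.out.println(body("my-project", filter, 100));
  }
}
```

A production implementation would more likely use a JSON library than hand-built strings; the manual encoding here only keeps the sketch dependency-free.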
Approach #3: Have Stackdriver export the logs to Cloud Storage, BigQuery, or Cloud Pub/Sub
Use Stackdriver’s Logs Export to have logs published to Cloud Storage, BigQuery, or Cloud Pub/Sub. In the case of Cloud Storage,
Pros:
- More control over retention of logs
Cons:
- Responsibility for retention now belongs to CDAP
- More expensive when logs are rarely viewed, because of storage costs
Approach #4: View the logs from Stackdriver UI
Use the Stackdriver UI to view the logs directly.
Pros:
- Avoids reimplementing the functionality of a logs UI, such as filtering by timestamp, filtering by log level, text search, and an advanced filter syntax
Cons:
- Not natively integrated in the CDAP UI; the user must leave the CDAP UI in order to view the logs
Open Questions
- How will the CDAP system map the CDAP program’s run ID to a Stackdriver query?
- Profiles can currently be deleted, but logs for a program run should remain viewable after its profile has been deleted.
- If logs have expired (TTL) in Stackdriver or the profile has been deleted, what do we show in the UI and return from the REST API?
- How will logs emitted by the provisioner be consolidated?
- Metrics about program logs, such as the number of errors, are currently emitted while the program logs are processed. With the Stackdriver integration, there is no longer a process emitting such metrics. How will we emit them? One possibility is a log appender that emits these metrics from each container. We need to consider the performance impact of this.
- How can we keep the implementation generic enough to also support other logging integrations?
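For the metrics question above, one low-overhead option is to count log events per level inside each container and emit the totals periodically. The sketch below shows only the counting part, under the assumption that an appender would call `record` for each event and a scheduled task would call `drain`; the class `LogLevelCounter` and both method names are hypothetical, and how the drained counts reach the CDAP metrics system is left open.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical per-container counter of log events by level. */
final class LogLevelCounter {
  // LongAdder keeps increments cheap when many threads log concurrently.
  private final Map<String, LongAdder> counts = new ConcurrentHashMap<>();

  /** Called once per log event, for example from a custom log appender. */
  void record(String level) {
    counts.computeIfAbsent(level, k -> new LongAdder()).increment();
  }

  /**
   * Returns the counts accumulated since the previous drain and resets them, so each
   * call yields a delta suitable for periodic emission. Levels with no events are omitted.
   */
  Map<String, Long> drain() {
    Map<String, Long> snapshot = new TreeMap<>();
    for (Map.Entry<String, LongAdder> entry : counts.entrySet()) {
      long n = entry.getValue().sumThenReset();
      if (n > 0) {
        snapshot.put(entry.getKey(), n);
      }
    }
    return snapshot;
  }

  public static void main(String[] args) {
    LogLevelCounter counter = new LogLevelCounter();
    counter.record("ERROR");
    counter.record("ERROR");
    counter.record("WARN");
    System.out.println(counter.drain());
  }
}
```

Because the logging path only performs a map lookup and an increment, the per-event cost should be small, though it would still need to be measured under a realistic logging load.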
API changes
New Programmatic APIs
New Java APIs introduced (both user facing and internal)
Deprecated Programmatic APIs
New REST APIs
Path | Method | Description | Request Body | Response Code | Response |
---|---|---|---|---|---|
Deprecated REST API
Path | Method | Description |
---|---|---|
CLI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
UI Impact or Changes
- Impact #1
- Impact #2
- Impact #3
Security Impact
What is the impact on authorization, and how does the design take care of this aspect?
Impact on Infrastructure Outages
System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects
Test Scenarios
Test ID | Test Description | Expected Results |
---|---|---|
Releases
Release X.Y.Z
Release X.Y.Z
Related Work
- Work #1
- Work #2
- Work #3