Introduction
...
Package the google-cloud-logging-logback JAR with the Dataproc runtime extension module in CDAP.
Copy the JAR to the Dataproc cluster and add it to the classpath of the JVM that we launch.
Configure logback for the JVM that we launch (through its logback.xml) to use the Stackdriver log appender. This can also be done programmatically, similar to how it is done in LogAppenderInitializer.
Implement a LoggingEnhancer that adds labels to the log entries we emit, so that the logs can be filtered by those labels when querying. This step may be unnecessary if Google Cloud Logging queries can filter on MDC values.
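To illustrate the labelling step, here is a minimal Java sketch, assuming the labels identify a program run by namespace, application, program and run ID. The class name `RunLogLabels` and the `cdap_*` label keys are hypothetical, and the sketch avoids the Google Cloud client so it runs standalone. In the google-cloud-logging library, an enhancer implements `com.google.cloud.logging.LoggingEnhancer`, whose `enhanceLogEntry(LogEntry.Builder)` method could call `addLabel(key, value)` for each pair returned by `labels()`. The `toFilter()` method assumes Cloud Logging's `labels.KEY="VALUE"` query syntax. How the enhancer learns which run it is running for is left open here.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical helper computing the labels that a Stackdriver LoggingEnhancer would attach
 * to every log entry emitted by one CDAP program run.
 */
final class RunLogLabels {
  // Hypothetical label keys: the design does not fix a naming scheme.
  static final String NAMESPACE = "cdap_namespace";
  static final String APPLICATION = "cdap_application";
  static final String PROGRAM = "cdap_program";
  static final String RUN_ID = "cdap_run_id";

  private final Map<String, String> labels;

  RunLogLabels(String namespace, String application, String program, String runId) {
    Map<String, String> m = new LinkedHashMap<>();
    m.put(NAMESPACE, requireNonEmpty(namespace, "namespace"));
    m.put(APPLICATION, requireNonEmpty(application, "application"));
    m.put(PROGRAM, requireNonEmpty(program, "program"));
    m.put(RUN_ID, requireNonEmpty(runId, "runId"));
    this.labels = Collections.unmodifiableMap(m);
  }

  /** Labels in insertion order; an enhancer would copy each pair onto the log entry. */
  Map<String, String> labels() {
    return labels;
  }

  /** Builds a Cloud Logging filter matching entries that carry all of these labels. */
  String toFilter() {
    StringJoiner filter = new StringJoiner(" AND ");
    for (Map.Entry<String, String> e : labels.entrySet()) {
      filter.add("labels." + e.getKey() + "=\"" + escape(e.getValue()) + "\"");
    }
    return filter.toString();
  }

  // Escape backslashes first, then double quotes, so a value cannot end the quoted string early.
  private static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  private static String requireNonEmpty(String value, String name) {
    if (value == null || value.isEmpty()) {
      throw new IllegalArgumentException(name + " must be non-empty");
    }
    return value;
  }
}
```

Assuming the enhancer is installed in every container of a run, all of that run's entries carry the same label set, so a single filter string would select the run's logs no matter which container emitted them.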
Viewing the logs
There are a couple of approaches for viewing the logs:
...
Not natively integrated into the CDAP UI; users would have to leave the CDAP UI to view the logs
Open Questions
- How will the CDAP system map the CDAP program’s run ID to a Stackdriver query?
- Profiles can currently be deleted, but viewing the logs of a program run should still work after the profile it ran with has been deleted.
- If the logs have expired in Stackdriver (TTL) or the profile has been deleted, what do we show in the UI and return from the REST API?
- How will logs emitted by the provisioner be consolidated?
- When program logs are processed, metrics about them are emitted, such as the number of errors. With the Stackdriver integration, there is no longer a process that emits these metrics. How will we emit them? One option is a log appender in each container that emits the metrics; its performance impact needs to be evaluated.
- How can we keep the implementation generic enough to also support other logging integrations?
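The per-container appender option from the metrics question could be sketched as follows, assuming counts of log events per level (errors in particular) are the metrics to preserve. `LogLevelMetrics`, `onEvent` and `drain` are hypothetical names. In logback, the counting would sit in the `append(ILoggingEvent)` method of an `AppenderBase<ILoggingEvent>` subclass; that wrapper is omitted so the sketch has no dependencies, and publishing the drained counts to CDAP's metrics system is likewise assumed rather than shown.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/**
 * Hypothetical per-container counter of log events by level. A logback appender would call
 * onEvent(...) for each event; an assumed periodic task would publish drain() as metrics.
 */
final class LogLevelMetrics {
  /** Mirrors the levels a logback logging event can carry. */
  enum Level { TRACE, DEBUG, INFO, WARN, ERROR }

  // Filled once in the constructor and never structurally modified afterwards,
  // so concurrent reads of the map are safe; the counters themselves are atomic.
  private final Map<Level, AtomicLong> counts = new EnumMap<>(Level.class);

  LogLevelMetrics() {
    for (Level level : Level.values()) {
      counts.put(level, new AtomicLong());
    }
  }

  /** Records one log event: a map lookup plus one atomic increment. */
  void onEvent(Level level) {
    counts.get(level).incrementAndGet();
  }

  /** Returns the counts accumulated since the previous drain and resets each counter to zero. */
  Map<Level, Long> drain() {
    Map<Level, Long> snapshot = new EnumMap<>(Level.class);
    for (Map.Entry<Level, AtomicLong> entry : counts.entrySet()) {
      // getAndSet is atomic per level, so no increment is lost or counted twice.
      snapshot.put(entry.getKey(), entry.getValue().getAndSet(0L));
    }
    return snapshot;
  }
}
```

On the performance concern, the per-event cost in this sketch is small; a `LongAdder` would reduce contention further under heavy multi-threaded logging, at the cost of a reset that is not guaranteed to be exact while other threads are still logging.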
API changes
New Programmatic APIs
...