Cloud Data Fusion integrates with Stackdriver Logging, which allows customers to collect and view their Cloud Data Fusion pipeline logs in Stackdriver. This feature is optional and is useful if you already use Stackdriver as your central tool for collecting and viewing logs. Cloud Data Fusion’s Stackdriver logging also includes logs from Dataproc cluster components such as YARN and HDFS. These logs give better visibility into the lifecycle and resource usage of your pipeline on the Dataproc cluster, which helps with debugging and fine-tuning. This document provides a step-by-step guide to viewing Cloud Data Fusion pipeline logs in Stackdriver.
Instructions
Creating an Instance with Stackdriver Logging
You must enable Stackdriver logging when you create the Cloud Data Fusion instance. To do so, select the Enable Stackdriver logging service option under Advanced Options.
Cloud Data Fusion Beta supports enabling or disabling Stackdriver logging only during instance creation. Once an instance is created, its Stackdriver logging setting cannot be updated.
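If you create instances programmatically rather than in the console, the same setting is supplied in the instance resource. The sketch below assumes the Cloud Data Fusion v1beta1 REST API (`instances.create` under `https://datafusion.googleapis.com/v1beta1`) with the `type` and `enableStackdriverLogging` fields; verify these names against the current API reference. The helper `build_create_instance_request` is hypothetical: it only builds the request URL and JSON body, and it neither sends the request nor handles authentication.

```python
import json

# Assumed REST root for the Cloud Data Fusion Beta API.
API_ROOT = "https://datafusion.googleapis.com/v1beta1"


def build_create_instance_request(project: str, location: str,
                                  instance_id: str,
                                  edition: str = "BASIC") -> tuple[str, str]:
    """Hypothetical helper: return (url, json_body) for an instances.create call
    that enables Stackdriver logging. Nothing is sent over the network."""
    url = (f"{API_ROOT}/projects/{project}/locations/{location}"
           f"/instances?instanceId={instance_id}")
    body = {
        "type": edition,
        # Must be chosen at creation time; the Beta cannot change it later.
        "enableStackdriverLogging": True,
    }
    return url, json.dumps(body)
```

For example, `build_create_instance_request("my-project", "us-central1", "my-instance")` returns the create URL for `my-project` in `us-central1` and a body with `enableStackdriverLogging` set to `true`. Because the Beta does not allow changing this setting afterwards, it belongs in the create call rather than in a later update.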
Running a Pipeline and Finding Its RunId
Run your pipeline as usual. Every Cloud Data Fusion pipeline run is assigned a unique runId, which you can find in the Summary section of the run.
On the Summary page, click the Table link.
Then click the RunId link of your run. The runId is displayed and also copied to the clipboard.
Viewing Logs in Stackdriver
Once you have the runId of the pipeline run, you can view all of its logs in Stackdriver under Cloud Dataproc Cluster → cdap-<pipeline-name>-<runId>.
You can also use the filter options at the top of the page to show only the datafusion-pipeline-logs, the yarn-resourcemanager logs, or the logs of other components. Additionally, you can filter the logs by log level.
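The console view described above corresponds to a Cloud Logging query. The sketch below builds such a query as a string, assuming that the Dataproc cluster is logged under the `cloud_dataproc_cluster` resource type with a `cluster_name` label, and that the cluster name follows the `cdap-<pipeline-name>-<runId>` pattern given in this guide. The function `pipeline_log_filter` and its parameters are hypothetical; if the cluster name shown in the Logs Viewer differs from the pattern, use the name shown there.

```python
from typing import Optional

# Severity names in the Cloud Logging query language, lowest to highest.
SEVERITIES = ("DEFAULT", "DEBUG", "INFO", "NOTICE", "WARNING",
              "ERROR", "CRITICAL", "ALERT", "EMERGENCY")


def pipeline_log_filter(pipeline_name: str, run_id: str,
                        log_id: Optional[str] = None,
                        min_severity: Optional[str] = None) -> str:
    """Hypothetical helper: build a Logging query for one pipeline run."""
    # Assumed cluster naming, taken from this guide: cdap-<pipeline-name>-<runId>.
    cluster = f"cdap-{pipeline_name}-{run_id}"
    clauses = [
        'resource.type="cloud_dataproc_cluster"',
        f'resource.labels.cluster_name="{cluster}"',
    ]
    if log_id is not None:
        # ':' is the "has" operator, so this matches a logName that
        # contains the given log ID (e.g. "yarn-resourcemanager").
        clauses.append(f'logName:"{log_id}"')
    if min_severity is not None:
        level = min_severity.upper()
        if level not in SEVERITIES:
            raise ValueError(f"unknown severity: {min_severity}")
        # Severities are ordered, so >= keeps this level and everything above it.
        clauses.append(f"severity>={level}")
    return " AND ".join(clauses)
```

For example, `pipeline_log_filter("mypipeline", run_id, log_id="yarn-resourcemanager", min_severity="warning")` keeps only warnings and above whose log name contains `yarn-resourcemanager`. The resulting string can be pasted into the Logs Viewer's advanced filter box, passed to `gcloud logging read`, or given as the `filter_` argument of `list_entries` in the `google-cloud-logging` Python client.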
Downloading Logs
To download logs from Stackdriver, click the Download logs option at the top of the page.
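Log entries obtained outside the console are easier to skim as plain text. The sketch below assumes a JSON array of LogEntry objects (for example, the output of `gcloud logging read FILTER --format=json`) and that each entry carries its message either in `textPayload` or in a `message` field of `jsonPayload`; the `jsonPayload.message` layout is an assumption about how the pipeline logs are structured. The function `entries_to_text` is hypothetical.

```python
import json


def entries_to_text(raw_json: str) -> list[str]:
    """Hypothetical helper: turn a JSON array of LogEntry objects into
    'timestamp severity message' lines."""
    lines = []
    for entry in json.loads(raw_json):
        if "textPayload" in entry:
            message = entry["textPayload"]
        else:
            # Assumption: structured entries keep their text in jsonPayload.message.
            message = entry.get("jsonPayload", {}).get("message", "")
        lines.append(f'{entry.get("timestamp", "")} '
                     f'{entry.get("severity", "DEFAULT")} {message}')
    return lines
```

Entries without a `severity` are shown as `DEFAULT`, which is the severity Cloud Logging assigns when none is set.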