
Introduction 

When running CDAP programs on Google Dataproc, anything logged by CDAP code on the Dataproc cluster is written only to files on that cluster. This makes viewing the logs difficult: a user must set up SSH access to the node(s) of the Dataproc cluster, log in remotely, locate the log files on the file system, and only then view them.

In order to make it easier for users to view logs from the Dataproc cluster, we will leverage Google Stackdriver. There will be two aspects to the solution:

  1. Logs will be pushed from the CDAP-controlled JVM on the Dataproc cluster to Stackdriver.

  2. Logs will be rendered to the CDAP UI by a CDAP service.

Approaches

Approach #1

Ingestion into Stackdriver

We will use a Logback appender for Stackdriver. This will involve:

  1. Package the google-cloud-logging-logback jar file with the Dataproc runtime extension module in CDAP.

  2. Copy the jar to the Dataproc cluster and add it to the classpath of the JVM that we launch.

  3. Configure the logback.xml of the JVM that we launch to use the Stackdriver log appender.

  4. Implement a LoggingEnhancer to add labels for the logs that we emit. This may be useful when querying the logs.
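
As an illustration of step 4, the sketch below computes the set of labels that an enhancer might attach to every log entry of a program run. It is a minimal sketch under stated assumptions: the class names, the RunContext fields, and the label keys (such as cdap_run_id) are hypothetical and not taken from CDAP or Stackdriver. The code uses only the JDK so that it runs on its own; in an actual enhancer, each key/value pair would presumably be handed to the log entry builder that the google-cloud-logging library passes to the enhancer.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Sketch: builds the labels that a LoggingEnhancer could attach to each log entry.
 * All names here (RunLabels, RunContext, the label keys) are hypothetical.
 */
final class RunLabels {

  /** Hypothetical holder for the identifiers of one CDAP program run. */
  static final class RunContext {
    final String namespace;
    final String application;
    final String program;
    final String runId;

    RunContext(String namespace, String application, String program, String runId) {
      this.namespace = namespace;
      this.application = application;
      this.program = program;
      this.runId = runId;
    }
  }

  /** Returns the labels for a run, in a stable insertion order. */
  static Map<String, String> labelsFor(RunContext ctx) {
    Map<String, String> labels = new LinkedHashMap<>();
    // Assumed label keys; underscores avoid the need to quote keys in filters.
    labels.put("cdap_namespace", ctx.namespace);
    labels.put("cdap_application", ctx.application);
    labels.put("cdap_program", ctx.program);
    labels.put("cdap_run_id", ctx.runId);
    return labels;
  }

  public static void main(String[] args) {
    RunContext ctx = new RunContext("default", "ExampleApp", "ExampleWorkflow", "run-1234");
    // A real enhancer would add each pair to the log entry instead of printing it.
    labelsFor(ctx).forEach((key, value) -> System.out.println(key + "=" + value));
  }
}
```

Labeling every entry with the run ID at ingestion time means that a later query needs only the run ID, not the name of the cluster or profile that produced the logs.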

Viewing the logs

There are several options for viewing the logs.

Use Client Java Library

Use the Stackdriver Logging client libraries to fetch the logs from Stackdriver within a CDAP service.


Use Stackdriver REST API

Use the Stackdriver REST API to fetch the logs from Stackdriver within a CDAP service.

Pros:

  • More flexible than the Java library (programmatic library may be missing some functionality)

Cons:

  • More lines of code than using the Java library
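
To give a sense of what the REST option involves, the sketch below builds the JSON body for a request to the Stackdriver Logging v2 entries:list method (POST https://logging.googleapis.com/v2/entries:list). It is a rough sketch, not a proposed implementation: the class and method names are hypothetical, the JSON is assembled by hand to keep the example dependency-free, and authentication (an OAuth bearer token) and the actual HTTP call are left out.

```java
/**
 * Sketch: builds the request body for the Logging v2 entries:list REST method.
 * The class and method names are hypothetical; only the JDK is used.
 */
final class EntriesListRequest {

  static final String ENDPOINT = "https://logging.googleapis.com/v2/entries:list";

  /** Escapes backslashes and double quotes only; control characters are not handled. */
  static String jsonEscape(String s) {
    return s.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Returns a JSON body asking for the newest entries first. */
  static String body(String projectId, String filter, int pageSize) {
    return "{"
        + "\"resourceNames\":[\"projects/" + jsonEscape(projectId) + "\"],"
        + "\"filter\":\"" + jsonEscape(filter) + "\","
        + "\"orderBy\":\"timestamp desc\","
        + "\"pageSize\":" + pageSize
        + "}";
  }

  public static void main(String[] args) {
    // "my-project" and the filter are placeholder values.
    System.out.println("POST " + ENDPOINT);
    System.out.println(body("my-project", "severity>=ERROR", 50));
  }
}
```

A production version would more likely use a JSON library and would also follow the nextPageToken returned in each response to page through results, which is part of the extra code this option requires.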


Have Stackdriver export the logs to Cloud Storage, BigQuery, or Cloud Pub/Sub

Use Stackdriver’s Logs Export to have logs published to Cloud Storage, BigQuery, or Cloud Pub/Sub. In the case of Cloud Storage,


Pros:


  • More control over retention of logs


Cons:


  • Responsibility of retention now belongs to CDAP

  • More expensive when logs are not viewed often, because the exported logs still incur storage costs


View the logs from Stackdriver UI

Use the Stackdriver UI to view the logs directly.

Pros:

  • Avoid reimplementing functionality of a logs UI, such as filtering by timestamp, filtering by log level, search by text, as well as having an advanced filter syntax

Cons:

  • Not natively integrated into the CDAP UI; the user would have to leave the CDAP UI in order to view the logs


Open Questions 

  1. How will the CDAP system map the CDAP program’s run ID to a Stackdriver query?
    1. Profiles can currently be deleted, whereas viewing logs for a program run should still work.
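
One possible answer, sketched below, is to derive the query from the run ID and the run's time range alone, so that it does not depend on the profile still existing. This assumes that every log entry was labeled with the run ID at ingestion time; the label key cdap_run_id and the class and method names are hypothetical. The filter uses the Stackdriver Logging filter syntax for label and timestamp comparisons.

```java
import java.time.Instant;

/**
 * Sketch: maps a CDAP run ID and time range to a Stackdriver Logging filter.
 * Assumes entries carry a (hypothetical) "cdap_run_id" label.
 */
final class RunLogFilter {

  /** Escapes backslashes and double quotes inside a quoted filter value. */
  static String escape(String value) {
    return value.replace("\\", "\\\\").replace("\"", "\\\"");
  }

  /** Builds a filter that matches entries of one run within [start, end]. */
  static String forRun(String runId, Instant start, Instant end) {
    // Instant.toString() yields an ISO-8601 UTC timestamp such as 2019-01-01T00:00:00Z.
    return "labels.cdap_run_id=\"" + escape(runId) + "\""
        + " AND timestamp>=\"" + start + "\""
        + " AND timestamp<=\"" + end + "\"";
  }

  public static void main(String[] args) {
    // Placeholder run ID and time range.
    Instant start = Instant.parse("2019-01-01T00:00:00Z");
    Instant end = Instant.parse("2019-01-01T01:00:00Z");
    System.out.println(forRun("run-1234", start, end));
  }
}
```

Bounding the query by the run's start and end times is optional, but it should narrow the search. The Google Cloud project to query would still have to be recorded with the run, since it cannot be looked up from a deleted profile.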


API changes

New Programmatic APIs

New Java APIs introduced (both user facing and internal)

Deprecated Programmatic APIs

New REST APIs

Path | Method | Description | Request Body | Response Code | Response






Deprecated REST API

Path | Method | Description



CLI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

UI Impact or Changes

  • Impact #1
  • Impact #2
  • Impact #3

Security Impact 

What is the impact on authorization, and how does the design take care of this aspect?

Impact on Infrastructure Outages 

System behavior (if applicable, document the impact of downstream component failures [YARN, HBase, etc.]) and how the design takes care of these aspects

Test Scenarios

Test ID | Test Description | Expected Results












Releases

Release X.Y.Z

Release X.Y.Z

Related Work

  • Work #1
  • Work #2
  • Work #3


Future work
