Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

Overview

This article documents recommended configurations for running pipelines against a static Dataproc cluster. As an additional note, please refer to this article on how to Run pipelines against existing Dataproc clusters

General Tips

  • Set the following configurations while creating a static Dataproc cluster to run pipelines.

    • yarn.nodemanager.delete.debug-delay-sec - This is the configuration to retain YARN logs. Recommended value 86400 (which is 1 day)

    • yarn.nodemanager.pmem-check-enabled - This configuration enables YARN to check for physical memory limit and kill containers if they go beyond physical memory. Recommended value false

    • yarn.nodemanager.vmem-check-enabled - This configuration enables YARN to check for virtual memory limit and kill containers if they go beyond physical memory. Recommended value false.

  • These configurations can be set by clicking on Add Cluster Property while creating the cluster from cloud console.

Table of Contents