
Overview

This article documents recommended configurations for running pipelines against a static Dataproc cluster. For related guidance, see the article on how to run pipelines against existing Dataproc clusters.

General Tips

  • Set the following properties when creating a static Dataproc cluster for running pipelines:

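    • yarn.nodemanager.delete.debug-delay-sec - Number of seconds YARN keeps an application's log and localized file directories after the application finishes, so the logs remain available for debugging. Recommended value: 86400 (1 day).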

    • yarn.nodemanager.pmem-check-enabled - Whether YARN enforces physical memory limits on containers. When enabled, YARN kills containers that exceed their physical memory limit. Recommended value: false.

    • yarn.nodemanager.vmem-check-enabled - Whether YARN enforces virtual memory limits on containers. When enabled, YARN kills containers that exceed their virtual memory limit. Recommended value: false.

  • These properties can be set by clicking Add Cluster Property when creating the cluster from the Cloud Console.
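If you create the cluster from the command line or the Dataproc API instead of the Cloud Console, the same settings must be passed as cluster properties in "prefix:property" form, where the "yarn" prefix targets yarn-site.xml. The following is a minimal sketch, assuming Python 3.7+ (for ordered dicts). The helper names dataproc_properties and gcloud_properties_flag are hypothetical and exist only for illustration. The sketch builds the prefixed properties from the recommended values above and renders them as a value for the gcloud --properties flag.

```python
from typing import Dict

# Recommended YARN NodeManager settings from this article.
RECOMMENDED_YARN_SETTINGS: Dict[str, str] = {
    # Keep finished applications' logs for one day (24 * 60 * 60 = 86400 seconds).
    "yarn.nodemanager.delete.debug-delay-sec": str(24 * 60 * 60),
    # Do not kill containers that exceed their physical memory limit.
    "yarn.nodemanager.pmem-check-enabled": "false",
    # Do not kill containers that exceed their virtual memory limit.
    "yarn.nodemanager.vmem-check-enabled": "false",
}


def dataproc_properties(settings: Dict[str, str], prefix: str = "yarn") -> Dict[str, str]:
    """Hypothetical helper: add the config-file prefix ("yarn" -> yarn-site.xml) to each key."""
    return {f"{prefix}:{key}": value for key, value in settings.items()}


def gcloud_properties_flag(properties: Dict[str, str]) -> str:
    """Hypothetical helper: render properties as one comma-separated --properties value."""
    pairs = ",".join(f"{key}={value}" for key, value in properties.items())
    return f"--properties={pairs}"


if __name__ == "__main__":
    props = dataproc_properties(RECOMMENDED_YARN_SETTINGS)
    print(gcloud_properties_flag(props))
```

The printed flag can be appended to a `gcloud dataproc clusters create CLUSTER_NAME --region=REGION` command, where CLUSTER_NAME and REGION are placeholders. The prefixed dictionary itself has the same shape as the softwareConfig.properties map in the Dataproc API's cluster configuration.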
