Overview

This article documents the recommended configuration for running pipelines against a static Dataproc cluster. For details on how to run pipelines against an existing cluster, see the article Run pipelines against existing Dataproc clusters.

General Tips

  • Set the following properties when creating a static Dataproc cluster that will run pipelines:

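    • yarn.nodemanager.delete.debug-delay-sec - Number of seconds YARN waits after an application finishes before deleting its local files and container logs. Retaining them makes failed runs easier to debug. Recommended value: 86400 (1 day).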

    • yarn.nodemanager.pmem-check-enabled - Controls whether YARN enforces physical memory limits and kills containers that exceed them. Recommended value: false.

    • yarn.nodemanager.vmem-check-enabled - Controls whether YARN enforces virtual memory limits and kills containers that exceed them. Recommended value: false.

  • To set these properties, click Add Cluster Property while creating the cluster in the Cloud Console.
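
For clusters created from the command line rather than the Cloud Console, the same properties can be passed through the --properties flag of gcloud dataproc clusters create, where each YARN setting takes the yarn: prefix. The following is a minimal sketch that only assembles and prints the command. The helper name build_create_command, the cluster name my-static-cluster, and the region us-central1 are illustrative assumptions, not values from this article.

```python
# Recommended YARN settings from the list above.
YARN_PROPERTIES = {
    "yarn.nodemanager.delete.debug-delay-sec": "86400",
    "yarn.nodemanager.pmem-check-enabled": "false",
    "yarn.nodemanager.vmem-check-enabled": "false",
}


def build_create_command(cluster_name, region, properties=YARN_PROPERTIES):
    """Return an argument list for `gcloud dataproc clusters create`.

    Hypothetical helper for illustration; it does not run gcloud itself.
    """
    # Dataproc cluster properties use the form prefix:key=value.
    # The "yarn" prefix targets yarn-site.xml.
    props = ",".join(f"yarn:{key}={value}" for key, value in properties.items())
    return [
        "gcloud", "dataproc", "clusters", "create", cluster_name,
        f"--region={region}",
        f"--properties={props}",
    ]


if __name__ == "__main__":
    # Placeholder cluster name and region; replace with your own values.
    print(" ".join(build_create_command("my-static-cluster", "us-central1")))
```

Printing the command instead of executing it makes it easy to review the property string before any cluster is created. Other flags your environment needs (machine types, network, image version) would be appended to the same list.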
