
This page describes how to configure a pipeline to run with custom Dataproc cluster properties. Cluster properties are an advanced feature used to modify the underlying Hadoop configuration properties. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.

Setting custom cluster properties

  1. In the Cloud Data Fusion UI, navigate to the detail page of the pipeline you would like to configure.

  2. Click the drop-down next to the Run button to open the runtime arguments panel.

  3. Set your desired cluster properties, prefixing all property names with system.profile.properties. For example, if you would like to set yarn:yarn.nodemanager.pmem-check-enabled to false, enter system.profile.properties.yarn:yarn.nodemanager.pmem-check-enabled as the Name and false as the Value.
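For illustration, here is a minimal Python sketch of the naming rule from step 3. The helper function is hypothetical (it is not part of Cloud Data Fusion or CDAP); it simply prepends the required prefix to raw Dataproc cluster property names to produce runtime argument names.

  # Hypothetical helper: builds runtime argument names by prepending the
  # system.profile.properties. prefix to raw cluster property names.
  PREFIX = "system.profile.properties."

  def as_runtime_args(cluster_properties):
      """Map raw cluster property name/value pairs to runtime argument entries."""
      return {PREFIX + name: value for name, value in cluster_properties.items()}

  # Reproduces the Name/Value pair from step 3.
  print(as_runtime_args({"yarn:yarn.nodemanager.pmem-check-enabled": "false"}))
  # {'system.profile.properties.yarn:yarn.nodemanager.pmem-check-enabled': 'false'}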

Some of the more commonly modified properties include:

  • spark:spark.task.maxFailures

  • mapred:mapreduce.map.maxattempts

  • mapred:mapreduce.reduce.maxattempts

  • mapred:mapreduce.task.io.sort.mb

  • yarn:yarn.scheduler.minimum-allocation-mb
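If you start pipelines programmatically rather than from the UI, the same prefixed properties can be supplied as runtime arguments when starting the pipeline's workflow through the CDAP REST API. The sketch below is an assumption-laden example, not a definitive recipe: it assumes a batch pipeline (workflow name DataPipelineWorkflow), and the instance endpoint, namespace, pipeline name, access token, and property values are placeholders to replace for your environment.

  # Sketch: start a deployed batch pipeline with custom Dataproc cluster
  # properties passed as runtime arguments via the CDAP program start endpoint.
  # All capitalized values below are placeholders for your own instance.
  import requests

  CDAP_ENDPOINT = "https://<data-fusion-instance>/api"  # placeholder
  NAMESPACE = "default"                                 # placeholder
  PIPELINE = "my-pipeline"                              # placeholder pipeline name
  TOKEN = "<access-token>"                              # placeholder auth token

  # Cluster properties, prefixed as described above; values are illustrative only.
  runtime_args = {
      "system.profile.properties.spark:spark.task.maxFailures": "1",
      "system.profile.properties.mapred:mapreduce.task.io.sort.mb": "256",
  }

  resp = requests.post(
      f"{CDAP_ENDPOINT}/v3/namespaces/{NAMESPACE}/apps/{PIPELINE}"
      "/workflows/DataPipelineWorkflow/start",
      headers={"Authorization": f"Bearer {TOKEN}"},
      json=runtime_args,  # runtime arguments are sent as a JSON map
  )
  resp.raise_for_status()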
