Setting custom Dataproc cluster properties

This page describes how to configure a pipeline to run with custom Dataproc cluster properties. Cluster properties are an advanced feature for modifying the configuration of the underlying Hadoop ecosystem components on the cluster, such as Spark, MapReduce, and YARN settings. For more information, see https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties.

Setting custom cluster properties

  1. In the Pipeline Studio, navigate to the detail page of the pipeline you would like to configure.

  2. Click the drop-down arrow next to the Run button.

  3. Set your desired cluster properties, adding the prefix system.profile.properties. (including the trailing period) to each property name. For example, to set yarn:yarn.nodemanager.pmem-check-enabled to false, enter system.profile.properties.yarn:yarn.nodemanager.pmem-check-enabled as the Name and false as the Value.
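If you keep cluster properties in a script or config file, the prefixing rule from step 3 can be applied mechanically. The following is a minimal sketch, not part of any Data Fusion or CDAP API: the helper name to_runtime_args is hypothetical, and it only builds the Name/Value pairs you would then enter in the Pipeline Studio.

```python
import json

# Prefix required on every cluster property name (from step 3 above).
# Note the trailing period.
PROFILE_PREFIX = "system.profile.properties."


def to_runtime_args(cluster_props: dict[str, str]) -> dict[str, str]:
    """Hypothetical helper: turn Dataproc property names into the prefixed
    Name/Value pairs entered next to the Run button."""
    return {PROFILE_PREFIX + name: value for name, value in cluster_props.items()}


if __name__ == "__main__":
    args = to_runtime_args({"yarn:yarn.nodemanager.pmem-check-enabled": "false"})
    # Prints the single prefixed Name with its Value "false".
    print(json.dumps(args, indent=2))
```

Values are kept as strings because each entry is a plain Name/Value text pair.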


Some of the more commonly modified properties include:

  • spark:spark.task.maxFailures

  • mapred:mapreduce.map.maxattempts

  • mapred:mapreduce.reduce.maxattempts

  • mapred:mapreduce.task.io.sort.mb

  • yarn:yarn.scheduler.minimum-allocation-mb
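Each name in the list above has the form component:property, where the component prefix selects which configuration file Dataproc updates. A minimal sketch for sanity-checking a name before prefixing it might split it at the first colon. The mapping below covers only the three prefixes used in the list, and the function name split_property is hypothetical; see the Dataproc cluster properties documentation for the full set of prefixes.

```python
# Component prefixes used in the list above and the configuration files they
# target, per the Dataproc cluster properties documentation. Other prefixes
# exist but are deliberately not covered by this sketch.
PREFIX_TO_FILE = {
    "spark": "spark-defaults.conf",
    "mapred": "mapred-site.xml",
    "yarn": "yarn-site.xml",
}


def split_property(name: str) -> tuple[str, str, str]:
    """Hypothetical helper: split 'component:key' into
    (component, key, target configuration file)."""
    component, sep, key = name.partition(":")
    if not sep or not key:
        raise ValueError(f"expected 'component:key', got {name!r}")
    if component not in PREFIX_TO_FILE:
        raise ValueError(f"prefix {component!r} is not covered by this sketch")
    return component, key, PREFIX_TO_FILE[component]


if __name__ == "__main__":
    for prop in [
        "spark:spark.task.maxFailures",
        "mapred:mapreduce.task.io.sort.mb",
        "yarn:yarn.scheduler.minimum-allocation-mb",
    ]:
        print(split_property(prop))
```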
