Setting custom Dataproc cluster properties
This page describes how to configure a pipeline to run with custom Dataproc cluster properties. Cluster properties are an advanced feature used to modify the underlying Hadoop configuration properties. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
Setting custom cluster properties
In the Pipeline Studio, navigate to the detail page of the pipeline you would like to configure.
Click the drop-down next to the Run button.
Set your desired cluster properties, prefixing each property name with system.profile.properties. (the trailing period is part of the prefix). For example, to set yarn:yarn.nodemanager.pmem-check-enabled to false, enter system.profile.properties.yarn:yarn.nodemanager.pmem-check-enabled as the Name and false as the Value.
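The prefixing rule in the last step can be sketched in Python. This is only an illustration of how the Name and Value pairs are formed, not part of the product: the helper to_runtime_args and the variable names are hypothetical, and the only property used is the one from the example above.

```python
# Prefix taken from the step above; note that it ends with a period.
PREFIX = "system.profile.properties."


def to_runtime_args(cluster_properties):
    """Return a dict of Name -> Value pairs with every name prefixed.

    Hypothetical helper for illustration only. Booleans are written as
    lowercase "true"/"false"; other values are converted with str().
    """
    args = {}
    for name, value in cluster_properties.items():
        if isinstance(value, bool):
            text = "true" if value else "false"
        else:
            text = str(value)
        args[PREFIX + name] = text
    return args


# The example from the step above.
runtime_args = to_runtime_args(
    {"yarn:yarn.nodemanager.pmem-check-enabled": False}
)

for name, value in runtime_args.items():
    print(f"Name: {name}  Value: {value}")
```

Each printed pair corresponds to one Name and Value entry that would be typed in by hand.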
Some of the more commonly modified properties include:
spark:spark.task.maxFailures
mapred:mapreduce.map.maxattempts
mapred:mapreduce.reduce.maxattempts
mapred:mapreduce.task.io.sort.mb
yarn:yarn.scheduler.minimum-allocation-mb
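Each property name listed above has the form prefix:property, where the part before the colon names the configuration file the property belongs to (spark, mapred, yarn). The following Python sketch splits the names and groups them by prefix, which can help catch a missing prefix before a property is entered. The helper split_property and the variable names are hypothetical and used only for illustration.

```python
# Property names copied from the list above.
COMMON_PROPERTIES = [
    "spark:spark.task.maxFailures",
    "mapred:mapreduce.map.maxattempts",
    "mapred:mapreduce.reduce.maxattempts",
    "mapred:mapreduce.task.io.sort.mb",
    "yarn:yarn.scheduler.minimum-allocation-mb",
]


def split_property(name):
    """Split 'prefix:property' into (prefix, property).

    Hypothetical helper: raises ValueError when the colon, the prefix,
    or the property part is missing.
    """
    prefix, sep, key = name.partition(":")
    if not sep or not prefix or not key:
        raise ValueError(f"expected 'prefix:property', got {name!r}")
    return prefix, key


# Group the property keys by their file prefix.
grouped = {}
for name in COMMON_PROPERTIES:
    prefix, key = split_property(name)
    grouped.setdefault(prefix, []).append(key)

for prefix, keys in grouped.items():
    print(prefix, keys)
```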