This page describes how to configure a pipeline to run with custom Dataproc cluster properties. Cluster properties are an advanced feature used to modify the underlying Hadoop configuration properties. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/cluster-properties for more information.
Setting custom cluster properties
In the Cloud Data Fusion UI, navigate to the detail page of the pipeline you would like to configure.
Click on the drop-down next to the Run button.
Set your desired cluster properties, adding the prefix system.profile.properties followed by a period to each property name. For example, to set yarn:yarn.nodemanager.pmem-check-enabled to false, enter system.profile.properties.yarn:yarn.nodemanager.pmem-check-enabled as the Name and false as the Value.
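The naming rule in the last step can be sketched in a few lines of Python. This is only an illustration of how the names are built, not part of Cloud Data Fusion: the helper to_runtime_arguments and the constant PROFILE_PROPERTY_PREFIX are hypothetical names introduced here, and the check that each property name has the form prefix:property is an assumption based on the examples on this page.

```python
# Hypothetical helper: builds the Name/Value pairs to enter for a pipeline run
# from plain Dataproc cluster properties. Not a Cloud Data Fusion API.
PROFILE_PROPERTY_PREFIX = "system.profile.properties."


def to_runtime_arguments(cluster_properties):
    """Return a dict mapping prefixed property names to string values."""
    runtime_args = {}
    for name, value in cluster_properties.items():
        # Assumption: names look like "yarn:yarn.some.property", with a
        # non-empty part on each side of the first colon.
        file_prefix, sep, prop = name.partition(":")
        if not sep or not file_prefix or not prop:
            raise ValueError(f"expected 'prefix:property', got {name!r}")
        # Write booleans in lowercase, as in the example on this page.
        if isinstance(value, bool):
            value = "true" if value else "false"
        runtime_args[PROFILE_PROPERTY_PREFIX + name] = str(value)
    return runtime_args


if __name__ == "__main__":
    args = to_runtime_arguments(
        {"yarn:yarn.nodemanager.pmem-check-enabled": False}
    )
    for arg_name, arg_value in args.items():
        print(f"Name:  {arg_name}")
        print(f"Value: {arg_value}")
```

Each key of the returned dictionary is what you would enter as the Name, and the matching value is what you would enter as the Value.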
Some of the more commonly modified properties include:
spark:spark.task.maxFailures
mapred:mapreduce.map.maxattempts
mapred:mapreduce.reduce.maxattempts
mapred:mapreduce.task.io.sort.mb
yarn:yarn.scheduler.minimum-allocation-mb