Overview

This article provides best practices for configuring resources for Spark pipelines.

General Tips

  • Driver resources - Set the driver memory to 8 GB

    • Pipelines with a large number of nodes (20+) need more driver memory. If the driver memory is set too low, the driver crashes with the error “Malformed reply from SOCKS server”.

  • Driver resources - Set the vcores to 1

  • Executor resources - Set vcores to 1 in all CDF releases up to 6.1.2

  • Executor resources - Set executor memory to a minimum of 4 GB (4096 MB)

    • For use cases that involve joins or aggregations with high cardinality, increase this value
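
The tips above can be collected into a small checklist. The following is a minimal Python sketch, not part of CDF itself: the `PipelineResources` class and the `check_resources` and `to_spark_properties` helpers are hypothetical names introduced here for illustration. The mapping to the standard Spark properties (`spark.driver.memory`, `spark.driver.cores`, `spark.executor.memory`, `spark.executor.cores`) is an assumption about how these values might be expressed outside the pipeline's resource settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PipelineResources:
    """Hypothetical container holding the values recommended in this article."""
    driver_memory_mb: int = 8192      # 8 GB driver memory
    driver_vcores: int = 1
    executor_memory_mb: int = 4096    # minimum; raise for high-cardinality joins/aggregations
    executor_vcores: int = 1          # recommended for CDF releases up to 6.1.2


def check_resources(res: PipelineResources) -> list:
    """Return one warning per setting that departs from the recommendations."""
    warnings = []
    if res.driver_memory_mb < 8192:
        warnings.append("driver memory below 8 GB")
    if res.driver_vcores != 1:
        warnings.append("driver vcores should be 1")
    if res.executor_memory_mb < 4096:
        warnings.append("executor memory below 4096 MB")
    if res.executor_vcores != 1:
        warnings.append("executor vcores should be 1")
    return warnings


def to_spark_properties(res: PipelineResources) -> dict:
    """Assumed mapping of the settings onto standard Spark property names."""
    return {
        "spark.driver.memory": f"{res.driver_memory_mb}m",
        "spark.driver.cores": str(res.driver_vcores),
        "spark.executor.memory": f"{res.executor_memory_mb}m",
        "spark.executor.cores": str(res.executor_vcores),
    }
```

Calling `check_resources(PipelineResources())` returns an empty list because the defaults match the recommendations. Passing a lower value, for example `PipelineResources(executor_memory_mb=2048)`, yields a warning for that setting.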
