Overview
This article provides best practices for configuring resources for Spark pipelines.
General Tips
Driver resources - Set the driver memory to 8 GB
Pipelines with a large number of nodes (20 or more) need more driver memory. If the driver memory is set too low, the driver crashes and the pipeline fails with the error “Malformed reply from SOCKS server”.
Driver resources - Set the vcores to 1.
Executor resources - Set the vcores to 1 in all CDF releases up to 6.1.2.
Executor resources - Set the executor memory to a minimum of 4 GB (4096 MB).
For use cases that involve joins or aggregations with high cardinality, increase this value.
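The tips above can be captured as a small pre-deployment check. The following is a minimal sketch, not part of CDF: the class, field, and function names are hypothetical and do not correspond to actual CDF property keys, and the executor vcores check assumes a CDF release up to 6.1.2.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SparkPipelineResources:
    """Hypothetical container for the four settings discussed above."""
    driver_memory_mb: int
    driver_vcores: int
    executor_memory_mb: int
    executor_vcores: int


# Values taken from the General Tips: 8 GB driver, 4 GB executor, 1 vcore each.
RECOMMENDED = SparkPipelineResources(
    driver_memory_mb=8192,
    driver_vcores=1,
    executor_memory_mb=4096,
    executor_vcores=1,
)

# Node count at which low driver memory is reported to cause driver crashes.
LARGE_PIPELINE_NODES = 20


def check_resources(cfg: SparkPipelineResources, node_count: int) -> List[str]:
    """Return one warning per setting that deviates from the General Tips."""
    warnings = []
    if cfg.driver_memory_mb < RECOMMENDED.driver_memory_mb:
        msg = "driver memory is below 8 GB"
        if node_count >= LARGE_PIPELINE_NODES:
            msg += "; pipelines with 20 or more nodes risk a driver crash"
        warnings.append(msg)
    if cfg.driver_vcores != RECOMMENDED.driver_vcores:
        warnings.append("driver vcores should be 1")
    if cfg.executor_memory_mb < RECOMMENDED.executor_memory_mb:
        warnings.append("executor memory is below the 4 GB (4096 MB) minimum")
    # Assumption: the pipeline runs on a CDF release up to 6.1.2.
    if cfg.executor_vcores != RECOMMENDED.executor_vcores:
        warnings.append("executor vcores should be 1")
    return warnings


if __name__ == "__main__":
    proposed = SparkPipelineResources(
        driver_memory_mb=2048,
        driver_vcores=1,
        executor_memory_mb=2048,
        executor_vcores=1,
    )
    for warning in check_resources(proposed, node_count=25):
        print(warning)
```

Executor memory above 4096 MB passes the check, since pipelines with high-cardinality joins or aggregations are expected to need more.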