This article is posted on the CDAP Doc wiki and will be maintained here: https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1162707151/Pipeline+resource+configurations

Overview

This article provides best practices for configuring resources for Spark pipelines.

General Tips

  • Driver resources - Set the driver memory to 8 GB

    • Pipelines with a large number of nodes (20+) need more driver memory. If the driver memory is set too low, the driver crashes with the error “Malformed reply from SOCKS server”.

  • Driver resources - Set the vcores to 1.

  • Executor resources - Set the vcores to 1 in all CDF releases up to 6.1.2.

  • Executor resources - Set the executor memory to a minimum of 4 GB (4096 MB).

    • For use cases that involve high-cardinality joins and aggregations, increase this value.
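
The tips above can be expressed as a small configuration check. The following is a minimal sketch, not part of CDAP or CDF: the `PipelineResources` class, the `check_resources` function, and the constant names are all hypothetical and exist only to illustrate the recommended values. It assumes the release is given as a dotted numeric string such as "6.1.2".

```python
from dataclasses import dataclass
from typing import List

# Baseline values taken from the tips above. All names are illustrative
# assumptions, not CDAP or CDF identifiers.
DRIVER_MEMORY_MB = 8192    # 8 GB
DRIVER_VCORES = 1
EXECUTOR_MEMORY_MB = 4096  # 4 GB minimum
EXECUTOR_VCORES = 1        # recommended for CDF releases up to 6.1.2


@dataclass
class PipelineResources:
    """Hypothetical container for a pipeline's resource settings."""
    driver_memory_mb: int
    driver_vcores: int
    executor_memory_mb: int
    executor_vcores: int


def check_resources(res: PipelineResources, cdf_version: str) -> List[str]:
    """Return one warning per setting that departs from the tips."""
    warnings = []
    if res.driver_memory_mb < DRIVER_MEMORY_MB:
        warnings.append("driver memory below 8 GB")
    if res.driver_vcores != DRIVER_VCORES:
        warnings.append("driver vcores not 1")
    if res.executor_memory_mb < EXECUTOR_MEMORY_MB:
        warnings.append("executor memory below 4 GB")
    # The executor vcores tip applies only to releases up to 6.1.2.
    version = tuple(int(part) for part in cdf_version.split("."))
    if version <= (6, 1, 2) and res.executor_vcores != EXECUTOR_VCORES:
        warnings.append("executor vcores not 1")
    return warnings


if __name__ == "__main__":
    settings = PipelineResources(
        driver_memory_mb=2048,
        driver_vcores=1,
        executor_memory_mb=4096,
        executor_vcores=1,
    )
    for warning in check_resources(settings, "6.1.2"):
        print(warning)
```

The check treats 4 GB as a floor for executor memory rather than a fixed value, because pipelines with high-cardinality joins and aggregations need more.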
