
Problem

Customers are seeing issues when running pipelines, with warnings like the following in the logs:

2020-04-22 17:40:01,982 - WARN  [Executor task launch worker for task 1610:o.a.s.s.BlockManager@66] - Persisting block rdd_7_276 to disk instead.

2020-04-22 17:40:43,018 - WARN  [Executor task launch worker for task 1610:o.a.s.s.m.MemoryStore@66] - Not enough space to cache rdd_7_276 in memory! (computed 1528.4 MB so far)
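
For context, these warnings come from Spark's MemoryStore and BlockManager when a cached block does not fit in executor storage memory. The following is a minimal PySpark sketch that reproduces them; the data size and master URL are illustrative, not taken from the customer environment:

from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[2]", "cache-spill-demo")

# Build an RDD deliberately larger than the executor's storage memory
# (tune the size to your executor memory to trigger the warnings).
big = sc.parallelize(range(10_000_000), 8).map(lambda i: str(i) * 100)

# MEMORY_AND_DISK is the storage level behind the log lines above: blocks
# that do not fit in memory log "Not enough space to cache ..." followed by
# "Persisting block ... to disk instead."
big.persist(StorageLevel.MEMORY_AND_DISK)
big.count()  # first action computes and tries to cache every partition
big.count()  # second action re-reads cached blocks, some from disk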

Symptom(s)

  • Pipelines keep running for a long time and appear to never finish.

  • Pipeline metrics keep resetting, indicating that the jobs are reprocessing data.

  • Logs indicate that Spark is not able to fit an RDD in memory.

  • The message that the RDD is being persisted to disk is false.

Solution(s)

Turning off Auto-Caching

By default, pipelines cache intermediate data in order to prevent Spark from re-computing it. Caching requires a substantial amount of memory, so pipelines that process a large amount of data often need to turn it off. The steps below disable it through the UI; a scripted alternative follows the steps.

  1. Navigate to the pipeline detail page.

  2. In the Configure menu, click Engine config.

  3. Enter 'spark.cdap.pipeline.autocache.enable' as the key and 'false' as the value.
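
If the pipeline is managed programmatically rather than through the UI, the same property can be set as an app-level preference over the CDAP Preferences REST API. This is a minimal sketch assuming a standard CDAP router endpoint; the host, namespace, and pipeline name are placeholders, and you should verify for your CDAP version that app preferences are merged into the pipeline's engine configuration:

import requests

CDAP = "http://cdap-router.example.com:11015/v3"  # placeholder router URL
NAMESPACE = "default"                             # placeholder namespace
PIPELINE = "my_large_pipeline"                    # placeholder pipeline name

# App-level preferences are merged into the pipeline's runtime arguments on
# the next run; this mirrors entering the key/value under Engine config.
resp = requests.put(
    f"{CDAP}/namespaces/{NAMESPACE}/apps/{PIPELINE}/preferences",
    json={"spark.cdap.pipeline.autocache.enable": "false"},
)
resp.raise_for_status()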
