
Problem

Customers are seeing issues when running pipelines, with warnings like the following in the logs:

2020-04-22 17:40:01,982 - WARN  [Executor task launch worker for task 1610:o.a.s.s.BlockManager@66] - Persisting block rdd_7_276 to disk instead.

2020-04-22 17:40:43,018 - WARN  [Executor task launch worker for task 1610:o.a.s.s.m.MemoryStore@66] - Not enough space to cache rdd_7_276 in memory! (computed 1528.4 MB so far)
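
For context, these warnings come from Spark's MemoryStore and BlockManager when a cached block does not fit in executor storage memory. The following is a minimal PySpark sketch that reproduces them; the data size and master URL are illustrative, not taken from the customer environment:

from pyspark import SparkContext, StorageLevel

sc = SparkContext("local[2]", "cache-spill-demo")

# Build an RDD deliberately larger than the executor's storage memory
# (tune the size to your executor memory to trigger the warnings).
big = sc.parallelize(range(10_000_000), 8).map(lambda i: str(i) * 100)

# MEMORY_AND_DISK is the storage level behind the log lines above: blocks
# that do not fit in memory log "Not enough space to cache ..." followed by
# "Persisting block ... to disk instead."
big.persist(StorageLevel.MEMORY_AND_DISK)
big.count()  # first action computes and tries to cache every partition
big.count()  # second action re-reads cached blocks, some from disk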

Symptom(s)

  • Pipelines keep running for a long time and appear to never finish.

  • Pipeline metrics keep resetting, indicating that the jobs are reprocessing data.

  • Logs indicate that Spark is not able to fit an RDD in memory.

  • The message that the RDD is being persisted to disk is false.

Solution(s)

Turning off Auto-Caching

By default, pipelines cache intermediate data in order to prevent Spark from re-computing it. Caching requires a substantial amount of memory, so pipelines that process a large amount of data often need to turn it off. The steps below disable it through the UI; a scripted alternative follows the steps.

  1. Navigate to the pipeline detail page.

  2. In the Configure menu, click Engine config.

  3. Enter 'spark.cdap.pipeline.autocache.enable' as the key and 'false' as the value.
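
If the pipeline is managed programmatically rather than through the UI, the same property can be set as an app-level preference over the CDAP Preferences REST API. This is a minimal sketch assuming a standard CDAP router endpoint; the host, namespace, and pipeline name are placeholders, and you should verify for your CDAP version that app preferences are merged into the pipeline's engine configuration:

import requests

CDAP = "http://cdap-router.example.com:11015/v3"  # placeholder router URL
NAMESPACE = "default"                             # placeholder namespace
PIPELINE = "my_large_pipeline"                    # placeholder pipeline name

# App-level preferences are merged into the pipeline's runtime arguments on
# the next run; this mirrors entering the key/value under Engine config.
resp = requests.put(
    f"{CDAP}/namespaces/{NAMESPACE}/apps/{PIPELINE}/preferences",
    json={"spark.cdap.pipeline.autocache.enable": "false"},
)
resp.raise_for_status()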
