...

Pipeline execution fails when the execution engine is set to Spark with errors in the pipeline logs of the form:

“Lost task x.y in stage z.xx”

...

Symptom(s)

...

  • Out of memory in Spark executor

  • Wrangler bug

  • Task killed

Solution(s)

Out of memory in Spark executor

When the Spark executors run out of memory, the JVM spends a lot of time in GC pauses, causing heartbeat timeouts that terminate the executors. In these scenarios, the logs will contain the following:

Code Block
Lost task 0.0 in stage 14.0 (TID 16, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-8.c.vf-pt-ngbi-dev-gen-03.internal, executor 1): \
ExecutorLostFailure (executor 1 exited caused by one of the running tasks) \
Reason: Executor heartbeat timed out after 125847 ms

Remediation:

  • Check the executor resources allocated for the pipeline. If the memory is too low (e.g., 2 GB), increase it to a higher value (8 GB).

  • If the executor memory is over 32 GB for join/aggregation use cases and the pipeline still fails, ensure that the join best practices are being followed.
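As a quick triage step, log lines like the one above can be checked for the heartbeat-timeout signature before adjusting resources. A minimal sketch (the helper name and regex are illustrative, not part of CDAP or Spark):

```python
import re

# Illustrative pattern matching the "Executor heartbeat timed out" message shown above.
HEARTBEAT_RE = re.compile(r"Executor heartbeat timed out after (\d+) ms")

def heartbeat_timeout_ms(log_line):
    """Return the reported timeout in ms if the line is a heartbeat timeout, else None."""
    match = HEARTBEAT_RE.search(log_line)
    return int(match.group(1)) if match else None

line = ("Lost task 0.0 in stage 14.0 (TID 16, ..., executor 1): "
        "ExecutorLostFailure (executor 1 exited caused by one of the running tasks) "
        "Reason: Executor heartbeat timed out after 125847 ms")
print(heartbeat_timeout_ms(line))  # 125847
```

Lines that match this pattern point at the GC/out-of-memory scenario described above rather than at a genuine task-level bug.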

...

Wrangler bug

Due to a Wrangler bug, the pipeline configs get overwritten, which results in the following error:

Code Block
Lost task 48.0 in stage 17.0 (TID 78, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-2.c.vf-pt-ngbi-dev-gen-03.internal, executor 9): io.cdap.cdap.api.plugin.InvalidPluginConfigException: \
Unable to create plugin config.

Remediation:

  • Set the executor vcores to 1 for CDF versions below 6.1.3/6.2.0.
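Whether the workaround applies can be decided with a simple version comparison. A sketch assuming plain dotted version strings (the function name and the fixed-in version tuple are illustrative):

```python
def needs_vcore_workaround(cdf_version, fixed_in=(6, 1, 3)):
    """True if the CDF version predates the assumed Wrangler fix (here 6.1.3)."""
    parsed = tuple(int(part) for part in cdf_version.split("."))
    return parsed < fixed_in

print(needs_vcore_workaround("6.1.2"))  # True  -> set executor vcores to 1
print(needs_vcore_workaround("6.2.0"))  # False -> workaround not needed
```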

Task killed

The Spark framework will kill executor tasks during shutdown, which results in the following log message:

Code Block
Lost task 78.0 in stage 17.0 (TID 112, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-6.c.vf-pt-ngbi-dev-gen-03.internal, executor 6): TaskKilled \
(Stage cancelled)

Note: TaskKilled is not an actual error. It is a result of the Spark framework canceling the executors during shutdown. The root cause of the failure will be found in a different executor's logs.
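Since TaskKilled entries are shutdown noise, a first pass over the logs can drop them and surface the remaining task failures, which point at the real root cause. A minimal sketch (the function name and the assumed one-line-per-entry log format are illustrative):

```python
def root_cause_candidates(log_lines):
    """Keep 'Lost task' lines that are not TaskKilled noise from the shutdown."""
    return [line for line in log_lines
            if "Lost task" in line and "TaskKilled" not in line]

logs = [
    "Lost task 78.0 in stage 17.0 (..., executor 6): TaskKilled (Stage cancelled)",
    "Lost task 0.0 in stage 14.0 (..., executor 1): ExecutorLostFailure ...",
]
for line in root_cause_candidates(logs):
    print(line)  # only the ExecutorLostFailure line survives the filter
```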