...
Pipeline execution fails when the execution engine is set to Spark with errors in the pipeline logs of the form:
“Lost task x.y in stage z.xx”
...
Symptom(s)
...
Out of memory in Spark executor
Wrangler bug
Task killed
Solution(s)
Out of memory in Spark executor
When the Spark executors run out of memory, the JVM spends most of its time in GC pauses, which leads to heartbeat timeouts and executor termination. In this scenario, the logs contain the following:
Code Block:
Lost task 0.0 in stage 14.0 (TID 16, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-8.c.vf-pt-ngbi-dev-gen-03.internal, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 125847 ms
Remediation:
Check the executor resources allocated for the pipeline. If the executor memory is too low (e.g., 2 GB), increase it to a higher value (e.g., 8 GB).
If the executor memory is over 32 GB for join/aggregation use cases and the pipeline still fails, ensure that the join best practices are being followed.
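As an illustration, executor memory can typically be raised either in the pipeline's resource settings in the UI or via a runtime argument. The property name and value below are assumptions based on common CDAP/Data Fusion setups and should be verified against your version:

```
# Illustrative runtime argument for the pipeline run:
# executor memory in MB, e.g. raised from 2048 to 8192
task.executor.system.resources.memory=8192
```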
...
Wrangler bug
Due to a Wrangler bug, the pipeline configs get overwritten, which results in the following error:
Code Block:
Lost task 48.0 in stage 17.0 (TID 78, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-2.c.vf-pt-ngbi-dev-gen-03.internal, executor 9): io.cdap.cdap.api.plugin.InvalidPluginConfigException: Unable to create plugin config.
Remediation:
Set the executor vcores to 1 for CDF versions below 6.1.3/6.2.0.
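As a sketch, the vcore setting can be applied in the pipeline's resource settings in the UI or as a runtime argument; the property name below is an assumption based on common CDAP/Data Fusion setups and should be verified for your CDF version:

```
# Illustrative runtime argument: limit each executor to 1 vcore
task.executor.system.resources.cores=1
```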
Task killed
The Spark framework may kill executor tasks, which results in the following log message:
Code Block:
Lost task 78.0 in stage 17.0 (TID 112, cdap-mock2dwh2-3ededd25-5837-11ea-b33b-1ad7eaaa4723-w-6.c.vf-pt-ngbi-dev-gen-03.internal, executor 6): TaskKilled (Stage cancelled)
Note: TaskKilled is not an actual error. It is a result of the Spark framework canceling the executors during shutdown; the root cause of the failure will be in the logs of a different executor.
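When hunting for the real failure, the TaskKilled entries can be filtered out so that the first genuine "Lost task" error stands out. A minimal sketch, assuming the pipeline logs have been saved to a local file (the file path and sample lines below are hypothetical):

```shell
# Create a hypothetical sample log for illustration.
cat > /tmp/pipeline.log <<'EOF'
Lost task 0.0 in stage 14.0 (TID 16, host-1, executor 1): ExecutorLostFailure (executor 1 exited caused by one of the running tasks) Reason: Executor heartbeat timed out after 125847 ms
Lost task 78.0 in stage 17.0 (TID 112, host-6, executor 6): TaskKilled (Stage cancelled)
EOF

# Keep "Lost task" lines but drop the TaskKilled noise;
# what remains points at the executor with the root cause.
grep "Lost task" /tmp/pipeline.log | grep -v "TaskKilled"
```

Here the surviving line is the heartbeat-timeout failure on executor 1, which is where the investigation should start.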