Written by Albert Shau
Problem
The runtime pod restarts frequently due to OutOfMemory issues. This can manifest as pipeline run failures, with an exception thrown when the pipeline tries to talk to the runtime service, for example:
...
```
java.io.IOException: Failed to send message for program run program_run:Altipal_DataLake.SQLSERVER_CARGA_MINUTOS.-SNAPSHOT.workflow.DataPipelineWorkflow.266d15ac-2bab-11ec-bdc4-42cf72c2cfe8 to https://[cdf-uri]:443/v3Internal/runtime/namespaces/[ns]/apps/[pipeline]/versions/-SNAPSHOT/workflows/DataPipelineWorkflow/runs/[runid]/topics/metrics8. Respond code: 502. Error: unknown error
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClient.throwIfError(RuntimeClient.java:209) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClient.sendMessages(RuntimeClient.java:115) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService$TopicRelayer.processMessages(RuntimeClientService.java:234) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService$TopicRelayer.publishMessages(RuntimeClientService.java:200) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService.runTask(RuntimeClientService.java:103) ~[na:na]
```
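When triaging many failed runs, it can help to scan pipeline logs for this failure signature rather than reading each trace by hand. The following is a minimal sketch in Python, assuming the logs are available as plain text lines. The names `SendFailure` and `find_runtime_send_failures` and the regular expression are illustrative and not part of CDAP; the pattern is derived only from the example message above, so other versions may word it differently.

```python
import re
from typing import Iterable, Iterator, NamedTuple


class SendFailure(NamedTuple):
    """One matched failure (hypothetical helper type, not a CDAP class)."""
    topic: str          # path segment that follows /topics/ in the URL
    response_code: int  # HTTP status reported after "Respond code:"


# Assumption: the message has the same shape as the example exception above.
_FAILURE_RE = re.compile(
    r"Failed to send message for program run \S+ to \S*/topics/(?P<topic>[^\s/]+?)\."
    r" Respond code: (?P<code>\d+)"
)


def find_runtime_send_failures(lines: Iterable[str]) -> Iterator[SendFailure]:
    """Yield one SendFailure for every log line that matches the signature."""
    for line in lines:
        match = _FAILURE_RE.search(line)
        if match:
            yield SendFailure(match.group("topic"), int(match.group("code")))
```

A 502 response found this way is consistent with the runtime service being temporarily unavailable, for instance while the pod restarts, but it does not by itself confirm an OutOfMemory condition.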
The OutOfMemory issues are due in part to a buildup of historical run information on the runtime pod. To verify that this is the case, SSH into the pod and check the size of the ldb directory:
...