Written by Albert Shau

Problem

The runtime pod restarts frequently because it runs out of memory (OutOfMemory errors). This can manifest as pipeline run failures, with an exception raised when the pipeline tries to talk to the runtime service.

...

Code Block
java.io.IOException: Failed to send message for program run program_run:Altipal_DataLake.SQLSERVER_CARGA_MINUTOS.-SNAPSHOT.workflow.DataPipelineWorkflow.266d15ac-2bab-11ec-bdc4-42cf72c2cfe8 to https://[cdf-uri]:443/v3Internal/runtime/namespaces/[ns]/apps/[pipeline]/versions/-SNAPSHOT/workflows/DataPipelineWorkflow/runs/[runid]/topics/metrics8. Respond code: 502. Error: unknown error
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClient.throwIfError(RuntimeClient.java:209) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClient.sendMessages(RuntimeClient.java:115) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService$TopicRelayer.processMessages(RuntimeClientService.java:234) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService$TopicRelayer.publishMessages(RuntimeClientService.java:200) ~[na:na]
	at io.cdap.cdap.internal.app.runtime.monitor.RuntimeClientService.runTask(RuntimeClientService.java:103) ~[na:na]

The OutOfMemory issues are due in part to a buildup of historical run information on the runtime pod. To verify that this is the case, SSH into the pod and check the size of the ldb directory:

...
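
As a rough illustration of the size check described above, the following sketch reports the total size of a directory in kilobytes. The helper name `dir_size_kb` and the `LDB_DIR` variable are hypothetical and used only for illustration; `LDB_DIR` is a placeholder that would need to be set to the actual location of the ldb directory on the pod.

```shell
#!/bin/sh
# Hypothetical helper: print the total size of a directory tree in kilobytes.
dir_size_kb() {
    # du -s summarizes the whole tree; -k reports sizes in 1024-byte units.
    # du prints "<size><TAB><path>", so cut keeps only the size field.
    du -sk "$1" | cut -f1
}

# LDB_DIR is a placeholder (an assumption, not the real path on the pod).
# The check only runs when it is set and points at an existing directory.
if [ -n "${LDB_DIR:-}" ] && [ -d "$LDB_DIR" ]; then
    echo "ldb directory size: $(dir_size_kb "$LDB_DIR") KB"
fi
```

A directory that keeps growing between checks would be consistent with the buildup of historical run information described above.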