Error when stopping a Streaming Pipeline

Problem

Stopping a streaming pipeline often produces an error with a confusing message about a SocketTimeout or some other low-level technical failure.

Solution(s)

Wait for some time. The pipeline will eventually stop.

When a streaming pipeline is stopped, it tries to finish processing all in-flight data in order to avoid potential data loss. If the pipeline cannot keep up with incoming data, it can build up a large backlog, which causes the stop operation to take a long time.

If you would like to kill the pipeline without waiting for it to stop gracefully, you can manually kill the running Dataproc job. If you are using the Dataproc provisioner, you can do this by deleting the cluster. If you are using the Remote Hadoop provisioner, you can do this by connecting to the master node over SSH and running 'yarn application -kill' with the corresponding application ID.
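
For the Remote Hadoop case, the kill step might look like the minimal sketch below, run on the master node. The function name kill_yarn_app, the DRY_RUN switch, and the application ID in the comments are hypothetical and only for illustration; the sketch assumes the yarn CLI is on the PATH and that you have already identified which running application belongs to the pipeline.

```shell
#!/usr/bin/env bash
# Sketch: force-kill the YARN application behind a streaming pipeline.
# Intended to be run on the cluster master node after connecting over SSH.
# Set DRY_RUN=1 to print the command instead of executing it.

# kill_yarn_app is a hypothetical helper name, not part of any product CLI.
kill_yarn_app() {
  local app_id="$1"

  # YARN application IDs have the form application_<cluster timestamp>_<sequence>.
  if ! printf '%s\n' "$app_id" | grep -Eq '^application_[0-9]+_[0-9]+$'; then
    echo "error: '$app_id' does not look like a YARN application ID" >&2
    return 1
  fi

  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "yarn application -kill $app_id"
  else
    yarn application -kill "$app_id"
  fi
}

# To find the application ID of the running pipeline first:
#   yarn application -list -appStates RUNNING
#
# Then kill it (the ID below is a made-up example):
#   kill_yarn_app application_1700000000000_0001
```

Running with DRY_RUN=1 first is a cheap way to confirm the application ID before anything is actually killed. For the Dataproc provisioner, the equivalent step is deleting the cluster, for example from the Cloud console or with gcloud dataproc clusters delete, supplying your own cluster name and region.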
