...

  • The streaming pipeline keeps running for a long time after the Stop button is clicked.

  • Logs show task failures and errors stating that RDDs cannot be found.

  • The Spark Streaming UI shows many active batches, and the batch processing time is greater than the configured batch interval.

    • To open the Spark Streaming UI, go to Dataproc > Clusters > Web Interfaces > YARN ResourceManager, click ApplicationMaster under Tracking UI, and then open the Streaming tab.
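If you prefer to check these symptoms from a script rather than the UI, the sketch below inspects the JSON that Spark's monitoring REST API returns for `/api/v1/applications/<app-id>/streaming/statistics`. It assumes the `batchDuration`, `avgProcessingTime` and `numActiveBatches` fields of that response (all times in milliseconds). The function name `diagnose_streaming_lag` and the default threshold of one active batch are hypothetical choices for illustration, not values from this page.

```python
from typing import Any, Dict, List


def diagnose_streaming_lag(stats: Dict[str, Any], max_active_batches: int = 1) -> List[str]:
    """Return human-readable findings if batches appear to be piling up.

    `stats` is assumed to be the parsed JSON from Spark's
    /api/v1/applications/<app-id>/streaming/statistics endpoint.
    `max_active_batches` is a hypothetical threshold, not a Spark setting.
    """
    findings: List[str] = []
    batch_ms = stats["batchDuration"]  # configured batch interval in ms
    # avgProcessingTime may be null or missing before any batch completes.
    avg_proc_ms = stats.get("avgProcessingTime")
    active = stats.get("numActiveBatches", 0)

    if avg_proc_ms is not None and avg_proc_ms > batch_ms:
        findings.append(
            f"average processing time {avg_proc_ms} ms exceeds batch interval {batch_ms} ms"
        )
    if active > max_active_batches:
        findings.append(f"{active} active batches queued (threshold {max_active_batches})")
    return findings
```

For example, a response with `batchDuration` 10000, `avgProcessingTime` 25000 and `numActiveBatches` 12 produces two findings, matching the symptoms listed above.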

...

Solution(s)

Increase batch interval

Set the batch interval to a value greater than the batch processing time. You can find the batch processing time in the Spark Streaming UI, as described above.
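To see why the batch interval must exceed the processing time, the sketch below simulates a simplified model in which one batch arrives per interval and batches are processed one at a time. When processing takes longer than the interval, the number of unfinished batches grows without bound, which is the "many active batches" symptom. The helper `suggest_batch_interval` and its 1.5x headroom factor are hypothetical illustrations, not a Spark or Data Fusion recommendation.

```python
import math
from typing import Sequence


def backlog_after(num_batches: int, interval_s: float, processing_s: float) -> int:
    """Count batches still queued or running when the last one has arrived
    plus one interval, assuming sequential processing of fixed duration."""
    now = num_batches * interval_s
    finish = 0.0
    pending = 0
    for k in range(num_batches):
        arrival = k * interval_s
        # A batch starts when it arrives or when the previous batch finishes.
        start = max(arrival, finish)
        finish = start + processing_s
        if finish > now:
            pending += 1
    return pending


def suggest_batch_interval(processing_times_s: Sequence[float], headroom: float = 1.5) -> int:
    """Pick a whole-second interval above the slowest observed batch.
    The headroom factor is an assumed safety margin."""
    if not processing_times_s:
        raise ValueError("need at least one observed processing time")
    return math.ceil(max(processing_times_s) * headroom)
```

With a 10 s interval and 25 s processing, 6 batches are pending after 10 intervals and 12 after 20, so the backlog keeps growing. With the suggested interval of 18 s for observed times of 4.2, 7.9 and 12.0 s, nothing accumulates.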

...

Info

Clone the pipeline before making configuration changes.

Configuration changes will not take effect if you edit the configuration of a deployed pipeline. Upon restart, Spark reads its configuration from the checkpoint, so any configuration changes made in the UI are ignored.
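Spark Streaming's documented recovery pattern, `StreamingContext.getOrCreate(checkpointPath, setupFunc)`, calls the setup function only when no checkpoint exists at that path; otherwise the context, including its batch interval, is rebuilt from the checkpoint. The pure-Python toy model below mimics that behavior. It is not Spark code: `StreamingConfig`, `get_or_create` and the checkpoint paths are hypothetical, and it assumes a cloned pipeline writes to a new checkpoint location.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class StreamingConfig:
    batch_interval_s: int


def get_or_create(
    checkpoints: Dict[str, StreamingConfig],
    checkpoint_dir: str,
    create: Callable[[], StreamingConfig],
) -> StreamingConfig:
    """Toy model of getOrCreate: reuse checkpointed config if present,
    otherwise build a fresh one and record it as the checkpoint."""
    if checkpoint_dir in checkpoints:
        # The new configuration in `create` is never consulted.
        return checkpoints[checkpoint_dir]
    config = create()
    checkpoints[checkpoint_dir] = config
    return config
```

Restarting the original pipeline with a 30 s interval still yields the checkpointed 10 s interval, while the clone, which has no checkpoint yet, gets 30 s.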

Use Pub/Sub instead of GCS sink

...