A realtime pipeline that is not configured properly will fail because YARN kills its containers once RDD caching reaches the allocated memory limit.

Instructions

Apply the following configurations when running a pipeline that contains a GCS sink.

Also make sure the number of executors is set correctly. By default it is set to 1.

  1. Set the engine config spark.streaming.blockInterval to 30000 (the value is in milliseconds, so 30 seconds). Apply this setting whenever the realtime pipeline has a GCS sink; it reduces the number of part files created in the GCS sink.

  2. Set the runtime argument system.resources.reserved.memory.override to 1024 to reserve 1 GB of memory overhead for the Spark process, so that YARN does not kill its containers.
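
The two settings above can be written down as plain key/value pairs, and a little arithmetic shows why a larger block interval means fewer part files. The following is a minimal sketch, not the pipeline's own tooling: the dictionary names, the helper blocks_per_batch, and the 60-second batch interval are assumptions made for illustration, and the 200 ms figure is Spark's documented default for spark.streaming.blockInterval. It relies on the rule of thumb from the Spark Streaming guide that each receiver produces roughly (batch interval / block interval) blocks per batch.

```python
# Setting names and values are taken from the steps above. How they are
# submitted to the pipeline depends on the deployment and is not shown here.
ENGINE_CONFIG = {"spark.streaming.blockInterval": "30000"}  # milliseconds
RUNTIME_ARGS = {"system.resources.reserved.memory.override": "1024"}  # MB

SPARK_DEFAULT_BLOCK_INTERVAL_MS = 200  # Spark's documented default


def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Approximate number of blocks one receiver produces per batch.

    Hypothetical helper: a rough estimate, not an exact part-file count.
    """
    if block_interval_ms <= 0:
        raise ValueError("block interval must be positive")
    return max(1, batch_interval_ms // block_interval_ms)


if __name__ == "__main__":
    batch_ms = 60_000  # assumed 60-second batch interval, for illustration
    tuned_ms = int(ENGINE_CONFIG["spark.streaming.blockInterval"])
    print(blocks_per_batch(batch_ms, SPARK_DEFAULT_BLOCK_INTERVAL_MS))  # 300
    print(blocks_per_batch(batch_ms, tuned_ms))  # 2
```

Under these assumptions, raising the block interval from the default to 30000 ms takes each batch from about 300 blocks per receiver to about 2. The actual number of part files also depends on the pipeline's batch interval and the number of receivers.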
