
If realtime pipelines aren’t configured properly, they are likely to fail: YARN kills the containers once RDD caching reaches the allocated memory limit.

Instructions

Apply the configurations below when running a realtime pipeline that contains a GCS sink.

Also make sure the number of executors is set correctly. By default it is set to 1.

  1. Set the engine config spark.streaming.blockInterval to 30000 (milliseconds, i.e. 30 seconds). Apply this configuration whenever a realtime pipeline has a GCS sink; it reduces the number of part files created by the GCS sink.

  2. Set the runtime argument system.resources.reserved.memory.override to 1024 to reserve 1 GB of memory overhead for the Spark process, so that YARN does not kill its containers.
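
The two settings above can be summarized as key/value pairs, and the effect of the block interval can be estimated with a little arithmetic. The sketch below is illustrative only: the 60-second batch interval is a hypothetical value, the 200 ms figure is Spark's documented default for spark.streaming.blockInterval, and the link between blocks and part files assumes a receiver-based source where each block becomes one partition and the sink writes roughly one file per partition.

```python
# Values taken from the two steps above. They are written as strings because
# engine configs and runtime arguments are entered as key/value text pairs.
ENGINE_CONFIG = {"spark.streaming.blockInterval": "30000"}  # milliseconds
RUNTIME_ARGS = {"system.resources.reserved.memory.override": "1024"}  # 1024 = 1 GB

SPARK_DEFAULT_BLOCK_INTERVAL_MS = 200  # Spark's documented default


def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Approximate number of blocks one receiver produces per batch."""
    return max(1, batch_interval_ms // block_interval_ms)


# Hypothetical 60-second batch interval, chosen only for illustration.
batch_ms = 60_000
before = blocks_per_batch(batch_ms, SPARK_DEFAULT_BLOCK_INTERVAL_MS)
after = blocks_per_batch(
    batch_ms, int(ENGINE_CONFIG["spark.streaming.blockInterval"])
)
print(f"blocks per batch: {before} -> {after}")
```

Under these assumptions, raising the block interval cuts the number of blocks per batch by a large factor, which is why fewer part files end up in the sink.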
