...
Set the engine config spark.streaming.blockInterval to 30000 (30 seconds). Apply this configuration when a real-time pipeline has a GCS sink; it reduces the number of part files created in the GCS sink.
Set the runtime argument system.resources.reserved.memory.override to 1024 to reserve 1 GB of memory overhead for the Spark process, which helps keep YARN from killing it.
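
As a rough illustration of why a larger block interval yields fewer part files, the sketch below estimates how many blocks a single receiver produces per batch. It relies on two facts from Spark's receiver-based streaming model: the block count per batch is approximately the batch interval divided by the block interval, and the default block interval is 200 ms. The 60-second batch interval and the helper name `blocks_per_batch` are assumptions made for this example, and the actual part-file count depends on how the pipeline writes to the sink.

```python
import math

# The two settings from the text, written as plain key/value pairs.
ENGINE_CONFIG = {"spark.streaming.blockInterval": "30000"}            # milliseconds
RUNTIME_ARGS = {"system.resources.reserved.memory.override": "1024"}  # 1024 = 1 GB per the text

SPARK_DEFAULT_BLOCK_INTERVAL_MS = 200  # Spark's documented default
ASSUMED_BATCH_INTERVAL_MS = 60_000     # assumption: 60-second batches


def blocks_per_batch(batch_interval_ms: int, block_interval_ms: int) -> int:
    """Approximate blocks (and so tasks) per receiver per batch.

    Hypothetical helper; implements the batch interval / block interval
    rule of thumb, rounded up, with a floor of one block.
    """
    if batch_interval_ms <= 0 or block_interval_ms <= 0:
        raise ValueError("intervals must be positive")
    return max(1, math.ceil(batch_interval_ms / block_interval_ms))


default_blocks = blocks_per_batch(ASSUMED_BATCH_INTERVAL_MS, SPARK_DEFAULT_BLOCK_INTERVAL_MS)
tuned_blocks = blocks_per_batch(
    ASSUMED_BATCH_INTERVAL_MS, int(ENGINE_CONFIG["spark.streaming.blockInterval"])
)
print(f"default: ~{default_blocks} blocks per batch, tuned: ~{tuned_blocks}")
```

Under these assumptions the estimate drops from about 300 blocks per batch to about 2, which is the direction of the effect the setting is meant to have on the number of files written.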
...