Source plugin splits binary data on newline characters

Problem

A source plugin such as File or GCS splits binary input data on newline characters, so each input record is broken into multiple output records.

Solution(s)

  • Set the input format to blob instead of text. The text format reads the input line by line, which is what splits the binary data.

  • Set the output schema to bytes.
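
The difference between the two formats can be illustrated outside the pipeline with a minimal Python sketch. It does not use the plugin's code: the payload and the two reader functions (read_as_text, read_as_blob) are hypothetical stand-ins, assuming that the text format behaves like a newline-delimited reader and the blob format returns the whole input as one record.

```python
from typing import List

# Hypothetical binary payload: a single record that happens to
# contain two 0x0A (newline) bytes.
payload = bytes([0x89, 0x50, 0x0A, 0x4E, 0x47, 0x0A, 0x00, 0xFF])


def read_as_text(data: bytes) -> List[bytes]:
    """Stand-in for a line-oriented reader: one record per newline-delimited chunk."""
    return data.split(b"\n")


def read_as_blob(data: bytes) -> List[bytes]:
    """Stand-in for a blob reader: the whole input becomes a single record of bytes."""
    return [data]


text_records = read_as_text(payload)
blob_records = read_as_blob(payload)

print("text format records:", len(text_records))
print("blob format records:", len(blob_records))
```

With the line-oriented reader the one input record comes out as several fragments, none of which is valid on its own. With the blob reader it comes out as a single, unmodified record.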
