Configuring output directories for pipelines
This article is posted on the CDAP Doc wiki and will be maintained here: Configuring output directories for pipelines.
Overview
This document provides best practices for configuring the output directory of file-based sink plugins (S3, GCS).
General Tips
Ensure that output paths are unique when a pipeline contains multiple file-based sinks
Using the same output path for more than one sink in a pipeline (for example, two error collectors configured with the same path) causes the pipeline to fail with the error: “User class threw exception: org.apache.hadoop.mapred.FileAlreadyExistsException: Output directory xxxxxx already exists.”
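One way to catch this before deployment is to compare the configured sink paths yourself. The following Python snippet is a minimal sketch of such a check, not part of CDAP: the function name, the sink stage names, and the bucket paths are all hypothetical, and it assumes you have already collected each sink's output path into a plain dictionary.

```python
from collections import defaultdict


def find_duplicate_sink_paths(sink_paths):
    """Return {normalized_path: [sink names]} for paths shared by several sinks.

    `sink_paths` is a hypothetical mapping of sink stage name -> configured
    output path. It is an assumption for this sketch, not a CDAP structure.
    """
    by_path = defaultdict(list)
    for name, path in sink_paths.items():
        # Treat "gs://bucket/errors" and "gs://bucket/errors/" as one directory.
        by_path[path.rstrip("/")].append(name)
    return {path: names for path, names in by_path.items() if len(names) > 1}


# Illustrative stage names and paths only.
sinks = {
    "ErrorCollectorA": "gs://my-bucket/errors",
    "ErrorCollectorB": "gs://my-bucket/errors/",
    "MainSink": "gs://my-bucket/output",
}

duplicates = find_duplicate_sink_paths(sinks)
for path, names in duplicates.items():
    print(f"Output path {path} is shared by sinks: {', '.join(names)}")
```

Any path reported here should be changed so that each sink writes to its own directory, for example by giving each error collector a distinct subdirectory.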