Failed to provision Dataproc cluster due to missing bucket

Problem

Pipeline runs fail after a few seconds with a log message reporting that a GCS bucket does not exist. Every pipeline run that uses the same profile fails with a message naming the same bucket.

For example:

2020-05-14 15:10:41,852 - ERROR [provisioning-service-2:i.c.c.i.p.t.ProvisioningTask@151] - PROVISION task failed in REQUESTING_CREATE state for program run program_run:default.t.-SNAPSHOT.workflow.DataPipelineWorkflow.bc504d34-962f-11ea-b486-000000f78a44. com.google.api.gax.rpc.InvalidArgumentException: io.grpc.StatusRuntimeException: INVALID_ARGUMENT: Google Cloud Storage bucket does not exist '[bucket redacted]'.
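
When many runs fail, it can help to pull the bucket name out of the logs automatically. The following is a minimal sketch in Python, assuming the error text has the same wording as the example above; the function name missing_bucket_name is hypothetical and not part of CDF or Dataproc.

```python
import re
from typing import Optional

# Pattern based on the error text in the example log line. The exact wording
# is an assumption and may differ between Dataproc versions.
_MISSING_BUCKET = re.compile(
    r"Google Cloud Storage bucket does not exist '([^']+)'"
)


def missing_bucket_name(log_line: str) -> Optional[str]:
    """Return the bucket name quoted in the error, or None if the line does not match."""
    match = _MISSING_BUCKET.search(log_line)
    return match.group(1) if match else None
```

If every failing run on a profile returns the same name, the profile is most likely pointing at a single missing staging bucket.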

Solution(s)

Manually create the missing bucket, or configure the profile to use a different staging bucket.

If no staging bucket is specified (the default for CDF Dataproc profiles), Dataproc tries to reuse the same staging bucket for every cluster creation request, so if that bucket is missing, every run that uses the profile fails. See https://cloud.google.com/dataproc/docs/concepts/configuring-clusters/staging-bucket for more information.
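
The first option (creating the missing bucket) can be sketched in Python as follows. This is an illustration under assumptions, not the documented procedure: it assumes the google-cloud-storage client library is installed and credentials are configured, and the helper names ensure_bucket and default_client are hypothetical. The location argument is also an assumption; choose one that is compatible with the region of the Dataproc cluster.

```python
def ensure_bucket(client, bucket_name: str, location: str) -> bool:
    """Create bucket_name if it is missing. Return True if it was created.

    `client` is expected to behave like google.cloud.storage.Client:
    lookup_bucket() returns None when the bucket does not exist.
    """
    if client.lookup_bucket(bucket_name) is not None:
        return False  # bucket already exists, nothing to do
    client.create_bucket(bucket_name, location=location)
    return True


def default_client():
    """Build a real Cloud Storage client (requires google-cloud-storage)."""
    from google.cloud import storage  # imported lazily so the sketch loads without it

    return storage.Client()
```

A call such as ensure_bucket(default_client(), "<bucket name from the error>", "US") would then recreate the bucket named in the log message; the name must match the one in the error exactly.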
