Running pipelines with a custom service account

This page describes how to run a pipeline with a custom service account. Two service accounts are used when a pipeline runs: a creator service account and a runner service account. The creator service account is used to create the Dataproc cluster. The runner service account is used on the Dataproc cluster to execute the pipeline. By default, the creator service account is the Data Fusion service account, and the runner service account is the default Compute Engine service account.

Most setups do not need to modify the creator service account. It is much more common to replace the runner service account with a custom one instead of using the default Compute Engine service account.
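To make the two roles concrete, the following minimal Python sketch builds the email addresses of the default creator and runner service accounts, plus a custom runner account. The email patterns follow the usual Google Cloud naming conventions; the function names, project number, project ID, and account name are hypothetical placeholders, so check the actual addresses on the IAM page of your project.

```python
"""Illustrative helpers for the service accounts involved in a pipeline run."""


def default_creator_service_account(project_number: str) -> str:
    # Data Fusion service account: creates the Dataproc cluster by default.
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def default_runner_service_account(project_number: str) -> str:
    # Default Compute Engine service account: executes the pipeline
    # on the Dataproc cluster by default.
    return f"{project_number}-compute@developer.gserviceaccount.com"


def custom_service_account(name: str, project_id: str) -> str:
    # User-managed service account, typically used as a custom runner.
    return f"{name}@{project_id}.iam.gserviceaccount.com"


if __name__ == "__main__":
    # Placeholder values, not real identifiers.
    print("creator:", default_creator_service_account("123456789012"))
    print("runner (default):", default_runner_service_account("123456789012"))
    print("runner (custom):", custom_service_account("pipeline-runner", "my-project"))
```

The value returned by `custom_service_account` is the kind of address you would enter in the Service Account field of the compute profile described below.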

See https://cloud.google.com/data-fusion/docs/how-to/granting-service-account-permission?hl=en#grant_service_account_user_permission for more information on the permissions required on the service accounts.

Configuring the runner service account

  1. In the Pipeline Studio, navigate to the System Admin page using the top navigation bar.

  2. Click on the Configuration tab, and then click on Create New Profile in the System Compute Profiles section.

  3. Click on Google Cloud Dataproc.

  4. Enter a Profile label and a Description.

  5. Enter the Service Account.

    Note that this field configures the service account used on the Dataproc cluster to execute the pipeline. It is different from the service account used to create the cluster, which is specified in the Service Account Key field.

  6. Click Create at the bottom of the page.

  7. Navigate to the pipeline detail page for the pipeline you would like to run.

  8. Click on the Configure button, select the profile you just created, and then click Save.
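As an alternative to selecting the profile in the Configure dialog, a profile can often be chosen through runtime arguments. The sketch below only builds such an argument map in Python. It assumes the CDAP-style keys `system.profile.name` (with a `SYSTEM:` or `USER:` scope prefix) and `system.profile.properties.serviceAccount`; verify both keys against the documentation for your version before relying on them. The function name, profile name, and service account are hypothetical.

```python
from typing import Dict, Optional


def profile_runtime_arguments(
    profile_name: str,
    service_account: Optional[str] = None,
    scope: str = "SYSTEM",
) -> Dict[str, str]:
    """Build runtime arguments that select a compute profile.

    The key names are assumptions based on CDAP conventions, not
    something confirmed by this page.
    """
    if scope not in ("SYSTEM", "USER"):
        raise ValueError(f"unknown profile scope: {scope}")

    # Select the profile, qualified by its scope.
    args = {"system.profile.name": f"{scope}:{profile_name}"}

    # Optionally override the runner service account set in the profile.
    if service_account is not None:
        args["system.profile.properties.serviceAccount"] = service_account
    return args


if __name__ == "__main__":
    # Placeholder profile name and service account.
    print(profile_runtime_arguments(
        "custom-runner-profile",
        "pipeline-runner@my-project.iam.gserviceaccount.com",
    ))
```

Keeping the service account in the profile itself, as in the steps above, is usually simpler; a per-run override is mainly useful when one profile is shared by pipelines that need different runner accounts.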

     
