Give service accounts access to GCP resources

By default, Data Fusion can read from and write to BigQuery, Cloud Storage (GCS), Pub/Sub, Spanner, and Bigtable in the project where the Data Fusion instance was created. To access other GCP resources, or any of these resources in a different project, follow the instructions below.

Before you begin

Create a Data Fusion instance

Grant access to the service accounts

Data Fusion uses service accounts to access GCP resources in Wrangler, in preview, and in pipelines running on Dataproc. Services that run in the tenant project, such as preview and Wrangler, use a service account of the form service-<customer-project-number>@gcp-sa-datafusion.iam.gserviceaccount.com. This service account is created automatically when the Cloud Data Fusion API is enabled on the project. Actual pipeline execution on the Dataproc cluster uses the Compute Engine default service account. Any additional GCP resource that Data Fusion needs to access must grant the appropriate permissions to both of these service accounts.
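
As an illustration, both service account addresses can be derived from the project number. The following is a minimal sketch, not part of Data Fusion itself: the function names and the sample project number are hypothetical, and the Compute Engine default service account format (<project-number>-compute@developer.gserviceaccount.com) is the standard GCP one rather than something stated in this page.

```python
def data_fusion_service_account(project_number: int) -> str:
    """Address of the account used by preview and Wrangler in the tenant project."""
    return f"service-{project_number}@gcp-sa-datafusion.iam.gserviceaccount.com"


def compute_default_service_account(project_number: int) -> str:
    """Address of the Compute Engine default account used on Dataproc.

    Assumption: the standard <project-number>-compute@developer... format.
    """
    return f"{project_number}-compute@developer.gserviceaccount.com"


if __name__ == "__main__":
    # Hypothetical project number, for illustration only.
    number = 123456789012
    print(data_fusion_service_account(number))
    print(compute_default_service_account(number))
```

Both addresses use the project number, not the project ID, of the project where the Data Fusion instance was created.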

For example, to grant access to Datastore from preview and Wrangler, follow these steps:

In the GCP Console, open the IAM & Admin page.
In the left navigation bar, click IAM.
Edit the roles for service-<customer-project-number>@gcp-sa-datafusion.iam.gserviceaccount.com.
On the Edit permissions page, add the Cloud Datastore Owner role.
Click Save.

Repeat these steps for the Compute Engine default service account so that the pipeline can access Datastore during its execution on Dataproc.
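
The console steps above could also be scripted. The sketch below only builds the gcloud commands and prints them, so nothing is changed until you run them yourself. It makes several assumptions: the helper name grant_role_commands and the sample project values are hypothetical, roles/datastore.owner is taken to be the role ID behind Cloud Datastore Owner, and the Compute Engine default service account is assumed to follow the standard <project-number>-compute@developer.gserviceaccount.com format.

```python
import shlex
from typing import List


def grant_role_commands(
    resource_project_id: str,
    fusion_project_number: int,
    role: str = "roles/datastore.owner",  # assumed ID of Cloud Datastore Owner
) -> List[List[str]]:
    """Build one gcloud command per service account that needs the role.

    resource_project_id: project that holds the resource (here, Datastore).
    fusion_project_number: number of the project with the Data Fusion instance.
    """
    accounts = [
        # Used by preview and Wrangler in the tenant project.
        f"service-{fusion_project_number}@gcp-sa-datafusion.iam.gserviceaccount.com",
        # Used by pipelines running on Dataproc (assumed standard format).
        f"{fusion_project_number}-compute@developer.gserviceaccount.com",
    ]
    return [
        [
            "gcloud", "projects", "add-iam-policy-binding", resource_project_id,
            f"--member=serviceAccount:{account}",
            f"--role={role}",
        ]
        for account in accounts
    ]


if __name__ == "__main__":
    # Hypothetical project ID and number, for illustration only.
    for argv in grant_role_commands("my-datastore-project", 123456789012):
        print(shlex.join(argv))
```

Review the printed commands before running them. A narrower role than Owner may be enough if the pipeline only reads from Datastore.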