This document provides instructions for configuring Wrangler and Pipelines in Cloud Data Fusion (CDF) to read from Amazon S3. Make sure you've met the prerequisites before following the instructions in this document.

Prerequisites

Make sure the following prerequisites are in place before you proceed with the configuration instructions. Perform these steps in the Google Cloud Console.

...

Before you begin

Before you begin the other sections, follow these steps.

  1. In the Google Cloud Console, create a Data Fusion instance. Follow the instructions here, and make sure the following IAM roles are granted to the service account (a sketch after this list shows one way to grant them programmatically):

    • Cloud Data Fusion Admin

    • Cloud Data Fusion API Service Agent

    • Storage Object Viewer

  2. For your existing Amazon S3 bucket, make sure you know which region it's in. Determine whether that region accepts requests in both versions or only version 4 of the Signature protocol. You can find this information on Amazon's website at https://docs.aws.amazon.com/general/latest/gr/rande.html#s3_region. (A sketch after this list shows how to look up the bucket's region programmatically.)

  3. If the region accepts version 4 only, note down one of the valid endpoint names.
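
As mentioned in step 1, the roles can also be granted programmatically. The following is a minimal sketch, not taken from this document, that uses the Resource Manager API through the google-api-python-client library; the project ID and service account address are placeholders, and Application Default Credentials are assumed:

    # Grant the three IAM roles to the Data Fusion service account
    # at the project level. PROJECT_ID and SERVICE_ACCOUNT are
    # placeholders for your own values; Application Default
    # Credentials (gcloud auth application-default login) are assumed.
    from googleapiclient import discovery

    PROJECT_ID = "my-project"  # placeholder
    SERVICE_ACCOUNT = "service-account@my-project.iam.gserviceaccount.com"  # placeholder
    ROLES = [
        "roles/datafusion.admin",         # Cloud Data Fusion Admin
        "roles/datafusion.serviceAgent",  # Cloud Data Fusion API Service Agent
        "roles/storage.objectViewer",     # Storage Object Viewer
    ]

    crm = discovery.build("cloudresourcemanager", "v1")

    # Read the current policy, append a binding per role, write it back.
    policy = crm.projects().getIamPolicy(resource=PROJECT_ID, body={}).execute()
    for role in ROLES:
        policy["bindings"].append(
            {"role": role, "members": [f"serviceAccount:{SERVICE_ACCOUNT}"]}
        )
    crm.projects().setIamPolicy(
        resource=PROJECT_ID, body={"policy": policy}
    ).execute()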
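
As mentioned in step 2, you can also look up a bucket's region programmatically. This minimal boto3 sketch assumes your AWS credentials are already configured (for example, in environment variables or ~/.aws/credentials), and the bucket name is a placeholder:

    import boto3

    s3 = boto3.client("s3")

    # GetBucketLocation returns None for us-east-1 and the region
    # name (for example "eu-central-1") for every other region.
    location = s3.get_bucket_location(Bucket="my-example-bucket")
    region = location["LocationConstraint"] or "us-east-1"
    print(f"Bucket region: {region}")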

Instructions

...

Configuring Cloud Data Fusion to read from Amazon S3

  1. In the Cloud Data Fusion UI, go to Wrangler.

...

  2. If this is the first time you are configuring S3 as a source, click the Add Connection button on the Wrangler page.

...

Choose S3.

...


  3. Enter the information for this connection. Note that the credentials in this screenshot are made up and will not work; you need to use your own AWS access key.

...

  6. You will now see the connection in the left-hand panel on the screen.

...

...

Transforming your data

  1. Choose a bucket and a file within that bucket. Note that even though you specified a region when creating the connection, Amazon S3 lists buckets from all regions that you have access to. Choose a bucket that's in the same region as the connection (the sketch after this step shows one way to check).

...
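
As noted in step 1, S3's ListBuckets operation is global, which is why Wrangler shows buckets from every region you have access to. This minimal boto3 sketch, not taken from this document and assuming your AWS credentials are already configured, prints each bucket alongside its region so you can pick one that matches the connection:

    import boto3

    s3 = boto3.client("s3")

    # ListBuckets is global: it returns buckets from every region
    # these credentials can access.
    for bucket in s3.list_buckets()["Buckets"]:
        name = bucket["Name"]
        # GetBucketLocation returns None for us-east-1.
        region = (
            s3.get_bucket_location(Bucket=name)["LocationConstraint"]
            or "us-east-1"
        )
        print(f"{name}: {region}")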

  2. Transform your data. When you're done, click Create a Pipeline and select Batch.

...

This creates a pipeline and takes you to the Pipeline Studio.

  3. If the S3 bucket is in a region that accepts version 4 of the Signature protocol only, then:

...

    b) In the dialog window that opens, change the scheme of the S3 Path to s3a:.

...

    c) Scroll down and add the following under File System Properties (a generic illustration follows these sub-steps):

...

For example:

...

    d) Close the dialog window.
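
As a generic illustration of what a File System Properties entry can look like (this is an assumption about a typical setup, not the values from this document): Hadoop's S3A connector reads its endpoint from the fs.s3a.endpoint property, so for a Signature Version 4-only region such as eu-central-1 a typical entry is {"fs.s3a.endpoint": "s3.eu-central-1.amazonaws.com"}. Substitute the endpoint name you noted in the Before you begin section.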

  4. Finish, deploy, and run your pipeline.
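
A deployed batch pipeline can also be started programmatically. The sketch below is an illustration under stated assumptions, not a procedure from this document: the instance API endpoint, pipeline name, and access token are placeholders, and the pipeline is assumed to be deployed in the default namespace of the CDAP REST API that Cloud Data Fusion exposes:

    # Start a deployed batch pipeline through the CDAP REST API.
    # INSTANCE_API and PIPELINE_NAME are placeholders; the access
    # token can come from `gcloud auth print-access-token`.
    import requests

    INSTANCE_API = "https://my-instance-endpoint/api"  # placeholder
    PIPELINE_NAME = "S3BatchPipeline"                  # placeholder
    ACCESS_TOKEN = "your-oauth2-access-token"          # placeholder

    resp = requests.post(
        f"{INSTANCE_API}/v3/namespaces/default/apps/{PIPELINE_NAME}"
        "/workflows/DataPipelineWorkflow/start",
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
    )
    resp.raise_for_status()
    print("Pipeline start requested.")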

Note: Pipeline preview will not currently work with Amazon S3.
