Configuring Cloud SQL for MySQL with Data Fusion
This document shows you how to configure CloudSQL for MySQL with Cloud Data Fusion (CDF).
Before you begin
Make sure these prerequisites are in place before you follow the steps in this guide.
Ensure that you have the following roles in IAM for the service account,
service-<project-number>@gcp-sa-datafusion.iam.gserviceaccount.com
:Cloud SQL Admin
Cloud Data Fusion Admin
Cloud Data Fusion API Service Agent
Create a CloudSQL (MySQL) instance, test database, and user.
In the Google Cloud Console, navigate to APIs and Services.
Click Enable APIS and Services.
Search for Cloud SQL Admin API.
Choose the Cloud SQL Admin API. Click Enable.
If you are connecting to a Private Cloud SQL instance, please follow additional steps at Connect to private cloudsql-mysql instance from private Cloud Data Fusion instance to set up a proxy VM to establish the connection.
Instructions
Obtain the JDBC Driver JAR file by downloading the pre-built jar-with-driver-and-dependencies from the releases page.
Alternatively, you can build it using:
mvn -P jar-with-driver-and-dependencies clean package -DskipTests
to build the JAR instead of the command in README from the above git repo.
Go to Cloud Data Fusion Wrangler.
4. If this is the first time you are configuring CloudSQL for MySQL, click on the Add Connection button from the Wrangler screen and choose Database.
5. Click “Google Cloud SQL for MySQL.”
6. Upload the previously built JAR as illustrated, and click Next.
7. Click Finish to complete the upload.
8. Once the driver has been uploaded, you will see a green check mark indicating that your driver has been installed.
Cloud SQL instances with Private IP cannot currently be accessed using Wrangler. So the following instructions will not work for Private IP instances. However, Private IP Cloud SQL instances can still be accessed by creating a pipeline, which will run using Cloud Dataproc. See Accessing Cloud SQL in pipelines for instructions.
9. Click the Google Cloud SQL for MySQL to create a new connection. Once the connection modal opens, click on the Advanced link if present.
10. Enter your connection string as:
jdbc:mysql://google/<database>?cloudSqlInstance=<instance-connection-name>&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false
For example:
jdbc:mysql://google/mysql?cloudSqlInstance=cloud-data-fusion-demos:us-west1:mysql&socketFactory=com.google.cloud.sql.mysql.SocketFactory&useSSL=false
where <database> represents the database you created in the prerequisites section, and <instance-connection-name> refers to your instance connection name as displayed in the Overview tab of of the instance Details page.
11. Enter the username and the password you configured for this CloudSQL instance.
12. Click Test Connection to verify that the connection can successfully be established with the database.
13. Click Add Connection to complete the task.
Once you’ve completed all the steps, you will be able to click on the newly defined database connection and see the list of tables for that database.
Accessing CloudSQL in pipelines
Perform steps 1-6 in the Wrangler section above.
Open Pipeline Studio.
From the plugin palette on the left, drop the Cloud SQL source plugin to the canvas and click Properties.
Specify the plugin name as cloudsql-mysql.
Specify the connection string as below:
For example:
where <database> represents the database you created in the prerequisites section, and <instance-name> refers to you instance connection name as displayed in the overview tab of of the instance details page, e.g:
6. Enter the query that you would like to import data from as the Import Query.
7. Enter the username and password to use for the database. You can also use a secure macro for the password.
8. Click Get Schema to populate the schema of the plugin.
9. Configure the rest of the pipeline and deploy the pipeline.