
Overview

Often there is a need to stitch together a series of actions that allow a pipeline to achieve a specific use case. One action that has recently been requested is a Cloud BigQuery Execute Action. This action is responsible for running a Cloud BigQuery SQL query, waiting for it to finish, and, on success, continuing with further processing in the pipeline.

Use case

ELT use case - Raw data is loaded into data warehouse staging tables. Once the data is loaded into the staging tables, transforms are applied using SQL. A series of SQL queries needs to execute in order to prepare the data for analytics. The SQL queries that specify the transformations have to be stitched together as part of the pipeline and need to be scheduled either in parallel or serially. Executing the series of SQL queries transforms the data from the staging tables into the final fact tables in the warehouse.

As an example:

  • Periodically fetch Omniture Click Data from SFTP
  • Load the raw data into Cloud BigQuery staging table
  • Apply a series of SQL queries to transform the data
  • Generate or update 5-6 auxiliary tables and update the main fact table
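The transform steps above reduce to running an ordered series of SQL statements, each of which must finish before the next starts. A minimal sketch, assuming hypothetical table names for the Omniture click-data example (this is not the plugin's API; `execute_sql` stands in for whatever actually submits the query):

```python
# Illustrative only: hypothetical staging/analytics table names, not the
# plugin's actual configuration.
TRANSFORM_SERIES = [
    "CREATE OR REPLACE TABLE analytics.clicks_clean AS "
    "SELECT * FROM staging.omniture_clicks WHERE click_ts IS NOT NULL",
    "CREATE OR REPLACE TABLE analytics.clicks_daily AS "
    "SELECT DATE(click_ts) AS day, COUNT(*) AS clicks "
    "FROM analytics.clicks_clean GROUP BY day",
]

def run_series(execute_sql, series=TRANSFORM_SERIES):
    """Run each SQL statement in order, stopping at the first failure.

    `execute_sql` stands in for the BigQuery call; it should block until
    the query job completes and return a job id.
    """
    job_ids = []
    for sql in series:
        job_ids.append(execute_sql(sql))
    return job_ids
```

Serial scheduling falls out of the loop; parallel scheduling would instead submit independent branches of the series concurrently.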

Requirements

  1. User can specify legacy SQL or standard (non-legacy) SQL to execute
  2. User can set the priority for the query
  3. User can specify the execution mode for the query: interactive or batch
  4. User can specify a non-temporary (permanent) table to which the results are written
  5. User can specify whether the query cache is used
  6. User can specify different retry options in case of failure
  7. User can specify whether to continue the pipeline in case of failure
  8. User can use the plugin in a non-GCP environment
  9. The plugin passes the query and job id to the next node in the DAG
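The requirements above can be sketched as a configuration plus a retry loop. Everything here is hypothetical (the class, field names, and `submit_query` callback are illustrations, not the plugin's real API); it shows how retries (req. 6), continue-on-failure (req. 7), and returning the job id downstream (req. 9) could fit together:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class BigQueryExecuteConfig:
    # Hypothetical settings mirroring the requirements list.
    sql: str
    use_legacy_sql: bool = False          # req 1
    priority: str = "BATCH"               # reqs 2-3: "INTERACTIVE" or "BATCH"
    destination_table: Optional[str] = None  # req 4: permanent results table
    use_query_cache: bool = True          # req 5
    max_retries: int = 3                  # req 6
    retry_backoff_seconds: float = 1.0    # req 6
    fail_pipeline_on_error: bool = True   # req 7

def execute_with_retries(config, submit_query, sleep=time.sleep):
    """Run the query with simple exponential backoff.

    `submit_query` blocks until the query job finishes and returns its job
    id (req 9). Returns None instead of raising when the user chose to
    continue the pipeline despite failure (req 7).
    """
    for attempt in range(config.max_retries + 1):
        try:
            return submit_query(config)
        except Exception:
            if attempt == config.max_retries:
                if config.fail_pipeline_on_error:
                    raise
                return None
            # Exponential backoff between attempts: 1s, 2s, 4s, ...
            sleep(config.retry_backoff_seconds * (2 ** attempt))
```

In the real plugin, `submit_query` would wrap the BigQuery job-submission call; the sketch keeps it injectable so the control flow is independent of any GCP client library.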

Limitations

  1. No query result records are retrieved or stored by the plugin.
