Overview
Pipelines often need to stitch together a series of actions to achieve a specific use case. One action that has recently been requested is a Cloud BigQuery Execute action. This action is responsible for running a Cloud BigQuery SQL statement, waiting for it to finish, and, on success, allowing the pipeline to continue processing.
Use case
ELT use case - Raw data is loaded into staging tables in the data warehouse. Once the data is in the staging tables, transforms are applied using SQL. A series of SQL queries must execute in order to prepare the data for analytics; these queries have to be stitched together as part of the pipeline and scheduled either in parallel or serially. Executing the series of SQL queries transforms the data from the staging tables into the final fact tables in the warehouse.
As an example:
- Periodically fetch Omniture Click Data from SFTP
- Load the raw data into Cloud BigQuery staging table
- Apply a series of SQL queries to transform the data
- Generate or update 5-6 auxiliary tables and update the main fact table
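The chained-transform step above can be sketched as running an ordered series of SQL statements, stopping at the first failure. This is a minimal illustration, not the plugin's implementation; the table names and the `execute` callable are hypothetical (in the real action, `execute` would submit a BigQuery job and wait for completion).

```python
def run_transforms(statements, execute):
    """Run each SQL statement in order; `execute` raises on failure,
    which halts the series. Returns the statements that completed."""
    completed = []
    for sql in statements:
        execute(sql)
        completed.append(sql)
    return completed

# Illustrative transform series (table names are hypothetical):
TRANSFORMS = [
    "INSERT INTO analytics.daily_clicks SELECT ... FROM staging.omniture_raw",
    "MERGE analytics.fact_sessions USING analytics.daily_clicks ON ...",
]
```

Scheduling the statements serially, as here, preserves dependencies between auxiliary tables and the fact table; independent statements could instead be run as parallel action nodes in the pipeline.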
Requirements
- User can specify whether to execute Legacy SQL or Standard SQL
- User can set the priority for the query
- User can specify the query execution mode - interactive or batch
- User can specify a non-temporary (permanent) table in which the results can be written
- User can specify whether the query cache is used
- User can specify different retry options in case of failures
- User can specify whether to continue further in case of failure
- User can use the plugin in a non-GCP environment
- User can specify encryption with KMS Key
- Plugin passes the query and job id to the next node in the DAG
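Several of the requirements above map directly onto fields of the BigQuery v2 Jobs API's `configuration.query` payload. The sketch below shows one possible mapping as a plain dictionary; the helper itself is hypothetical, but the field names (`useLegacySql`, `priority`, `useQueryCache`, `destinationTable`, `destinationEncryptionConfiguration`, `maximumBillingTier`) follow the REST API.

```python
def build_query_config(sql, legacy_sql=False, priority="BATCH",
                       use_cache=True, project=None, dataset=None,
                       table=None, kms_key=None, max_billing_tier=None):
    """Map the plugin's settings onto a BigQuery `configuration.query`
    payload (field names follow the v2 Jobs REST API)."""
    query = {
        "query": sql,
        "useLegacySql": legacy_sql,
        "priority": priority,          # "INTERACTIVE" or "BATCH"
        "useQueryCache": use_cache,
    }
    if dataset and table:
        # Write results to a permanent table instead of a temporary one.
        query["destinationTable"] = {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
    if kms_key:
        # Encrypt the destination table with a customer-managed KMS key.
        query["destinationEncryptionConfiguration"] = {"kmsKeyName": kms_key}
    if max_billing_tier is not None:
        query["maximumBillingTier"] = max_billing_tier
    return {"configuration": {"query": query}}
```

After the job is submitted, the job id returned by the API is what the plugin would pass to the next node in the DAG, alongside the query text.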
Limitations
- The action does not retrieve or store any records.
Design
- Plugin type : Action
- Configurations
- Project Id
- Service Account File
- Legacy SQL
- SQL
  - Mode - Batch or Interactive
- Cache Enabled/Disabled
- KMS Key Name
- Max Billing Tier
- Non-Temporary Dataset
  - Non-Temporary Table
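The retry and continue-on-failure requirements could be implemented with a small wrapper around the query submission. This is a sketch under stated assumptions: `run_query` is a hypothetical callable that submits the job and waits for it, and the exponential-backoff policy shown is one plausible choice, not a specification.

```python
import time


def execute_with_retry(run_query, max_retries=3, backoff_seconds=1.0,
                       continue_on_failure=False, sleep=time.sleep):
    """Retry a failed query up to `max_retries` times with exponential
    backoff. If it still fails, either swallow the error so the pipeline
    continues (`continue_on_failure=True`) or re-raise it to fail the run."""
    attempt = 0
    while True:
        try:
            return run_query()
        except Exception:
            attempt += 1
            if attempt > max_retries:
                if continue_on_failure:
                    return None  # pipeline proceeds despite the failure
                raise
            sleep(backoff_seconds * (2 ** (attempt - 1)))
```

Injecting the `sleep` function keeps the wrapper unit-testable without real delays, which matters for a plugin that must also run in non-GCP test environments.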