
Overview

Often there is a need to stitch together a series of actions that allow a pipeline to achieve a specific use case. One action that has recently been requested is a Cloud BigQuery Execute Action. This action is responsible for running a Cloud BigQuery SQL query, waiting for it to finish, and, on success, continuing with further processing in the pipeline.

Use case

ELT use case - Raw data is loaded into data warehouse staging tables. Once the data is loaded into the staging tables, transforms are applied using SQL. A series of SQL queries needs to execute in order to prepare the data for analytics. The SQL queries that specify the transformations have to be stitched together as part of the pipeline and need to be scheduled either in parallel or serially. Executing the series of SQL queries transforms the data from the staging tables into the final fact tables in the warehouse.

As an example:

  • Periodically fetch Omniture Click Data from SFTP
  • Load the raw data into Cloud BigQuery staging table
  • Apply a series of SQL queries to transform the data
  • Generate or update 5-6 auxiliary tables and update the main fact table
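The transform steps above reduce to running an ordered series of SQL statements, each of which must finish before the next starts. A minimal sketch, assuming hypothetical table names for the Omniture click-data example (this is not the plugin's API; `execute_sql` stands in for whatever actually submits the query):

```python
# Illustrative only: hypothetical staging/analytics table names, not the
# plugin's actual configuration.
TRANSFORM_SERIES = [
    "CREATE OR REPLACE TABLE analytics.clicks_clean AS "
    "SELECT * FROM staging.omniture_clicks WHERE click_ts IS NOT NULL",
    "CREATE OR REPLACE TABLE analytics.clicks_daily AS "
    "SELECT DATE(click_ts) AS day, COUNT(*) AS clicks "
    "FROM analytics.clicks_clean GROUP BY day",
]

def run_series(execute_sql, series=TRANSFORM_SERIES):
    """Run each SQL statement in order, stopping at the first failure.

    `execute_sql` stands in for the BigQuery call; it should block until
    the query job completes and return a job id.
    """
    job_ids = []
    for sql in series:
        job_ids.append(execute_sql(sql))
    return job_ids
```

Serial scheduling falls out of the loop; parallel scheduling would instead submit independent branches of the series concurrently.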

Requirements

  1. User can specify legacy SQL or standard (non-legacy) SQL to execute
  2. User can set the priority for the query
  3. User can specify the execution mode for the query: interactive or batch
  4. User can specify a non-temporary (permanent) table to which the results are written
  5. User can specify whether the query cache is used
  6. User can specify different retry options in case of failure
  7. User can specify whether to continue the pipeline in case of failure
  8. User can use the plugin in a non-GCP environment
  9. The plugin passes the query and job id to the next node in the DAG
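The requirements above can be sketched as a configuration plus a retry loop. Everything here is hypothetical (the class, field names, and `submit_query` callback are illustrations, not the plugin's real API); it shows how retries (req. 6), continue-on-failure (req. 7), and returning the job id downstream (req. 9) could fit together:

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class BigQueryExecuteConfig:
    # Hypothetical settings mirroring the requirements list.
    sql: str
    use_legacy_sql: bool = False          # req 1
    priority: str = "BATCH"               # reqs 2-3: "INTERACTIVE" or "BATCH"
    destination_table: Optional[str] = None  # req 4: permanent results table
    use_query_cache: bool = True          # req 5
    max_retries: int = 3                  # req 6
    retry_backoff_seconds: float = 1.0    # req 6
    fail_pipeline_on_error: bool = True   # req 7

def execute_with_retries(config, submit_query, sleep=time.sleep):
    """Run the query with simple exponential backoff.

    `submit_query` blocks until the query job finishes and returns its job
    id (req 9). Returns None instead of raising when the user chose to
    continue the pipeline despite failure (req 7).
    """
    for attempt in range(config.max_retries + 1):
        try:
            return submit_query(config)
        except Exception:
            if attempt == config.max_retries:
                if config.fail_pipeline_on_error:
                    raise
                return None
            # Exponential backoff between attempts: 1s, 2s, 4s, ...
            sleep(config.retry_backoff_seconds * (2 ** attempt))
```

In the real plugin, `submit_query` would wrap the BigQuery job-submission call; the sketch keeps it injectable so the control flow is independent of any GCP client library.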

Limitations

  1. No query result records are retrieved or stored by the plugin.
