Introduction

Google provides BigQuery for querying massive datasets. It enables super-fast SQL queries against append-only tables using the processing power of Google's infrastructure. Users can move their data into BigQuery and let it handle the hard work.

 

CDAP now provides an interface that lets users work with their datasets in BigQuery.

...

Users want to integrate CDAP with datasets they have already stored in Google BigQuery.

 

User Stories

  1. As a user, I would like to run arbitrary queries synchronously against my datasets in BigQuery and pull those records into a Hydrator pipeline.

  2. As a user, I would like to store data from a Hydrator pipeline into a table (dataset) in BigQuery. If the table doesn't exist, it should be created.
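The sink behavior in the second story (create the table on the first write if it is missing, then append) can be sketched as follows. This is a minimal in-memory sketch only: `InMemoryCatalog`, `write_records`, and the schema representation are hypothetical stand-ins for BigQuery, not the plugin's actual API.

```python
from typing import Dict, List, Tuple

# Hypothetical schema representation: ordered (field name, BigQuery type) pairs.
Schema = List[Tuple[str, str]]


class InMemoryCatalog:
    """Hypothetical stand-in for a BigQuery project: (dataset, table) -> schema/rows."""

    def __init__(self) -> None:
        self.schemas: Dict[Tuple[str, str], Schema] = {}
        self.rows: Dict[Tuple[str, str], List[dict]] = {}

    def table_exists(self, dataset: str, table: str) -> bool:
        return (dataset, table) in self.schemas

    def create_table(self, dataset: str, table: str, schema: Schema) -> None:
        self.schemas[(dataset, table)] = schema
        self.rows[(dataset, table)] = []


def write_records(
    catalog: InMemoryCatalog,
    dataset: str,
    table: str,
    schema: Schema,
    records: List[dict],
) -> bool:
    """Append records to dataset.table, creating the table first if it is missing.

    Returns True when the table had to be created by this call.
    """
    created = False
    if not catalog.table_exists(dataset, table):
        catalog.create_table(dataset, table, schema)
        created = True
    catalog.rows[(dataset, table)].extend(records)
    return created
```

The first call for `baby_names.top_names` (a hypothetical table name) creates the table and returns `True`; later calls only append. A real sink would also have to decide what happens when an existing table's schema differs from the pipeline schema, which the stories above do not specify.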

 

Requirements

 


  1. The user should provide the correct project ID for a project they have access to.
  2. The user should specify a time limit for the query.

...

The following simple example shows how the BigQuery source would work.

 

A dataset already exists in Google BigQuery:

...

Input          Value
Project ID     vernal-seasdf-123456
Dataset name   baby_names

 

The output schema is as follows:

Field   Type      Required   Description
name    String    Yes        Names of babies born in 2014
count   Integer   Yes        Number of occurrences of the name
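Each row pulled from BigQuery has to match this output schema before it is emitted into the pipeline. A minimal sketch of that check, assuming rows arrive as plain dictionaries; `OUTPUT_SCHEMA` and `to_record` are hypothetical names, not part of the CDAP or BigQuery APIs.

```python
from typing import Any, Dict

# Output schema from the table above: field name -> (Python type, required).
OUTPUT_SCHEMA = {
    "name": (str, True),
    "count": (int, True),
}


def to_record(row: Dict[str, Any]) -> Dict[str, Any]:
    """Validate a raw row against OUTPUT_SCHEMA and keep only the schema fields."""
    record: Dict[str, Any] = {}
    for field, (expected_type, required) in OUTPUT_SCHEMA.items():
        value = row.get(field)
        if value is None:
            if required:
                raise ValueError(f"missing required field: {field}")
            record[field] = None
            continue
        # bool is a subclass of int in Python, so reject it explicitly.
        if not isinstance(value, expected_type) or isinstance(value, bool):
            raise TypeError(f"field {field!r} expected {expected_type.__name__}")
        record[field] = value
    return record
```

Columns outside the schema are dropped: `to_record({"name": "Oscar", "count": 334, "year": 2014})` keeps only `name` and `count` (the `year` column here is a hypothetical extra).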

...

The user runs a query against the dataset in BigQuery and pulls the records:

...

The configuration is specified as follows:

      ♦ Project ID

...

         ♦ vernal-seasdf-123456

     ♦ Query

        ♦  SELECT name, count FROM baby_names ORDER BY count DESC LIMIT 3

The output is as follows:

name      count
Jay       1123
Nicolas   764
Oscar     334

...

Input        Type            Required   Default
Project ID   String          Yes
Credential   String          Yes
Query        String          Yes
Limit Time   Integer (min)   No         10
Limit Size   Integer (GB)    No         50
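The input table can be modeled as a small config object that enforces the required fields and applies the defaults (10 minutes, 50 GB). This is a minimal sketch: `BigQuerySourceConfig`, its field names, and the `timeout_seconds` helper are hypothetical, and treating the credential as a path to a key file is an assumption the design does not state.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BigQuerySourceConfig:
    """Hypothetical config object mirroring the input table above."""

    project_id: str
    credential: str  # assumption: e.g. a path to a service-account key file
    query: str
    limit_time_min: int = 10  # Limit Time, in minutes (default 10)
    limit_size_gb: int = 50   # Limit Size, in GB (default 50)

    def __post_init__(self) -> None:
        # Required string inputs must be present and non-blank.
        for name in ("project_id", "credential", "query"):
            if not getattr(self, name).strip():
                raise ValueError(f"{name} is required")
        # Optional limits must be positive.
        if self.limit_time_min <= 0 or self.limit_size_gb <= 0:
            raise ValueError("limits must be positive")

    @property
    def timeout_seconds(self) -> int:
        """Limit Time converted from minutes to seconds."""
        return self.limit_time_min * 60
```

Built from the example values (project `vernal-seasdf-123456` and the `SELECT name, count ...` query) plus a placeholder credential, the config picks up the defaults, so `timeout_seconds` is 600.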

 

Poll Results:

Using jobId:

Input        Type     Required
Project ID   String   Yes
Job ID       String   Yes
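Polling by job ID amounts to checking the job's state repeatedly until it finishes or the time limit runs out. A minimal sketch, assuming a caller-supplied `get_state(project_id, job_id)` function that returns a state string; `poll_job` is hypothetical, the state names follow BigQuery's `PENDING`/`RUNNING`/`DONE` job states, and the injectable clock and sleep exist only so the loop can be tested without waiting.

```python
import time
from typing import Callable


def poll_job(
    project_id: str,
    job_id: str,
    get_state: Callable[[str, str], str],
    timeout_s: float = 600.0,  # matches the 10-minute Limit Time default
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll until the job reports DONE; raise TimeoutError once timeout_s elapses."""
    deadline = clock() + timeout_s
    while True:
        state = get_state(project_id, job_id)
        if state == "DONE":
            return state
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not done after {timeout_s}s")
        sleep(interval_s)
```

A fixed interval keeps the sketch simple; a production poller would more likely back off between checks to avoid hammering the API on long-running queries.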

 

Polling Latest Results:

Input         Type      Required
Project ID    String    Yes
Poll Number   Integer   Yes