...

A dataset already exists in Google BigQuery:

project Id: vernal-seasdf-123456

dataset name: baby_names

table name: names_2014

name	count
Emma	100
Oscar	334
Peter	223
Jay	1123
Nicolas	764

The user pulls the schema of the dataset:

Inputs	Value
project Id	vernal-seasdf-123456
dataset name	baby_names

Output schema is as follows:

Schema	Type	Nullable	Description
name	String	No	names of babies born in 2014
count	Integer	No	the number of occurrences of the name
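The schema above uses BigQuery type names, while the plugin configs further down describe the same columns with an "outputSchema" string such as name:string,count:int. A minimal sketch of that translation, assuming a hypothetical type mapping (STRING to string, INTEGER to int) inferred from the examples on this page rather than taken from the plugin's code:

```python
# Hypothetical mapping from the BigQuery types shown in the schema table to the
# type names used in the plugin's "outputSchema" property. Assumed, not official.
TYPE_MAP = {"STRING": "string", "INTEGER": "int"}


def to_output_schema(fields):
    """Build a 'name:type,...' string from (column, bigquery_type) pairs."""
    parts = []
    for column, bq_type in fields:
        try:
            parts.append(f"{column}:{TYPE_MAP[bq_type.upper()]}")
        except KeyError:
            raise ValueError(f"unsupported BigQuery type: {bq_type}") from None
    return ",".join(parts)


schema = [("name", "String"), ("count", "Integer")]
print(to_output_schema(schema))  # name:string,count:int
```

Unknown types raise an error instead of being guessed, so a schema the mapping does not cover fails early.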

The user runs a query against the dataset in BigQuery and pulls the records:

Configuration is specified as follows:

Inputs	Value
project Id	vernal-seasdf-123456
query	SELECT name, count FROM baby_names.names_2014 ORDER BY count DESC LIMIT 3

The output is as follows:

...
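To see what the query should return for the five sample rows of names_2014, its ORDER BY count DESC LIMIT 3 clause can be emulated in memory. This is only a local sketch of the query's semantics, not how BigQuery executes it; the function name top_names is hypothetical.

```python
# Sample rows from the names_2014 table shown at the top of this page.
rows = [
    ("Emma", 100),
    ("Oscar", 334),
    ("Peter", 223),
    ("Jay", 1123),
    ("Nicolas", 764),
]


def top_names(rows, limit):
    """Emulate ORDER BY count DESC LIMIT <limit> on (name, count) rows."""
    return sorted(rows, key=lambda row: row[1], reverse=True)[:limit]


for name, count in top_names(rows, 3):
    print(name, count)
# prints:
# Jay 1123
# Nicolas 764
# Oscar 334
```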

example1:


{
  "name": "BigQuery",
  "properties": {
    "referenceName": "bigquery",
    "projectId": "vernal-project1",
    "tempBuketPath": "gs://bucketName.datasetName/tableName",
    "jsonFilePath": "/path/to/jsonkeyfile",
    "InputTableId": "vernal-project1:babynames.names_2014",
    "outputSchema": "name:string,count:int"
  }
}

This source reads the vernal-project1:babynames.names_2014 table, exports the whole table to gs://bucketName.datasetName/tableName, and then reads the data from that location.
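InputTableId packs three identifiers into one string of the form project:dataset.table. A minimal sketch of splitting it, assuming that format as used in the examples on this page; parse_table_id is a hypothetical helper, and it deliberately ignores domain-scoped project IDs (which themselves contain a colon).

```python
import re

# Assumed format, taken from the examples on this page: "project:dataset.table".
TABLE_ID = re.compile(r"^(?P<project>[^:]+):(?P<dataset>[^.]+)\.(?P<table>.+)$")


def parse_table_id(table_id):
    """Split a 'project:dataset.table' reference into its three parts."""
    match = TABLE_ID.match(table_id)
    if match is None:
        raise ValueError(f"expected project:dataset.table, got {table_id!r}")
    return match.group("project"), match.group("dataset"), match.group("table")


print(parse_table_id("vernal-project1:babynames.names_2014"))
# ('vernal-project1', 'babynames', 'names_2014')
```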

example2:


{
  "name": "BigQuery",
  "properties": {
    "referenceName": "bigquery",
    "projectId": "vernal-project1",
    "tempBuketPath": "gs://bucketName.datasetName/tableName",
    "jsonFilePath": "/path/to/jsonkeyfile",
    "importQuery": "SELECT name as babyName, count as babyCount FROM [vernal-project1:babynames.names_2014] ORDER BY count DESC LIMIT 3",
    "InputTableId": "vernal-project1:babynames.blankTable",
    "outputSchema": "babyName:string,babyCount:int"
  }
}

Before running this source, the user should create a blank table with the schema {babyName:string, babyCount:int}. For this example, create the blank table in vernal-project1:babynames.

The output of the source is as follows:

babyName	babyCount
Jay	1123
Nicolas	764
Oscar	334

Implementation Tips

What authorization roles are required by this plugin?
- Application Default Credentials are required. See the Google Cloud documentation for how to obtain them.
I see a few additional config options on the query API. Are those configurable by the user?
- For now, the user needs to configure the project ID, the credential path to the local private key, the query string, and the time limit.
Create a simple batch source inside the Hydrator plugins with all needed dependencies.
Add an endpoint to run queries against datasets in BigQuery.
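The user supplies a path to a local private key, while Application Default Credentials can also locate a key through the GOOGLE_APPLICATION_CREDENTIALS environment variable. A minimal sketch of combining the two, assuming (hypothetically) that an explicitly configured path should win and the environment variable is only a fallback; resolve_key_path is not part of any library.

```python
import os


def resolve_key_path(configured_path=None):
    """Pick the service-account key file to use.

    Hypothetical policy: prefer the path configured on the plugin, otherwise
    fall back to GOOGLE_APPLICATION_CREDENTIALS, the environment variable
    that Google's Application Default Credentials check first.
    """
    path = configured_path or os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
    if not path:
        raise ValueError("no credential path configured and GOOGLE_APPLICATION_CREDENTIALS is unset")
    return path


print(resolve_key_path("/path/to/jsonkeyfile"))  # /path/to/jsonkeyfile
```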

Design

Input	Type	Required	Default
ProjectId	String	Yes
Credentials	String	Yes
Query	String	Yes
Limit Time	Integer (min)	No	10
Limit Size	Integer (GB)	No	50
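The Design table can be read as a small config object with three required inputs and two optional limits. A minimal sketch, assuming hypothetical Python field names for each input; the defaults (10 minutes, 50 GB) come from the table, and the positivity check on the limits is an added assumption.

```python
from dataclasses import dataclass


@dataclass
class BigQuerySourceConfig:
    """Hypothetical container mirroring the Design table; field names are illustrative."""
    project_id: str
    credentials: str          # path to the local JSON key file
    query: str
    limit_time_min: int = 10  # Limit Time, in minutes
    limit_size_gb: int = 50   # Limit Size, in GB

    def __post_init__(self):
        # Required inputs must be non-empty.
        for field_name in ("project_id", "credentials", "query"):
            if not getattr(self, field_name):
                raise ValueError(f"{field_name} is required")
        # Assumed rule: limits must be positive.
        if self.limit_time_min <= 0 or self.limit_size_gb <= 0:
            raise ValueError("limits must be positive")


config = BigQuerySourceConfig(
    project_id="vernal-seasdf-123456",
    credentials="/path/to/jsonkeyfile",
    query="SELECT name, count FROM baby_names.names_2014 ORDER BY count DESC LIMIT 3",
)
print(config.limit_time_min, config.limit_size_gb)  # 10 50
```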

...
