Versions Compared

Key

  • This line was added.
  • This line was removed.
  • Formatting was changed.

...

A dataset already exist in Google BigQuery:121

project Id: vernal-seasdf-123456project1

dataset name: baby_names

table name: names_2014

namecount
Emma100
Oscar334
Peter223
Jay1123
Nicolas764

 

User pull the schema of the dataset:

InputsValue
project Id vernal-seasdf-123456
dataset namebaby_names

 

Output schema is as follows:

SchemaTypeNullableDescription
nameStringNonames of baby born in 2014
countIntegerNothe number of occurrences of the name

 

User run query agains dataset in BigQuery and pull the records:

Configuration is specified as follows

      ♦ project Id

         ♦ vernal-seasdf-123456

     ♦ query

        ♦  SELECT name, count FROM baby_names ORDER BY count DESC LIMIT 3

 

Out put is as follows

...

example1:

Wiki Markup
{
        "name": "BigQuery",
          "properties": {
            "referenceName": "bigquery",
            "projectId": "vernal-project1",
            "tempBuketPath": "gs://bucketName.datasetName/tableName",
            "jsonFilePath": "/path/to/jsonkeyfile",
            "InputTableId": "vernal-project1:babynames.names_2014",
            "outputSchema": "name:string,count:int"
          }
        }
      }

This source will read the vernal-project1:babynames.names_2014 table, download the whole table to gs://bucketName.datasetName/tableName, and then get the data from there.

 

example2:

Wiki Markup
{
        "name": "BigQuery",
          "properties": {
            "referenceName": "bigquery",
            "projectId": "vernal-project1",
            "tempBuketPath": "gs://bucketName.datasetName/tableName",
            "jsonFilePath": "/path/to/jsonkeyfile",
            "importQuery":"SELECT name as babyName, count as nameCount FROM [vernal-project1:babynames.names_2014] ORDER BY count DESC LIMIT 3",
            "InputTableId": "vernal-project1:babynames.blankTable",
            "outputSchema": "babyName:string,babyCount:int"
          }
        }
      }

Before running this source, user should create a blank table with schema : {babyname:string, babyCount:int}. File in r example, make this blank table in vernal-project1:babynames.

and the output of the source is as follows:

babyNamebabyCount
Jay1123
Nicolas764
Oscar334

Implementation Tips

  • What authorization roles are required by this plugin? 
    • An application default credential is required. Here is where to get such a credential.
  • I see a few additional config options on the query API. Are those configurable by the user?
    • Now what the user need to configure are project Id, credential path to the local private key, query string, time limit.
  • Create a simple batch source inside hydrator plugin with all dependencies needed.
  • Add an endpoint to run query against datasets in BigQuery.

 

Design

Inputstyperequireddefault
ProjectIdStringYes

 

CredentialsStringYes 
QueryStringYes 
Limit TimeInteger (min)No10
Limit SizeInteger (GB)No50

...