...
A dataset already exists in Google BigQuery:
project Id: vernal-seasdf-123456
dataset name: baby_names
table name: names_2014
name | count |
---|---|
Emma | 100 |
Oscar | 334 |
Peter | 223 |
Jay | 1123 |
Nicolas | 764 |
The user pulls the schema of the dataset:
Inputs | Value |
---|---|
project Id | vernal-seasdf-123456 |
dataset name | baby_names |
Output schema is as follows:
Schema | Type | Nullable | Description |
---|---|---|---|
name | String | No | names of babies born in 2014 |
count | Integer | No | the number of occurrences of the name |
The user runs a query against the dataset in BigQuery and pulls the records:
Configuration is specified as follows:
- project Id: vernal-seasdf-123456
- query: SELECT name, count FROM baby_names ORDER BY count DESC LIMIT 3
The output is as follows:
...
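To make the expected result concrete, the query's `ORDER BY count DESC LIMIT 3` semantics can be reproduced locally on the sample rows above. This is only an illustrative sketch; the actual source runs the query inside BigQuery.

```python
# Sample rows from the names_2014 table shown above.
rows = [
    ("Emma", 100),
    ("Oscar", 334),
    ("Peter", 223),
    ("Jay", 1123),
    ("Nicolas", 764),
]

# Equivalent of: SELECT name, count FROM baby_names ORDER BY count DESC LIMIT 3
top3 = sorted(rows, key=lambda r: r[1], reverse=True)[:3]
print(top3)  # [('Jay', 1123), ('Nicolas', 764), ('Oscar', 334)]
```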
example1:
Wiki Markup |
---|
{
  "name": "BigQuery",
  "properties": {
    "referenceName": "bigquery",
    "projectId": "vernal-project1",
    "tempBucketPath": "gs://bucketName.datasetName/tableName",
    "jsonFilePath": "/path/to/jsonkeyfile",
    "inputTableId": "vernal-project1:babynames.names_2014",
    "outputSchema": "name:string,count:int"
  }
} |
This source reads the vernal-project1:babynames.names_2014 table, exports the whole table to gs://bucketName.datasetName/tableName, and then reads the data from there.
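The `outputSchema` value above is a comma-separated list of `name:type` pairs. A minimal sketch of how such a string could be parsed (`parse_output_schema` is a hypothetical helper for illustration, not part of the plugin API):

```python
def parse_output_schema(schema_str):
    """Parse a 'name:type,name:type' string into (name, type) pairs."""
    fields = []
    for part in schema_str.split(","):
        name, _, ftype = part.strip().partition(":")
        fields.append((name, ftype))
    return fields

print(parse_output_schema("name:string,count:int"))
# [('name', 'string'), ('count', 'int')]
```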
example2:
Wiki Markup |
---|
{
  "name": "BigQuery",
  "properties": {
    "referenceName": "bigquery",
    "projectId": "vernal-project1",
    "tempBucketPath": "gs://bucketName.datasetName/tableName",
    "jsonFilePath": "/path/to/jsonkeyfile",
    "importQuery": "SELECT name as babyName, count as babyCount FROM [vernal-project1:babynames.names_2014] ORDER BY count DESC LIMIT 3",
    "inputTableId": "vernal-project1:babynames.blankTable",
    "outputSchema": "babyName:string,babyCount:int"
  }
} |
Before running this source, the user should create a blank table with the schema {babyName:string, babyCount:int}. For this example, create this blank table in vernal-project1:babynames.
The output of the source is as follows:
babyName | babyCount |
---|---|
Jay | 1123 |
Nicolas | 764 |
Oscar | 334 |
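The `inputTableId` follows BigQuery's legacy `project:dataset.table` form. A sketch of splitting such an id into its components (`split_table_id` is a hypothetical helper, for illustration only):

```python
def split_table_id(table_id):
    """Split a legacy 'project:dataset.table' id into its three components."""
    project, _, rest = table_id.partition(":")
    dataset, _, table = rest.partition(".")
    return project, dataset, table

print(split_table_id("vernal-project1:babynames.names_2014"))
# ('vernal-project1', 'babynames', 'names_2014')
```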
Implementation Tips
- What authorization roles are required by this plugin?
- An application default credential is required. Here is where to get such a credential.
- I see a few additional config options on the query API. Are those configurable by the user?
- For now, the user needs to configure the project Id, the path to the local private-key credential, the query string, and the time limit.
- Create a simple batch source inside the Hydrator plugin with all needed dependencies.
- Add an endpoint to run query against datasets in BigQuery.
Design
Inputs | Type | Required | Default |
---|---|---|---|
Project Id | String | Yes | |
Credentials | String | Yes | |
Query | String | Yes | |
Limit Time | Integer (min) | No | 10 |
Limit Size | Integer (GB) | No | 50 |
...