CDAP provides an interface for users to work with their datasets stored in Google BigQuery.

Use-case

Users want to integrate CDAP with datasets they have already stored in Google BigQuery.

User Stories

As a user, I would like to run arbitrary queries synchronously against my datasets in BigQuery and pull the resulting records into a Hydrator pipeline.
As a user, I would like to store data from a Hydrator pipeline into a table in a BigQuery dataset. If the table does not exist, it should be created.
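The second user story boils down to a create-if-absent check before each write. The following is a minimal sketch of that logic, assuming a hypothetical in-memory `InMemoryDataset` in place of a real BigQuery dataset and a hypothetical `write_records` helper; none of these names come from CDAP or the BigQuery client libraries.

```python
from typing import Dict, List, Tuple

# (column name, column type) pairs; the type names are placeholders.
Schema = List[Tuple[str, str]]


class InMemoryDataset:
    """Hypothetical stand-in for a BigQuery dataset, used only to show the sink logic."""

    def __init__(self) -> None:
        self.tables: Dict[str, dict] = {}

    def table_exists(self, table: str) -> bool:
        return table in self.tables

    def create_table(self, table: str, schema: Schema) -> None:
        self.tables[table] = {"schema": list(schema), "rows": []}

    def insert_rows(self, table: str, rows: List[dict]) -> None:
        self.tables[table]["rows"].extend(rows)


def write_records(dataset: InMemoryDataset, table: str, schema: Schema, records: List[dict]) -> None:
    """Append records to `table`, creating it with `schema` first if it does not exist."""
    if not dataset.table_exists(table):
        dataset.create_table(table, schema)
    dataset.insert_rows(table, records)
```

Calling `write_records` twice for the same table creates it on the first call and only appends on the second. A real sink would also have to decide what happens when an existing table's schema differs from the pipeline's schema, which this sketch does not handle.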

Requirements

1. The user can query their datasets stored in Google BigQuery.

2. The user can specify a time limit (timeout) for the query.

3. The user can specify a size limit for the data to query.

4. The user can poll for the query result.

5. The user can list the history of query results over a given period of time.

6. The schema is automatically pulled from the table.

7. The user can retrieve the field names from the query result.
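Requirements 2, 3 and 5 imply a few concrete parameters: a timeout, an optional size limit, and a time window for listing past queries. A minimal sketch of how these could be modelled, assuming a hypothetical `QueryRequest` object (with the size limit expressed as a row count) and a hypothetical in-memory `QueryHistory`; neither is part of CDAP or the BigQuery API.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class QueryRequest:
    """Hypothetical query parameters covering requirements 1-3."""
    project_id: str
    sql: str
    timeout_seconds: int = 60        # requirement 2: time limit for the query
    max_rows: Optional[int] = None   # requirement 3: size limit (None means no limit)

    def __post_init__(self) -> None:
        if self.timeout_seconds <= 0:
            raise ValueError("timeout_seconds must be positive")
        if self.max_rows is not None and self.max_rows <= 0:
            raise ValueError("max_rows must be positive when set")


@dataclass
class QueryHistory:
    """Hypothetical in-memory log of finished queries (requirement 5)."""
    entries: List[Tuple[str, datetime]] = field(default_factory=list)

    def record(self, job_id: str, finished_at: datetime) -> None:
        self.entries.append((job_id, finished_at))

    def list_since(self, now: datetime, window: timedelta) -> List[str]:
        """Return ids of queries that finished within `window` before `now`."""
        cutoff = now - window
        return [job_id for job_id, ts in self.entries if cutoff <= ts <= now]
```

Validating the limits when the request is built means an invalid configuration fails before any query is sent.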

Example

The following simple example shows how the BigQuery Source would work.

A dataset already exists in Google BigQuery:

project Id: vernal-seasdf-123456

dataset name: baby_names

name	count
Emma	100
Oscar	334
Peter	223
Jay	1123
Nicolas	764

The user pulls the schema of the dataset:

Inputs	Value
project Id	vernal-seasdf-123456
dataset name	baby_names

output schema:

name	String
count	Integer
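Pulling the schema amounts to reading each column's name and BigQuery type from the table and translating the type into the output schema's type. A minimal sketch, assuming a hypothetical `derive_output_schema` function and a hypothetical type mapping that only covers a few BigQuery types and uses the type names shown in the example above:

```python
from typing import Dict, List, Tuple

# Hypothetical mapping from BigQuery column types to output schema type names.
# Both the legacy SQL and standard SQL spellings are accepted.
BIGQUERY_TO_OUTPUT_TYPE: Dict[str, str] = {
    "STRING": "String",
    "INTEGER": "Integer",
    "INT64": "Integer",
    "FLOAT": "Double",
    "FLOAT64": "Double",
    "BOOLEAN": "Boolean",
    "BOOL": "Boolean",
}


def derive_output_schema(table_fields: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Translate (column name, BigQuery type) pairs into (field name, output type) pairs."""
    schema = []
    for name, bq_type in table_fields:
        output_type = BIGQUERY_TO_OUTPUT_TYPE.get(bq_type.upper())
        if output_type is None:
            raise ValueError(f"unsupported BigQuery type {bq_type!r} for field {name!r}")
        schema.append((name, output_type))
    return schema
```

For the `baby_names` table, `derive_output_schema([("name", "STRING"), ("count", "INTEGER")])` yields the output schema listed above. BigQuery integers are 64-bit, so an actual implementation may prefer a long type for `count`.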

Design

CDAP provides two types of operations on datasets stored in BigQuery: Query and Poll Results.
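One way to picture the two operations is as an asynchronous pair: Query starts a job and returns an identifier, and Poll Results is called repeatedly until rows are available or the time limit expires, which also yields the synchronous behaviour described in the first user story. A minimal sketch, assuming a hypothetical in-memory `FakeBigQueryJobs` service with `submit_query` and `poll_result` methods and a hypothetical `wait_for_result` helper; these are not the BigQuery client library's API.

```python
import time
from typing import Callable, Dict, List, Optional


class FakeBigQueryJobs:
    """Hypothetical stand-in for a BigQuery job service.

    A submitted query reports 'done' after a fixed number of polls, which is
    enough to exercise the Query / Poll Results flow without a network.
    """

    def __init__(self, polls_until_done: int = 2) -> None:
        self._polls_until_done = polls_until_done
        self._jobs: Dict[str, dict] = {}

    def submit_query(self, sql: str) -> str:
        """Query: start a job and return its id immediately."""
        job_id = f"job-{len(self._jobs) + 1}"
        self._jobs[job_id] = {"sql": sql, "polls": 0}
        return job_id

    def poll_result(self, job_id: str) -> Optional[List[dict]]:
        """Poll Results: return the rows once the job is done, otherwise None."""
        job = self._jobs[job_id]
        job["polls"] += 1
        if job["polls"] < self._polls_until_done:
            return None
        return [{"name": "Emma", "count": 100}]


def wait_for_result(jobs: FakeBigQueryJobs, job_id: str, timeout_seconds: float,
                    interval_seconds: float = 1.0,
                    clock: Callable[[], float] = time.monotonic,
                    sleep: Callable[[float], None] = time.sleep) -> List[dict]:
    """Poll until rows arrive; raise TimeoutError once the time limit has passed."""
    deadline = clock() + timeout_seconds
    while True:
        rows = jobs.poll_result(job_id)
        if rows is not None:
            return rows
        if clock() >= deadline:
            raise TimeoutError(f"{job_id} did not finish within {timeout_seconds}s")
        sleep(interval_seconds)
```

Injecting `clock` and `sleep` keeps the polling loop testable without real waiting.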

...

