Versions Compared
Key
- This line was added.
- This line was removed.
- Formatting was changed.
Introduction
Salesforce allows various types of schema changes, currently BigQuery Batch Sink plugin has limited support for such changes. This page documents an approach to handle schema changes in BigQuery Batch Sink plugin, primary in conjunction with Salesforce Batch Source plugin.
Use case
Requirement | Use Case | Approach |
---|---|---|
Add new column | BigQuery schema: a, b, c New schema: a, b, c, d Write data for column d. | ALLOW_FIELD_ADDITION value can be set to SchemaUpdateOptions in JobConfigurationLoad, will always add nullable field. |
Remove existing column | BigQuery schema: a int, b int not null New schema: a int | BigQuery does not allow deleting existing columns, only full table reload can be used in this case. |
Change column mode to less restrictive | BigQuery schema: a int not null New schema: a int | ALLOW_FIELD_RELAXATION value can be set to SchemaUpdateOptions in JobConfigurationLoad. |
Change column mode to more restrictive | BigQuery schema: a int New schema: a int not null | BigQuery does not allow column mode change to more restrictive, only full table reload can be used in this case. |
Change column type | BigQuery schema: a int New schema: a double | BigQuery does not allow changing column type, only full table reload can be used in this case. |
Full table reload | Remove existing data, load new data, possibly change schema | WRITE_TRUNCATE value can be set to WriteDisposition in JobConfigurationLoad. This option will always overwrite the schema along with the existing data. |
Use custom default for missing columns | BigQuery schema: a int, b int New schema: a int Add b column with custom default. | BigQuery Schema does not support default notion for columns, null value is written if value for column is missing. This can be handled internally by BigQuery Batch Sink plugin during data transformation stage. |
Add custom default to new column for already existing data | BigQuery has data in columns a and b. New column c is added. Need to insert data into column c for already existing data. | BigQuery does not support default notion for columns. If new column is added, its value in existing rows will be null. In order to change it, only full table reload can be used. |
Managing BigQuery table schema - https://cloud.google.com/bigquery/docs/managing-table-schemas
JobConfigurationLoad - https://developers.google.com/resources/api-libraries/documentation/bigquery/v2/java/latest/com/google/api/services/bigquery/model/JobConfigurationLoad.html
Plugin Type
This is not a new plugin, but enhancements to existing plugin.
Design
Approach(s)
Properties
Add new section in BigQuery Batch Sink plugin properties named Schema Management.
User Facing Name | Type | Default | Description |
---|
Specifies the action that occurs if the destination table already exists.
Append: If the table already exists, BigQuery appends the data to the table.
Truncate: If the table already exists, BigQuery overwrites the table schema and data.
Allow Field Addition | boolean | false | Specifies the action that occurs if input field is missing in the destination table. |
False: Job fails if input data contains extra field. |
True: New nullable field is added to the schema if it is absent in destination table. | |||
Allow Field Relaxation | boolean | false | Allows relaxing a required field in the original schema to nullable. |
Default for Missing Fields | List of field names and its default value pairs | Empty | Allows to add default value for missing columns. User must specify field name and its default in String value pairs (default value will be implicitly converted to the field type specified in the schema). Example: "a" = "-1", "b" = "N/A". If default was not specified, null value will be written. |
Security
Limitation(s)
Future Work
- Some future work – HYDRATOR-99999
- Another future work – HYDRATOR-99999
Test Case(s)
- Test case #1
- Test case #2
Sample Pipeline
Please attach one or more sample pipeline(s) and associated data.
Pipeline #1
Pipeline #2
Table of Contents
Table of Contents style circle
Checklist
- User stories documented
- User stories reviewed
- Design documented
- Design reviewed
- Feature merged
- Examples and guides
- Integration tests
- Documentation for feature
- Short video demonstrating the feature