Teradata database plugin
Introduction
A separate database plugin that supports Teradata-specific features and configuration.
Use Cases
- Users should not have to specify any redundant configuration (e.g., the JDBC type in the source plugin or the columns in the sink plugin)
- Users should get field-level lineage for the source and sink being used
- Reference documentation should be updated to account for the changes
- The data pipeline using source and sink plugins should run on both MapReduce and Spark engines
- Users can choose and install Teradata source and sink plugins
- Users should see the Teradata logo on the plugin configuration page for a better experience
- Users should get relevant information from the tooltips
- The tooltip for the connection string should be customized specifically for the Teradata database
- The tooltip should describe accurately what each field is used for
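To avoid asking for a column list in the sink (the first use case above), the sink can derive the columns from the input record schema. The following is a minimal sketch, assuming the input schema's field names match the target table's column names. The class and method names are hypothetical, and plain strings stand in for CDAP's schema field objects so the example stays self-contained.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper: not part of any existing plugin API.
public final class TeradataInsertSketch {

  private TeradataInsertSketch() {
  }

  /**
   * Builds a parameterized INSERT statement whose column list comes from the
   * input schema's field names, so the user never has to list columns.
   */
  public static String buildInsert(String database, String table, List<String> fieldNames) {
    if (fieldNames.isEmpty()) {
      throw new IllegalArgumentException("Input schema has no fields");
    }
    // Quote each identifier so names that collide with reserved words still work.
    StringBuilder columns = new StringBuilder();
    for (String name : fieldNames) {
      if (columns.length() > 0) {
        columns.append(", ");
      }
      columns.append(quote(name));
    }
    // One JDBC "?" placeholder per column; values are bound later from each record.
    String placeholders = String.join(", ", Collections.nCopies(fieldNames.size(), "?"));
    return "INSERT INTO " + quote(database) + "." + quote(table)
        + " (" + columns + ") VALUES (" + placeholders + ")";
  }

  private static String quote(String identifier) {
    // Double any embedded quote characters, then wrap the identifier in double quotes.
    return "\"" + identifier.replace("\"", "\"\"") + "\"";
  }

  public static void main(String[] args) {
    System.out.println(buildInsert("sales", "orders", Arrays.asList("id", "amount")));
  }
}
```

For the `sales.orders` example, this prints `INSERT INTO "sales"."orders" ("id", "amount") VALUES (?, ?)`. Using placeholders rather than inlined values lets the sink reuse one prepared statement for every record.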
User Stories
Users should see tooltips that accurately describe what each field does
Users should know the format of the Teradata connection string by hovering over the connection string tooltip
Users should get field-level lineage information for the Teradata source and sink
Users should be able to set up a pipeline without specifying redundant information
Users should get updated reference documentation for the Teradata source and sink
Users should be able to read all Teradata data types
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design / Implementation Tips
Teradata JDBC Driver: https://downloads.teradata.com/download/connectivity/jdbc-driver
Teradata Express for VMware Player: https://downloads.teradata.com/download/database/teradata-express-for-vmware-player
Teradata data types: https://docs.teradata.com/reader/iRq_F~XxKYWu7Kv~HRd~ew/D_RBrANpKte9E5uvWjq8~Q
Teradata connection properties: https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ
Design
Currently, Teradata 16.20 is supported. We recommend Teradata JDBC driver 16.20.00.12, since it is the latest stable release recommended by Teradata.
Sink Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for the UI | Required |
Basic | Reference Name | String | Unique name that identifies this sink for lineage | Required |
Basic | Driver name | String | Name of the Teradata JDBC driver to use | Required (default: teradata) |
Basic | Host | String | Teradata host | Required (defaults to localhost in the UI) |
Basic | Port | Number | Port on which Teradata is running | Optional (default: 1025) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Table Name | String | Name of the database table to write to | Required |
Credentials | Username | String | Database username | Required |
Credentials | Password | Password | Database user password | Required |
Advanced | Connection arguments | Key-value | List of arbitrary string key/value pairs passed as connection arguments. For the supported properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |
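Because Host, Port, and Database are separate fields, the plugin can assemble the connection string itself instead of asking the user for one. The following is a minimal sketch, assuming the standard Teradata JDBC URL form `jdbc:teradata://<host>/<NAME>=<value>,...` with the `DATABASE` and `DBS_PORT` connection parameters, and assuming connection arguments are appended as extra `NAME=value` parameters. The class name is hypothetical, and values containing commas or other special characters are not escaped here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: not part of any existing plugin API.
public final class TeradataUrlSketch {

  // Default Teradata database port, matching the Port property's default.
  static final int DEFAULT_PORT = 1025;

  private TeradataUrlSketch() {
  }

  /**
   * Builds a URL of the form
   * jdbc:teradata://host/DATABASE=db,DBS_PORT=port[,NAME=value...].
   *
   * @param port may be null, in which case the default port 1025 is used
   */
  public static String buildUrl(String host, Integer port, String database,
                                Map<String, String> connectionArguments) {
    StringBuilder url = new StringBuilder("jdbc:teradata://").append(host).append('/');
    url.append("DATABASE=").append(database);
    url.append(",DBS_PORT=").append(port == null ? DEFAULT_PORT : port);
    // User-supplied connection arguments become additional comma-separated URL parameters.
    for (Map.Entry<String, String> arg : connectionArguments.entrySet()) {
      url.append(',').append(arg.getKey()).append('=').append(arg.getValue());
    }
    return url.toString();
  }

  public static void main(String[] args) {
    Map<String, String> extra = new LinkedHashMap<>();
    extra.put("CHARSET", "UTF8");
    System.out.println(buildUrl("localhost", null, "sales", extra));
  }
}
```

Running `main` prints `jdbc:teradata://localhost/DATABASE=sales,DBS_PORT=1025,CHARSET=UTF8`. This is also the format the connection string tooltip could show.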
Source Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for the UI | Required |
Basic | Reference Name | String | Unique name that identifies this source for lineage | Required |
Basic | Driver name | String | Name of the Teradata JDBC driver to use | Required (default: teradata) |
Basic | Host | String | Teradata host | Required (defaults to localhost in the UI) |
Basic | Port | Number | Port on which Teradata is running | Optional (default: 1025) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Import Query | String | SQL query used to import data | Valid SQL query |
Credentials | Username | String | Database username | Required |
Credentials | Password | Password | Database user password | Required |
Advanced | Connection arguments | Key-value | List of arbitrary string key/value pairs passed as connection arguments. For the supported properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |
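The Connection arguments property arrives from the key-value widget as a single string. The following is a minimal sketch of parsing it into name/value pairs. It assumes the widget serializes pairs as `key=value` separated by `;`. That serialization format is an assumption not stated in this design, and the class name is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper; the "key=value;key=value" input format is an assumption.
public final class ConnectionArgumentsSketch {

  private ConnectionArgumentsSketch() {
  }

  /**
   * Parses "key1=value1;key2=value2" into an insertion-ordered map.
   * Blank entries are skipped; an entry without a key before '=' is rejected.
   */
  public static Map<String, String> parse(String raw) {
    Map<String, String> result = new LinkedHashMap<>();
    if (raw == null || raw.trim().isEmpty()) {
      return result;
    }
    for (String entry : raw.split(";")) {
      if (entry.trim().isEmpty()) {
        continue;
      }
      // Split on the first '=' only, so values may themselves contain '='.
      int eq = entry.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Invalid connection argument: " + entry);
      }
      result.put(entry.substring(0, eq).trim(), entry.substring(eq + 1).trim());
    }
    return result;
  }

  public static void main(String[] args) {
    System.out.println(parse("CHARSET=UTF8;TMODE=ANSI"));
  }
}
```

Running `main` prints `{CHARSET=UTF8, TMODE=ANSI}`. Rejecting malformed entries early lets the plugin surface a configuration error at deploy time rather than a connection failure at run time.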
Action Properties
Section | User Facing Name | Type | Description | Constraints |
---|---|---|---|---|
Basic | Label | String | Label for the UI | Required |
Basic | Driver name | String | Name of the Teradata JDBC driver to use | Required (default: teradata) |
Basic | Host | String | Teradata host | Required (defaults to localhost in the UI) |
Basic | Port | Number | Port on which Teradata is running | Optional (default: 1025) |
Basic | Database | String | Name of the database to connect to | Required |
Basic | Database Command | String | Database command to run | Valid SQL query |
Credentials | Username | String | Database username | Required |
Credentials | Password | Password | Database user password | Required |
Advanced | Connection arguments | Key-value | List of arbitrary string key/value pairs passed as connection arguments. For the supported properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |
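The action needs only a connection and the Database Command. The following is a minimal sketch using plain JDBC. It assumes that the Teradata JDBC driver is registered with `DriverManager` and that the URL has already been built from Host, Port, and Database. The class and method names are hypothetical.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

// Hypothetical helper: not part of any existing plugin API.
public final class TeradataActionSketch {

  private TeradataActionSketch() {
  }

  /**
   * Runs a single database command (for example DDL or DML) and returns its
   * update count, or -1 if it produced a result set or reported no count.
   */
  public static int runCommand(String url, String user, String password, String command)
      throws SQLException {
    // try-with-resources closes the statement and connection even if execute() fails.
    try (Connection connection = DriverManager.getConnection(url, user, password);
         Statement statement = connection.createStatement()) {
      boolean returnedResultSet = statement.execute(command);
      return returnedResultSet ? -1 : statement.getUpdateCount();
    }
  }
}
```

A call might look like `runCommand("jdbc:teradata://localhost/DATABASE=sales,DBS_PORT=1025", user, password, "DELETE FROM orders WHERE amount = 0")`. Without a reachable server or a registered driver, it throws `SQLException`, which the action can propagate to fail the pipeline run.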
Data Types Mapping
Teradata Data Type | CDAP Schema Data Type | Support | Comment |
---|---|---|---|
BYTEINT | Schema.Type.INT | + | |
SMALLINT | Schema.Type.INT | + | |
INTEGER | Schema.Type.INT | + | |
BIGINT | Schema.Type.LONG | + | |
DECIMAL/NUMERIC | Schema.LogicalType.DECIMAL | + | |
FLOAT/REAL/DOUBLE PRECISION | Schema.Type.DOUBLE | + | |
NUMBER | Schema.LogicalType.DECIMAL | + | |
BYTE | Schema.Type.BYTES | + | |
VARBYTE | Schema.Type.BYTES | + | |
BLOB | Schema.Type.BYTES | + | |
CHAR | Schema.Type.STRING | + | |
VARCHAR | Schema.Type.STRING | + | |
CLOB | Schema.Type.STRING | + | |
DATE | Schema.LogicalType.DATE | + | |
TIME | Schema.LogicalType.TIME_MICROS | + | |
TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
TIME WITH TIME ZONE | Schema.LogicalType.TIME_MICROS | + | |
TIMESTAMP WITH TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
INTERVAL YEAR | Schema.Type.STRING | + | |
INTERVAL YEAR TO MONTH | Schema.Type.STRING | + | |
INTERVAL MONTH | Schema.Type.STRING | + | |
INTERVAL DAY | Schema.Type.STRING | + | |
INTERVAL DAY TO HOUR | Schema.Type.STRING | + | |
INTERVAL DAY TO MINUTE | Schema.Type.STRING | + | |
INTERVAL DAY TO SECOND | Schema.Type.STRING | + | |
INTERVAL HOUR | Schema.Type.STRING | + | |
INTERVAL HOUR TO MINUTE | Schema.Type.STRING | + | |
INTERVAL HOUR TO SECOND | Schema.Type.STRING | + | |
INTERVAL MINUTE | Schema.Type.STRING | + | |
INTERVAL MINUTE TO SECOND | Schema.Type.STRING | + | |
INTERVAL SECOND | Schema.Type.STRING | + | |
PERIOD(DATE) | | - | Struct data type |
PERIOD(TIME) | | - | Struct data type |
PERIOD(TIME WITH TIME ZONE) | | - | Struct data type |
PERIOD(TIMESTAMP) | | - | Struct data type |
PERIOD(TIMESTAMP WITH TIME ZONE) | | - | Struct data type |
JSON | | - | Struct data type |
ARRAY | | - | |
XML | | - | |
ST_Geometry | Schema.Type.STRING | + | |
MBR | | - | |
MBB | | - | |
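The mapping above can be expressed as a lookup that the source and sink consult when building or validating a schema. The following is a minimal sketch, assuming the Teradata type name has already been normalized to the spelling used in the table (for example, `DOUBLE PRECISION`). To keep it self-contained, CDAP types are represented as plain strings, and the class name is hypothetical. Unsupported types return an empty result so the caller can raise a validation error.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup mirroring the data type mapping table; CDAP types are plain strings here.
public final class TeradataTypeMappingSketch {

  private static final Map<String, String> MAPPING = new HashMap<>();

  static {
    put("Schema.Type.INT", "BYTEINT", "SMALLINT", "INTEGER");
    put("Schema.Type.LONG", "BIGINT");
    put("Schema.LogicalType.DECIMAL", "DECIMAL", "NUMERIC", "NUMBER");
    put("Schema.Type.DOUBLE", "FLOAT", "REAL", "DOUBLE PRECISION");
    put("Schema.Type.BYTES", "BYTE", "VARBYTE", "BLOB");
    put("Schema.Type.STRING", "CHAR", "VARCHAR", "CLOB", "ST_GEOMETRY");
    put("Schema.LogicalType.DATE", "DATE");
    put("Schema.LogicalType.TIME_MICROS", "TIME", "TIME WITH TIME ZONE");
    put("Schema.LogicalType.TIMESTAMP_MICROS", "TIMESTAMP", "TIMESTAMP WITH TIME ZONE");
    // Every INTERVAL variant is read as a string.
    put("Schema.Type.STRING",
        "INTERVAL YEAR", "INTERVAL YEAR TO MONTH", "INTERVAL MONTH",
        "INTERVAL DAY", "INTERVAL DAY TO HOUR", "INTERVAL DAY TO MINUTE",
        "INTERVAL DAY TO SECOND", "INTERVAL HOUR", "INTERVAL HOUR TO MINUTE",
        "INTERVAL HOUR TO SECOND", "INTERVAL MINUTE", "INTERVAL MINUTE TO SECOND",
        "INTERVAL SECOND");
    // PERIOD(...), JSON, ARRAY, XML, MBR and MBB are intentionally absent (unsupported).
  }

  private TeradataTypeMappingSketch() {
  }

  private static void put(String cdapType, String... teradataTypes) {
    for (String teradataType : teradataTypes) {
      MAPPING.put(teradataType, cdapType);
    }
  }

  /** Returns the CDAP schema type for a Teradata type name, or empty if unsupported. */
  public static Optional<String> toCdapType(String teradataType) {
    return Optional.ofNullable(MAPPING.get(teradataType.trim().toUpperCase(Locale.ROOT)));
  }

  public static void main(String[] args) {
    System.out.println(toCdapType("varchar"));      // Optional[Schema.Type.STRING]
    System.out.println(toCdapType("PERIOD(DATE)")); // Optional.empty
  }
}
```

Keeping unsupported types out of the map, rather than mapping them to a placeholder, forces each new type to be added deliberately along with its conversion logic.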
Approach
Create a teradata-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add Teradata-specific properties to the configuration, add support for Teradata-specific data types, and update the UI widget JSON definitions.
UI Impact or Changes
Configurable database properties are presented as named text fields instead of arbitrary key-value pairs. The Teradata source and sink appear as separate entries, with the Teradata logo, in the source and sink lists.
Test Case(s)
TODO
Sample Pipeline
TODO