Teradata database plugin

Introduction

A separate database plugin that supports Teradata-specific features and configurations.

Use-cases

  • Users should not have to specify any redundant configuration (e.g., JDBC type in the source plugin, columns in the sink plugin)
  • Users should get field-level lineage for the source and sink that are being used
  • Reference documentation should be updated to account for the changes
  • Data pipelines that use the source and sink plugins should run on both the MapReduce and Spark engines
  • Users can choose and install the Teradata source and sink plugins
  • Users should see the Teradata logo on the plugin configuration page for a better experience
  • Users should get relevant information from the tooltips
    • The tooltip for the connection string should be specific to the Teradata database
    • Each tooltip should accurately describe what its field is used for

User Stories

  • Users should have each tooltip accurately describe what its field does

  • Users should be able to learn the format of the Teradata connection string by hovering over the connection string tooltip

  • Users should get field-level lineage information for the Teradata source and sink

  • Users should be able to set up a pipeline without specifying redundant information

  • Users should get updated reference documentation for the Teradata source and sink

  • Users should be able to read all the database data types

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design / Implementation Tips

Teradata JDBC Driver: https://downloads.teradata.com/download/connectivity/jdbc-driver

Teradata Express for VMware Player: https://downloads.teradata.com/download/database/teradata-express-for-vmware-player

Teradata data types: https://docs.teradata.com/reader/iRq_F~XxKYWu7Kv~HRd~ew/D_RBrANpKte9E5uvWjq8~Q

Teradata connection properties: https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ

Design


Teradata 16.20 is currently supported. We suggest using Teradata JDBC driver 16.20.00.12, since it is the latest stable release recommended by Teradata.

Sink Properties


| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Reference Name | String | Name that uniquely identifies this sink for lineage | Required |
| Basic | Driver name | String | Teradata driver name | Required (default teradata) |
| Basic | Host | String | Teradata host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which Teradata is running | Optional (default 1025) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Table Name | String | Name of the database table to write to | Required |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| Advanced | Connection arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |
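
The Host, Port, Database, and Connection arguments properties above are what a plugin would combine into a Teradata JDBC connection string. The following is a minimal sketch of that step, assuming the documented Teradata URL form `jdbc:teradata://host/PARAM=value,PARAM=value` with the `DATABASE` and `DBS_PORT` parameters. The class `TeradataUrlSketch` and its method are hypothetical and are not taken from the database-plugins code base.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper for illustration only; not part of the plugin code base.
final class TeradataUrlSketch {
  // Default taken from the Port row of the properties table.
  static final int DEFAULT_PORT = 1025;

  // Combines the plugin properties into a Teradata JDBC URL.
  // A null port falls back to the default; connection arguments are appended in iteration order.
  static String buildUrl(String host, Integer port, String database,
                         Map<String, String> connectionArguments) {
    if (host == null || host.isEmpty()) {
      throw new IllegalArgumentException("Host is required");
    }
    if (database == null || database.isEmpty()) {
      throw new IllegalArgumentException("Database is required");
    }
    StringJoiner params = new StringJoiner(",");
    params.add("DATABASE=" + database);
    params.add("DBS_PORT=" + (port == null ? DEFAULT_PORT : port));
    for (Map.Entry<String, String> arg : connectionArguments.entrySet()) {
      params.add(arg.getKey() + "=" + arg.getValue());
    }
    return "jdbc:teradata://" + host + "/" + params;
  }

  public static void main(String[] args) {
    Map<String, String> extra = new LinkedHashMap<>();
    extra.put("CHARSET", "UTF8");
    // Prints jdbc:teradata://localhost/DATABASE=sales,DBS_PORT=1025,CHARSET=UTF8
    System.out.println(buildUrl("localhost", null, "sales", extra));
  }
}
```

Keeping the port and database as dedicated parameters, with the free-form arguments appended last, mirrors the split between the Basic and Advanced sections of the table.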

Source Properties



| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Reference Name | String | Name that uniquely identifies this source for lineage | Required |
| Basic | Driver name | String | Teradata driver name | Required (default teradata) |
| Basic | Host | String | Teradata host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which Teradata is running | Optional (default 1025) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Import Query | String | Query used to import data | Valid SQL query |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| Advanced | Connection arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |
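
The Constraints column above implies a simple validation pass before a source is used. The following sketch illustrates one way to check it, under stated assumptions: properties arrive as a plain name-to-value map keyed by the user-facing names, and Import Query is treated as required even though the table only says it must be a valid SQL query. The class `SourceConfigCheckSketch` is hypothetical and does not use the CDAP plugin API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical validation sketch; property names follow the user-facing names in the table.
final class SourceConfigCheckSketch {
  // Assumption: Import Query is treated as required here.
  static final List<String> REQUIRED = Arrays.asList(
      "Reference Name", "Host", "Database", "Import Query", "Username", "Password");

  // Returns the required properties that are absent or blank, in table order.
  static List<String> missingProperties(Map<String, String> props) {
    List<String> missing = new ArrayList<>();
    for (String name : REQUIRED) {
      String value = props.get(name);
      if (value == null || value.trim().isEmpty()) {
        missing.add(name);
      }
    }
    return missing;
  }

  // Port is optional; fall back to the documented default of 1025.
  static int effectivePort(Map<String, String> props) {
    String value = props.get("Port");
    return (value == null || value.trim().isEmpty()) ? 1025 : Integer.parseInt(value.trim());
  }

  public static void main(String[] args) {
    Map<String, String> props = new HashMap<>();
    props.put("Host", "localhost");
    props.put("Database", "sales");
    System.out.println("Missing: " + missingProperties(props));
    System.out.println("Port: " + effectivePort(props));
  }
}
```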


Action Properties



| Section | User Facing Name | Type | Description | Constraints |
|---|---|---|---|---|
| Basic | Label | String | Label for UI | Required |
| Basic | Driver name | String | Teradata driver name | Required (default teradata) |
| Basic | Host | String | Teradata host | Required (defaults to localhost on UI) |
| Basic | Port | Number | Port on which Teradata is running | Optional (default 1025) |
| Basic | Database | String | Name of the database to connect to | Required |
| Basic | Database Command | String | Database command to run | Valid SQL query |
| Credentials | Username | String | DB username | Required |
| Credentials | Password | Password | User password | Required |
| Advanced | Connection arguments | Keyvalue | A list of arbitrary string tag/value pairs as connection arguments. For the list of properties, see https://teradata-docs.s3.amazonaws.com/doc/connectivity/jdbc/reference/current/jdbcug_chapter_2.html#BABJIHBJ | |


Data Types Mapping

| Teradata Data Type | CDAP Schema Data Type | Support | Comment |
|---|---|---|---|
| BYTEINT | Schema.Type.INT | + | |
| SMALLINT | Schema.Type.INT | + | |
| INTEGER | Schema.Type.INT | + | |
| BIGINT | Schema.Type.LONG | + | |
| DECIMAL/NUMERIC | Schema.LogicalType.DECIMAL | + | |
| FLOAT/REAL/DOUBLE PRECISION | Schema.Type.DOUBLE | + | |
| NUMBER | Schema.LogicalType.DECIMAL | + | |
| BYTE | Schema.Type.BYTES | + | |
| VARBYTE | Schema.Type.BYTES | + | |
| BLOB | Schema.Type.BYTES | + | |
| CHAR | Schema.Type.STRING | + | |
| VARCHAR | Schema.Type.STRING | + | |
| CLOB | Schema.Type.STRING | + | |
| DATE | Schema.LogicalType.DATE | + | |
| TIME | Schema.LogicalType.TIME_MICROS | + | |
| TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| TIME WITH TIME ZONE | Schema.LogicalType.TIME_MICROS | + | |
| TIMESTAMP WITH TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
| INTERVAL YEAR | Schema.Type.STRING | + | |
| INTERVAL YEAR TO MONTH | Schema.Type.STRING | + | |
| INTERVAL MONTH | Schema.Type.STRING | + | |
| INTERVAL DAY | Schema.Type.STRING | + | |
| INTERVAL DAY TO HOUR | Schema.Type.STRING | + | |
| INTERVAL DAY TO MINUTE | Schema.Type.STRING | + | |
| INTERVAL DAY TO SECOND | Schema.Type.STRING | + | |
| INTERVAL HOUR | Schema.Type.STRING | + | |
| INTERVAL HOUR TO MINUTE | Schema.Type.STRING | + | |
| INTERVAL HOUR TO SECOND | Schema.Type.STRING | + | |
| INTERVAL MINUTE | Schema.Type.STRING | + | |
| INTERVAL MINUTE TO SECOND | Schema.Type.STRING | + | |
| INTERVAL SECOND | Schema.Type.STRING | + | |
| PERIOD(DATE) | | - | Struct data type |
| PERIOD(TIME) | | - | Struct data type |
| PERIOD(TIME WITH TIME ZONE) | | - | Struct data type |
| PERIOD(TIMESTAMP) | | - | Struct data type |
| PERIOD(TIMESTAMP WITH TIME ZONE) | | - | Struct data type |
| JSON | | - | Struct data type |
| ARRAY | | - | |
| XML | | - | |
| ST_Geometry | Schema.Type.STRING | + | |
| MBR | | - | |
| MBB | | - | |
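
The mapping above can be expressed as a lookup keyed by the Teradata type name. The sketch below is illustrative only: it returns the CDAP type as a plain string without the `Schema.Type` or `Schema.LogicalType` prefix, so it does not depend on the CDAP API, and it returns an empty result for types marked as unsupported. The class `TeradataTypeMappingSketch` and its handling of length or precision suffixes such as `DECIMAL(10,2)` are assumptions, not the plugin's actual implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

// Hypothetical lookup mirroring the supported rows of the mapping table.
final class TeradataTypeMappingSketch {
  private static final Map<String, String> MAPPING = new HashMap<>();

  private static void put(String cdapType, String... teradataTypes) {
    for (String teradataType : teradataTypes) {
      MAPPING.put(teradataType, cdapType);
    }
  }

  static {
    put("INT", "BYTEINT", "SMALLINT", "INTEGER");
    put("LONG", "BIGINT");
    put("DECIMAL", "DECIMAL", "NUMERIC", "NUMBER");
    put("DOUBLE", "FLOAT", "REAL", "DOUBLE PRECISION");
    put("BYTES", "BYTE", "VARBYTE", "BLOB");
    put("STRING", "CHAR", "VARCHAR", "CLOB", "ST_GEOMETRY");
    put("DATE", "DATE");
    put("TIME_MICROS", "TIME", "TIME WITH TIME ZONE");
    put("TIMESTAMP_MICROS", "TIMESTAMP", "TIMESTAMP WITH TIME ZONE");
  }

  // Returns the CDAP type name, or empty when the table marks the type as unsupported.
  static Optional<String> cdapTypeFor(String teradataType) {
    String key = teradataType.trim().toUpperCase(Locale.ROOT);
    // Every INTERVAL variant in the table is read as a string.
    if (key.startsWith("INTERVAL ")) {
      return Optional.of("STRING");
    }
    // Assumption: drop a trailing length or precision, e.g. DECIMAL(10,2) becomes DECIMAL.
    int paren = key.indexOf('(');
    if (paren >= 0) {
      key = key.substring(0, paren).trim();
    }
    return Optional.ofNullable(MAPPING.get(key));
  }

  public static void main(String[] args) {
    System.out.println(cdapTypeFor("VARCHAR(20)"));
    System.out.println(cdapTypeFor("PERIOD(DATE)"));
  }
}
```

Returning an empty `Optional` for PERIOD, JSON, ARRAY, XML, MBR, and MBB leaves the caller to decide whether to fail or skip the column.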

Approach

Create a teradata-plugin module in the database-plugins project, reusing existing database-plugins code where possible. Add Teradata-specific properties to the configuration and add support for Teradata-specific data types. Update the UI widget JSON definitions.

UI Impact or Changes

Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The Teradata source and sink appear as separate entries, each with the Teradata logo, in the source and sink lists.

Test Case(s)

TODO

Sample Pipeline

TODO