Introduction
A separate database plugin to support Oracle-specific features and configurations.
Use-Case
- Users can choose and install Oracle source and sink plugins.
- Users should see the Oracle logo on the plugin configuration page for a better experience.
- Users should get relevant information from the tool tips:
  - The tool tip for the connection string should be customized specifically for the Oracle database.
  - Each tool tip should accurately describe what its field is used for.
- Users should not have to specify any redundant configuration (e.g. JDBC type in the source plugin, columns in the sink plugin).
- Users should get field level lineage for the source and sink that is being used.
- Reference documentation should be updated to account for the changes.
- The source code for the Oracle database plugin should be placed in a repo under the data-integrations org.
- Integration tests for the Oracle database plugin should be placed in a repo under the data-integrations org.
- A data pipeline using the source and sink plugins should run on both the MapReduce and Spark engines.
User Stories
- Users should be able to install Oracle-specific database source and sink plugins from the Hub
- Users should have each tool tip accurately describe what each field does
- Users should get field level lineage information for the Oracle source and sink
- Users should be able to set up a pipeline without specifying redundant information
- Users should get an updated reference document for the Oracle source and sink
- Users should be able to read all the DB types
Plugin Type
- Batch Source
- Batch Sink
- Real-time Source
- Real-time Sink
- Action
- Post-Run Action
- Aggregate
- Join
- Spark Model
- Spark Compute
Design Tips
Oracle connector reference: https://www.oracle.com/technetwork/database/application-development/jdbc/downloads/index.html
Existing database plugins: https://github.com/cdapio/hydrator-plugins/tree/develop/database-plugins
Oracle datatypes mappings and conversions:
Oracle provides two types of JDBC driver: thin and OCI.
The JDBC Thin driver is a pure Java, Type IV driver. It is lightweight and easy to install. More: https://docs.oracle.com/cd/E11882_01/java.112/e16548/jdbcthin.htm#JJDBC28195
The OCI driver requires native libraries to be installed, but provides additional features such as OCI Connection Pooling and Client Result Cache.
More: https://www.oracle.com/database/technologies/appdev/oci.html,
https://docs.oracle.com/cd/E11882_01/java.112/e16548/instclnt.htm#JJDBC28218
Oracle also supports a tnsnames.ora file on the client machine. More: https://docs.oracle.com/database/121/NETRF/tnsnames.htm#NETRF007
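To make the driver choice concrete, here is a minimal sketch of how the Host, Port, SID, Service name and Driver type properties could be turned into a JDBC connection string. The class and method names are hypothetical and are not part of the plugin. The URL shapes follow the commonly documented Oracle forms; using the service-name form with the OCI driver, and resolving a tnsnames.ora alias through the thin driver (which typically needs the oracle.net.tns_admin system property), are assumptions to verify against the driver version in use.

```java
/** Hypothetical helper (not part of the plugin): builds Oracle JDBC URLs. */
final class OracleJdbcUrls {
  private OracleJdbcUrls() { }

  // Accepts only the two values offered by the "Driver type" property.
  private static String checkDriverType(String driverType) {
    if (!"thin".equals(driverType) && !"oci".equals(driverType)) {
      throw new IllegalArgumentException("Unsupported driver type: " + driverType);
    }
    return driverType;
  }

  /** SID form, for example jdbc:oracle:thin:@localhost:1521:ORCL */
  static String forSid(String driverType, String host, int port, String sid) {
    return "jdbc:oracle:" + checkDriverType(driverType) + ":@" + host + ":" + port + ":" + sid;
  }

  /** Service-name form, for example jdbc:oracle:thin:@//localhost:1521/orclpdb */
  static String forServiceName(String driverType, String host, int port, String serviceName) {
    return "jdbc:oracle:" + checkDriverType(driverType) + ":@//" + host + ":" + port + "/" + serviceName;
  }

  /** Alias form; the alias is assumed to be defined in tnsnames.ora on the client machine. */
  static String forTnsAlias(String driverType, String alias) {
    return "jdbc:oracle:" + checkDriverType(driverType) + ":@" + alias;
  }
}
```

For example, `OracleJdbcUrls.forSid("thin", "localhost", 1521, "ORCL")` yields `jdbc:oracle:thin:@localhost:1521:ORCL`, which could then be passed to `java.sql.DriverManager.getConnection` together with the username and password.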
Design
The suggestion is to create a Maven submodule oracle-plugin under the database-plugins repo.
Sink Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI | |
Reference Name | String | Name that uniquely identifies this sink for lineage | |
Host | String | Oracle host | Required (defaults to localhost on UI) |
Port | Number | Port that Oracle is running on | Optional (default 1521) |
SID | String | SID name to connect | Required |
Service name | String | Service name to connect | Required |
Username | String | DB username | Required |
Password | Password | User password | Required |
Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs to use as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
Table Name | String | Name of a database table to write to | |
Driver type | Select | Oracle driver type | Possible values (thin, oci) |
Source Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI | |
Reference Name | String | Name that uniquely identifies this source for lineage | |
Host | String | Oracle host | Required (defaults to localhost on UI) |
Port | Number | Port that Oracle is running on | Optional (default 1521) |
SID | String | SID name to connect | Required |
Service name | String | Service name to connect | Required |
Import Query | String | Query used to import data | Valid SQL query |
Username | String | DB username | Required |
Password | Password | User password | Required |
Bounding Query | String | Returns the min and max of the Split-By Field | Valid SQL query |
Split-By Field Name | String | Field name which will be used to generate splits | |
Number of Splits to Generate | Number | Number of splits to generate | |
Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs to use as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
Driver type | Select | Oracle driver type | Possible values (thin, oci) |
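The Bounding Query, Split-By Field Name and Number of Splits to Generate properties work together. The following is a rough sketch of the idea, not the plugin's actual split logic: given the min and max returned by a bounding query such as `SELECT MIN(ID), MAX(ID) FROM EMPLOYEES` (a hypothetical table), it divides the range into contiguous conditions. It assumes a numeric split-by column and assumes the import query carries a `$CONDITIONS` placeholder, as the existing generic database source does. All class and method names are illustrative.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of range splitting for a numeric Split-By field. */
final class SplitConditions {
  private SplitConditions() { }

  /**
   * Divides the inclusive range [min, max] into at most numSplits contiguous ranges
   * and returns one WHERE condition per range. Assumes the range is far from the
   * limits of long, so the arithmetic below cannot overflow.
   */
  static List<String> forRange(String column, long min, long max, int numSplits) {
    if (numSplits < 1 || max < min) {
      throw new IllegalArgumentException("Need numSplits >= 1 and min <= max");
    }
    long span = max - min + 1;
    // Ceiling division, so every value in the range is covered.
    long step = Math.max(1, (span + numSplits - 1) / numSplits);
    List<String> conditions = new ArrayList<>();
    for (long lo = min; lo <= max; lo += step) {
      long hi = Math.min(lo + step - 1, max);
      conditions.add(column + " >= " + lo + " AND " + column + " <= " + hi);
    }
    return conditions;
  }

  /** Substitutes one split condition into an import query containing $CONDITIONS. */
  static String apply(String importQuery, String condition) {
    return importQuery.replace("$CONDITIONS", condition);
  }
}
```

With min 1, max 10 and 3 splits, this produces the ranges 1 to 4, 5 to 8 and 9 to 10. Each split would then run its own copy of the import query with its condition substituted.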
Action Properties
User Facing Name | Type | Description | Constraints |
---|---|---|---|
Label | String | Label for UI | |
Host | String | Oracle host | Required (defaults to localhost on UI) |
Port | Number | Port that Oracle is running on | Optional (default 1521) |
SID | String | SID name to connect | Required |
Service name | String | Service name to connect | Required |
Username | String | DB username | Required |
Password | Password | User password | Required |
Connection Arguments | Keyvalue | A list of arbitrary string tag/value pairs to use as connection arguments. List of properties: https://docs.oracle.com/cd/E11882_01/appdev.112/e13995/oracle/jdbc/OracleDriver.html | |
Database Command | String | Database command to run | Valid SQL query |
Driver type | Select | Oracle driver type | Possible values (thin, oci) |
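The Connection Arguments property appears in the sink, source and action tables. As an illustration, the sketch below turns such key/value pairs into a `java.util.Properties` object of the kind that `DriverManager.getConnection(url, properties)` accepts. The `key1=value1;key2=value2` serialization is an assumption borrowed from the existing generic database plugins, and the class name is hypothetical.

```java
import java.util.Properties;

/** Hypothetical parser for a "key1=value1;key2=value2" connection arguments string. */
final class ConnectionArguments {
  private ConnectionArguments() { }

  static Properties parse(String raw) {
    Properties props = new Properties();
    if (raw == null || raw.trim().isEmpty()) {
      return props; // no arguments configured
    }
    for (String pair : raw.split(";")) {
      if (pair.trim().isEmpty()) {
        continue; // tolerate a trailing or doubled separator
      }
      int eq = pair.indexOf('=');
      if (eq <= 0) {
        throw new IllegalArgumentException("Expected key=value but got: " + pair);
      }
      // Only the first '=' separates key from value, so values may contain '='.
      props.setProperty(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
    }
    return props;
  }
}
```

The caller would add the username and password to the resulting properties before opening the connection.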
Data Types Mapping
Oracle Data Type | CDAP Schema Data Type | Support | Comment |
---|---|---|---|
VARCHAR2 | Schema.Type.STRING | * | No such type in java.sql.Types, mapping to String by default |
NVARCHAR2 | Schema.Type.STRING | * | No such type in java.sql.Types, mapping to String by default |
VARCHAR | Schema.Type.STRING | + | |
NUMBER | Schema.LogicalType.DECIMAL | + | |
FLOAT | Schema.Type.DOUBLE | + | FLOAT defaults to FLOAT(126); the value is represented internally as NUMBER |
LONG | Schema.Type.LONG | + | |
DATE | Schema.LogicalType.DATE | + | |
BINARY_FLOAT | Schema.Type.FLOAT | + | |
BINARY_DOUBLE | Schema.Type.DOUBLE | + | |
TIMESTAMP | Schema.LogicalType.TIMESTAMP_MICROS | + | |
TIMESTAMP WITH TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
TIMESTAMP WITH LOCAL TIME ZONE | Schema.LogicalType.TIMESTAMP_MICROS | + | |
INTERVAL YEAR TO MONTH | Schema.Type.STRING | + | |
INTERVAL DAY TO SECOND | Schema.Type.STRING | + | |
RAW | Schema.Type.BYTES | + | |
LONG RAW | | - | Type is deprecated by Oracle |
ROWID | Schema.Type.STRING | + | |
UROWID | Schema.Type.STRING | * | No such type in java.sql.Types, mapping to String by default |
CHAR | Schema.Type.STRING | + | |
NCHAR | Schema.Type.STRING | + | |
CLOB | Schema.Type.STRING | + | |
NCLOB | Schema.Type.STRING | + | |
BLOB | Schema.Type.BYTES | + | |
BFILE | | - | Type is deprecated by Oracle |
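The mapping table could be expressed in code roughly as follows. This is a simplified sketch that covers only a subset of the rows. It uses plain strings for the CDAP schema types so that it has no CDAP dependency, whereas a real implementation would return Schema objects. The class name and the type-name normalization are assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Optional;

/** Hypothetical lookup from an Oracle type name to the CDAP schema type listed in the table. */
final class OracleTypeMapping {
  private static final Map<String, String> MAPPING = new HashMap<>();

  static {
    MAPPING.put("VARCHAR2", "Schema.Type.STRING");
    MAPPING.put("NVARCHAR2", "Schema.Type.STRING");
    MAPPING.put("NUMBER", "Schema.LogicalType.DECIMAL");
    MAPPING.put("FLOAT", "Schema.Type.DOUBLE");
    MAPPING.put("DATE", "Schema.LogicalType.DATE");
    MAPPING.put("BINARY_FLOAT", "Schema.Type.FLOAT");
    MAPPING.put("BINARY_DOUBLE", "Schema.Type.DOUBLE");
    MAPPING.put("TIMESTAMP", "Schema.LogicalType.TIMESTAMP_MICROS");
    MAPPING.put("TIMESTAMP WITH TIME ZONE", "Schema.LogicalType.TIMESTAMP_MICROS");
    MAPPING.put("INTERVAL DAY TO SECOND", "Schema.Type.STRING");
    MAPPING.put("RAW", "Schema.Type.BYTES");
    MAPPING.put("CLOB", "Schema.Type.STRING");
    MAPPING.put("BLOB", "Schema.Type.BYTES");
    // LONG RAW and BFILE are deliberately absent: the table marks them as unsupported.
  }

  private OracleTypeMapping() { }

  /** Returns the mapped type, or empty if the Oracle type is not in this sketch's subset. */
  static Optional<String> cdapTypeFor(String oracleTypeName) {
    String key = oracleTypeName
        .toUpperCase(Locale.ROOT)
        .replaceAll("\\(.*?\\)", "") // drop precision and scale, e.g. NUMBER(10,2)
        .replaceAll("\\s+", " ")
        .trim();
    return Optional.ofNullable(MAPPING.get(key));
  }
}
```

Stripping the parenthesized precision lets declarations such as `NUMBER(10,2)` or `TIMESTAMP(6) WITH TIME ZONE` resolve to the same rows as their bare type names.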
Approach
Create a module oracle-plugin in the database-plugins project, reusing existing database-plugins code where possible. Add Oracle-specific properties to the configuration and add support for Oracle-specific data types. Update the UI widget JSON definitions.
The default driver should be used to connect to Oracle; otherwise the user should connect via the generic-database plugin.
Pipeline Samples
API changes
Deprecated Programmatic APIs
database-plugins is moved to Data Integrations
UI Impact or Changes
Configurable database properties are presented as named text fields instead of arbitrary key/value pairs. The Oracle source and sink are separate entries with the Oracle logo in the source and sink lists.
Test Scenarios
TODO
Releases
Release X.Y.Z
Related Work
Future work
MSSQL database plugin