SAP HANA Database Plugin


Checklist

  • User Stories Documented
  • User Stories Reviewed
  • Design Reviewed
  • APIs reviewed
  • Release priorities assigned
  • Test cases reviewed
  • Blog post

Introduction 

This plugin allows users to use an SAP HANA database as both a source and a sink.

Goals

  • Users can choose and install SAP HANA source and sink plugins.
  • Users should see the SAP HANA logo on the plugin configuration page for a better experience.

  • Users should get relevant information from the tool tip:

    • The tool tip should accurately describe what each field is used for.

  • Users should not have to specify any redundant configuration.

  • Users should get field level lineage for the source and sink that is being used.

  • Reference documentation should be updated to account for the changes.

  • The source code for the SAP HANA database plugin should be placed in a repository under the data-integrations organization.

  • Data pipelines using the source and sink plugins should run on both the MapReduce and Spark engines.

  • Integration tests for SAP HANA database plugin should be added in the test repo.

User Stories 

  • Users should be able to install SAP HANA-specific database source and sink plugins from the Hub.
  • Users should see tool tips that accurately describe what each field does.

  • Users should get field level lineage information for the SAP HANA source and sink.

  • Users should be able to set up a pipeline without specifying redundant information.

  • Users should get updated reference documentation for the SAP HANA source and sink.

  • Users should be able to read all SAP HANA data types.

Plugin Type

  • Batch Source
  • Batch Sink 
  • Real-time Source
  • Real-time Sink
  • Action
  • Post-Run Action
  • Aggregate
  • Join
  • Spark Model
  • Spark Compute

Design reference

Design

The proposal is to create a Maven submodule, saphana, in the database-plugins repository, as was done for the other database plugins.

Only SAP HANA, express edition can be tested, because we do not have access to the full version.

The compatibility matrix is only available to paying customers: https://launchpad.support.sap.com/#/notes/1906576 [TODO: how to get this?]

The documentation describes the following versions of the SAP HANA database [TODO: identify the differences between them]:

  • 2.0 SPS 04
  • 2.0 SPS 03
  • 2.0 SPS 02
  • 2.0 SPS 01
  • 2.0 SPS 00
  • 1.0 SPS 12

We also need to decide which versions we want to support [TODO: identify this].
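
After the supported versions are decided, the plugin could check the server version when it connects. The sketch below is hypothetical and rests on two assumptions. First, the version string reported by the server (for example through `DatabaseMetaData.getDatabaseProductVersion()`) has the form `2.00.040.00.1553674765`, meaning major.minor.revision.patch.build. Second, the SPS number equals the revision divided by 10, so revision 040 belongs to SPS 04. Both assumptions should be confirmed against the SAP documentation. The class name `HanaVersionCheck` is illustrative only.

```java
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical helper: checks whether a SAP HANA server version string belongs to
 * one of the releases listed in the documentation.
 * Assumption: versions look like "2.00.040.00.1553674765" and SPS = revision / 10.
 */
public class HanaVersionCheck {

  // Releases from the documentation list in this design.
  private static final Set<String> DOCUMENTED_RELEASES = Set.of(
      "2.0 SPS 04", "2.0 SPS 03", "2.0 SPS 02", "2.0 SPS 01", "2.0 SPS 00", "1.0 SPS 12");

  /** Converts, for example, "2.00.040.00.1553674765" into "2.0 SPS 04". */
  static String releaseName(String version) {
    String[] parts = version.trim().split("\\.");
    if (parts.length < 3) {
      throw new IllegalArgumentException("Unexpected version format: " + version);
    }
    int major = Integer.parseInt(parts[0]);
    int minor = Integer.parseInt(parts[1]);
    int revision = Integer.parseInt(parts[2]);
    // Assumption: revisions 040-049 belong to SPS 04, 120-129 to SPS 12, and so on.
    int sps = revision / 10;
    return String.format(Locale.ROOT, "%d.%d SPS %02d", major, minor, sps);
  }

  /** True if the server version maps to one of the documented releases. */
  static boolean isDocumentedRelease(String version) {
    return DOCUMENTED_RELEASES.contains(releaseName(version));
  }

  public static void main(String[] args) {
    System.out.println(releaseName("2.00.040.00.1553674765"));         // prints "2.0 SPS 04"
    System.out.println(isDocumentedRelease("1.00.122.05.1481577062")); // prints "true"
  }
}
```

Once the TODO above is resolved, the supported set could be narrowed, and the check could either fail validation or only log a warning for undocumented releases.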

The plugin design can be derived from the generic JDBC classes, modified to account for the custom properties that SAP HANA supports.
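
Here is a minimal sketch of how the SAP HANA-specific part could sit on top of the generic JDBC part. The generic configuration supplies the host and port, and only the SAP HANA options the user actually set are appended to the connection URL. The sketch assumes the SAP HANA JDBC URL form `jdbc:sap://<host>:<port>/?<option>=<value>&...`. `SapHanaUrlBuilder` is a hypothetical name, not an existing class in database-plugins.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/**
 * Hypothetical sketch: generic JDBC settings (host, port) plus SAP HANA-specific
 * options, rendered as jdbc:sap://<host>:<port>/?<option>=<value>&...
 */
public class SapHanaUrlBuilder {

  private final String host;
  private final int port;
  // LinkedHashMap keeps insertion order, so the generated URL is deterministic.
  private final Map<String, String> options = new LinkedHashMap<>();

  public SapHanaUrlBuilder(String host, int port) {
    this.host = host;
    this.port = port;
  }

  /** Records an option only if the user set it; null or empty keeps the driver default. */
  public SapHanaUrlBuilder option(String name, String value) {
    if (value != null && !value.isEmpty()) {
      options.put(name, value);
    }
    return this;
  }

  /** Builds the URL; the query part is omitted when no option was set. */
  public String build() {
    StringBuilder url = new StringBuilder("jdbc:sap://")
        .append(host).append(':').append(port).append('/');
    if (!options.isEmpty()) {
      StringJoiner query = new StringJoiner("&", "?", "");
      // Values are appended verbatim; escaping special characters is out of scope here.
      options.forEach((name, value) -> query.add(name + "=" + value));
      url.append(query.toString());
    }
    return url.toString();
  }

  public static void main(String[] args) {
    String url = new SapHanaUrlBuilder("hana.example.com", 30015)
        .option("databaseName", "HXE")
        .option("currentschema", "SALES")
        .option("encrypt", null) // not set by the user: the driver default applies
        .build();
    System.out.println(url);
  }
}
```

Running `main` prints `jdbc:sap://hana.example.com:30015/?databaseName=HXE&currentschema=SALES`. The host, port, and values are placeholders. Skipping unset options leaves the driver defaults in effect, which supports the goal of not requiring redundant configuration.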


Common Properties (1.0 SPS 12)

Properties that are specific to the source, sink, or action will be listed separately.

Each entry gives the user configuration label and the variable name. The parentheses hold the type, the user widget, and the default value where one exists. The label description follows.

Basic section:

  • Database: databaseName (string, Text Box)
    The name of the database to connect to in multi-tenant database container systems.

  • User: user (string, Text Box)
    The user name. Optional, depending on the authentication method used.

  • Password: password (password, Text Box)
    The user password. Optional, depending on the authentication method used.

  • Schema: currentschema (string, Text Box, default: current user name)
    Sets the current schema, which is used for identifiers without a schema.

  • Read Only: readOnly (boolean, Toggle, default: false)
    When enabled, only read-only statements are permitted. Attempting to execute DDL or DML causes an exception.

Advanced section:

  • Autocommit: autocommit (boolean, Toggle, default: true)
    When in autocommit mode, every statement is automatically committed. Otherwise, commits and rollbacks must be done manually.

  • Close handles on finalize: closeHandlesOnFinalize (boolean, Toggle, default: true)
    When enabled, connections, statements, and result sets are automatically closed when their Java finalizers are run.

  • Connection timeout: communicationTimeout (int, Text Box, default: 0)
    Connection timeout in milliseconds. Setting this option to 0 disables the timeout.

  • Distribution: distribution (enum, Radio Button, default: STATEMENT)
    Options: OFF, CONNECTION, STATEMENT, ALL.
    Chooses the distribution mode. Specifying STATEMENT does not include CONNECTION distribution.

  • Empty Timestamp is NULL: emptyTimestampIsNull (boolean, Toggle, default: false)
    When enabled, DAYDATE, SECONDTIME, SECONDDATE, and LONGDATE values inserted as empty strings are returned as NULLs. When disabled, these values are returned as out-of-band values.

  • Encryption: encrypt (boolean, Toggle, default: false)
    When enabled, all communication is encrypted via SSL.

  • Ignore topology: ignoreTopology (boolean, Toggle, default: false)
    false = Use the topology unless port forwarding is detected.
    true = Always ignore the topology.

  • Transaction isolation: isolation (enum, Radio Button, default: READ_COMMITTED)
    Options: READ_UNCOMMITTED, READ_COMMITTED, REPEATABLE_READ, SERIALIZABLE.
    Sets the isolation level for the connection.

  • HDB User Key: key (string, Text Box)
    The key for the HdbUserStore.

  • Locale: locale (string, Text Box)
    Options: ISO locale code.
    The client locale.

  • Packet size: packetsize (int, Text Box, default: 130000)
    Sets the maximum size, in bytes, of a request packet sent from the client to the server. The minimum is 130,000 bytes.

  • Reconnect: reconnect (boolean, Toggle, default: true)
    When enabled, the system automatically reconnects to the database instance after a command timeout, or when the connection was broken and reconnecting restores the old state (for example, if no transaction was open).

  • Split Batch Commands: splitBatchCommands (boolean, Toggle, default: false)
    Allows split and parallel execution of batch commands on partitioned tables.

  • Virtual Host Name: virtualHostName (string, Text Box)
    The virtual host name. This value is ignored if no HdbUserStore key is specified.
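
Several of these properties have constraints that the plugin could check at configuration time instead of leaving them to the driver. The following hypothetical validator encodes the constraints listed above: the minimum packet size, a non-negative timeout, and the allowed distribution and isolation values. It also maps the isolation option to the standard `java.sql.Connection` constants, assuming the plugin applies that option through `Connection.setTransactionIsolation`. The class and method names are illustrative only.

```java
import java.sql.Connection;
import java.util.List;
import java.util.Map;

/** Hypothetical configuration-time checks for some SAP HANA common properties. */
public class SapHanaPropertyValidator {

  static final int MIN_PACKET_SIZE = 130_000;

  static final List<String> DISTRIBUTION_MODES =
      List.of("OFF", "CONNECTION", "STATEMENT", "ALL");

  // Isolation option value -> standard JDBC isolation constant.
  static final Map<String, Integer> ISOLATION_LEVELS = Map.of(
      "READ_UNCOMMITTED", Connection.TRANSACTION_READ_UNCOMMITTED,
      "READ_COMMITTED", Connection.TRANSACTION_READ_COMMITTED,
      "REPEATABLE_READ", Connection.TRANSACTION_REPEATABLE_READ,
      "SERIALIZABLE", Connection.TRANSACTION_SERIALIZABLE);

  static void validatePacketSize(int packetSize) {
    if (packetSize < MIN_PACKET_SIZE) {
      throw new IllegalArgumentException(
          "packetsize must be at least " + MIN_PACKET_SIZE + " bytes, got " + packetSize);
    }
  }

  static void validateCommunicationTimeout(int timeoutMillis) {
    // 0 is valid and disables the timeout; negative values are rejected.
    if (timeoutMillis < 0) {
      throw new IllegalArgumentException("communicationTimeout must not be negative");
    }
  }

  static void validateDistribution(String mode) {
    // Null is checked first because List.of(...).contains(null) throws NullPointerException.
    if (mode == null || !DISTRIBUTION_MODES.contains(mode)) {
      throw new IllegalArgumentException("distribution must be one of " + DISTRIBUTION_MODES);
    }
  }

  /** Returns the constant to pass to Connection.setTransactionIsolation. */
  static int isolationLevel(String isolation) {
    // Map.of(...).get(null) would also throw NullPointerException, so guard first.
    Integer level = isolation == null ? null : ISOLATION_LEVELS.get(isolation);
    if (level == null) {
      throw new IllegalArgumentException("Unsupported isolation level: " + isolation);
    }
    return level;
  }
}
```

If these checks run during pipeline validation, users get errors on the configuration page rather than failures at runtime.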

Common Properties (2.0)

[TODO]

Data Types Mapping

SAP HANA Data Type   CDAP Schema Data Type         Comment
BOOLEAN              Schema.Type.BOOLEAN
TINYINT              Schema.Type.INT
SMALLINT             Schema.Type.INT
INTEGER              Schema.Type.INT
BIGINT               Schema.Type.LONG
SMALLDECIMAL         Schema.Type.DECIMAL
DECIMAL              Schema.Type.DECIMAL
REAL                 Schema.Type.FLOAT
DOUBLE               Schema.Type.DOUBLE
VARCHAR              Schema.Type.STRING
NVARCHAR             Schema.Type.STRING
ALPHANUM             Schema.Type.STRING
SHORTTEXT            Schema.Type.STRING
DATE                 Schema.Type.DATE
TIME                 Schema.Type.DATE
SECONDDATE           Schema.Type.DATE
TIMESTAMP            Schema.Type.TIMESTAMP_MICRO
VARBINARY            Schema.Type.BYTES
BLOB                 Schema.Type.BYTES
CLOB                 Schema.Type.BYTES             Not sure whether BYTES or STRING
NCLOB                Schema.Type.BYTES
TEXT                 Schema.Type.STRING
ARRAY                Schema.Type.ARRAY
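
The mapping table can be written as a single lookup. This sketch returns the CDAP type as a plain string copied from the table, so it does not depend on the CDAP `Schema` API; a real implementation would return schema objects instead. It assumes the input is the column type name, for example as reported by `ResultSetMetaData.getColumnTypeName(int)`. It mirrors the table exactly as written, including the open CLOB question. `SapHanaTypeMapping` is a hypothetical name.

```java
import java.util.Locale;

/** Hypothetical lookup mirroring the SAP HANA to CDAP data types mapping table. */
public class SapHanaTypeMapping {

  static String toCdapType(String hanaType) {
    switch (hanaType.toUpperCase(Locale.ROOT)) {
      case "BOOLEAN":
        return "Schema.Type.BOOLEAN";
      case "TINYINT":
      case "SMALLINT":
      case "INTEGER":
        return "Schema.Type.INT";
      case "BIGINT":
        return "Schema.Type.LONG";
      case "SMALLDECIMAL":
      case "DECIMAL":
        return "Schema.Type.DECIMAL";
      case "REAL":
        return "Schema.Type.FLOAT";
      case "DOUBLE":
        return "Schema.Type.DOUBLE";
      case "VARCHAR":
      case "NVARCHAR":
      case "ALPHANUM":
      case "SHORTTEXT":
      case "TEXT":
        return "Schema.Type.STRING";
      case "DATE":
      case "TIME":
      case "SECONDDATE":
        return "Schema.Type.DATE";
      case "TIMESTAMP":
        return "Schema.Type.TIMESTAMP_MICRO";
      case "VARBINARY":
      case "BLOB":
      case "CLOB": // still open in the table: BYTES or STRING
      case "NCLOB":
        return "Schema.Type.BYTES";
      case "ARRAY":
        return "Schema.Type.ARRAY";
      default:
        // Failing fast makes unmapped types visible during schema inference.
        throw new IllegalArgumentException("Unmapped SAP HANA type: " + hanaType);
    }
  }
}
```

Keeping the mapping in one switch means that resolving the CLOB question, or any other row, only requires changing a single case.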


Pipeline Samples

[TODO: attach sample pipeline]

UI Impact or Changes

  • Configurable database properties are presented as named fields (text boxes, toggles, and radio buttons) instead of arbitrary key-value pairs. The SAP HANA source and sink appear as separate entries, with the SAP HANA logo, in the source and sink lists.

Test Scenarios

Test ID   Test Description   Expected Results

Releases

Release X.Y.Z

Release X.Y.Z

Related Work

Future work