DB Export Netezza

Introduction 

A source Plugin in Hydrator is needed to efficiently export data from Netezza.   Netezza includes strategies for exporting data including the use of transient external tables,  and a command line tool NZSQL that can export data to a flat file. 
Use-case
A Hydrator User would like to incorporate Netezza data into a pipeline using a source plugin that does not require a JDBC connection to perform the export from Netezza.

User Stories
  • As a Hydrator User I want a plugin that I can use to export data from Netezza so that I can use the data in my pipeline.  
  • As a Hydrator User I want a Netezza plugin that exports data efficiently using existing Netezza tools. 
  • As a Hydrator User I want the export data capability of the Netezza plugin to be based on a sql query that I issue.  
  • User should be able to specify credentials.
  • Passwords should not be viewable in plain text from inside pipeline viewer or hydrator studio.  
  • User should be able to specify Netezza Instance
  • User should be able to specify location of nzqsql Utility. 
  • User should be able to specify type of output.
  • User should be able to specify location of output files.  
  • User should know of connectivity errors, or malformed queries/output identifier.

Example

Example for how the plugin should work

Implementation Tips

NZSQL is a command line utility that can run remotely to allow exporting.   The output format can be specified to a flat file with specified delimiter or an external table.

 

nzsql information: Netezza Docs
External Table: Netezza Docs
Command example to output to a flat file: nzsql -d {dbname} -u {username} -h {host} -c "select * from table_with_90_columns " -t -F -o output.csv

Command example to use External Table: CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS
SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;

 

Design 

Design:

Input Json Format:

{
  "name": "Netezza",
  "type": "batchsource",
  "properties": {
        "userName": "cdap",
        "password": "****",
        "host": "localhost",
        "nzsqlPath": "/opt/netezza",
        "db": "user",
        "outputType": "File",
		"delimiter": ",",
        "outputPath": "/home/cdap/netezza",
		"sqlQuery": "select * from table_with_90_columns"
   }
}

Properties:

  • userName: User name for user db.

  • password: Password for user db.
  • host: Hostname or IP of the Netezza instance.
  • nzsqlPath: Path of nzsql utility.
  • db: Name of the DB from on which sql query needs to be executed.
  • outputType: Type of the output. It may be Flat file or External table.
  • delimiter: Delimiter needs to be specified which can be used while writing records in a flat file.
  • outputPath: Path of the output files.
  • sqlQuery: Sql query which needs to be executed on the specified db name to export the data.

 

Approach:

  • Ganymed SSH-2 library can be used to connect to the Netezza instance to run NZSQL utility as specified by the user.
  • The output format can only be among : Flat files(with specified delimiters) or external table.
  • Assuming the sql query provided by the user will be in below format:

          For example:
          1. nzsql -d {dbname} -u {username} -h {host} -c "select * from table_with_90_columns " -t -F -o output.csv
          2. CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;

  • User will only specify the query part like "select * from table_with_90_columns" or "SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x" in "sqlQuery" input and rest of the command has to be created at the plugin side using the inputs provided by the user.
  • In the host input, user will specify the hostname for the Netezza machine instance.

Table of Contents

Checklist

  • User stories documented 
  • User stories reviewed 
  • Design documented 
  • Design reviewed 
  • Feature merged 
  • Examples and guides 
  • Integration tests 
  • Documentation for feature 
  • Short video demonstrating the feature