A source Plugin in Hydrator is needed to efficiently export data from Netezza. Netezza includes strategies for exporting data including the use of transient external tables, and a command line tool NZSQL that can export data to a flat file.
Use-case
A Hydrator User would like to incorporate Netezza data into a pipeline using a source plugin that does not require a JDBC connection to perform the export from Netezza.
User Stories
As a Hydrator User I want a plugin that I can use to export data from Netezza so that I can use the data in my pipeline.
As a Hydrator User I want a Netezza plugin that exports data efficiently using existing Netezza tools.
As a Hydrator User I want the export data capability of the Netezza plugin to be based on a sql query that I issue.
User should be able to specify credentials.
Passwords should not be viewable in plain text from inside pipeline viewer or hydrator studio.
User should be able to specify Netezza Instance
User should be able to specify location of nzqsql Utility.
User should be able to specify type of output.
User should be able to specify location of output files.
User should know of connectivity errors, or malformed queries/output identifier.
Example
Example for how the plugin should work
Implementation Tips
NZSQL is a command line utility that can run remotely to allow exporting. The output format can be specified to a flat file with specified delimiter or an external table.
Command example to output to a flat file: nzsql -d {dbname} -u {username} -h {host} -c "select * from table_with_90_columns " -t -F -o output.csv
Command example to use External Table: CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS
SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;
db: Name of the DB from on which sql query needs to be executed.
outputType: Type of the output. It may be Flat file or External table.
delimiter: Delimiter needs to be specified which can be used while writing records in a flat file.
outputPath: Path of the output files.
sqlQuery: Sql query which needs to be executed on the specified db name to export the data.
Approach:
Ganymed SSH-2 library can be used to connect to the Netezza instance to run NZSQL utility as specified by the user.
The output format can only be among : Flat files(with specified delimiters) or external table.
Assuming the sql query provided by the user will be in below format:
For example: 1. nzsql -d {dbname} -u {username} -h {host} -c "select * from table_with_90_columns " -t -F -o output.csv 2. CREATE EXTERNAL TABLE '/tmp/export.csv' USING (DELIM ',') AS SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x;
User will only specify the query part like "select * from table_with_90_columns" or "SELECT foo.x, bar.y, bar.dt FROM foo, bar WHERE foo.x = bar.x" in "sqlQuery" input and rest of the command has to be created at the plugin side using the inputs provided by the user.
In the host input, user will specify the hostname for the Netezza machine instance.