
Goal 

This is a source plugin that allows users to read and process mainframe files whose record layout is defined by a COBOL copybook. It is intended as a basic first implementation.

Checklist

  •  User stories documented 
  •  User stories reviewed 
  •  Design documented 
  •  Design reviewed 
  •  Feature merged 
  •  Examples and guides 
  •  Integration tests 
  •  Documentation for feature 
  •  Short video demonstrating the feature


  • The user should be able to copy and paste a COBOL copybook, or provide a copybook file whose contents are loaded into the text section.

  • The user should be able to select which fields are included in the output schema, that is, specify those fields explicitly.


Input format implementation: here

Design

  • Assumptions:


  • For each AbstractLine read from the data file, if a field's JRecord AbstractFieldValue is binary, the value is Base64-encoded and then decoded while reading:
    // Base64 is org.apache.commons.codec.binary.Base64
    for (ExternalField field : externalRecord.getRecordFields()) {
        AbstractFieldValue fieldValue = line.getFieldValue(field.getName());
        if (fieldValue.isBinary()) {
            // int fields: parse the field's binary text as a base-2 integer
            value.put(field.getName(),
                Integer.parseInt(new String(Base64.decodeBase64(Base64.encodeBase64(fieldValue.asString().getBytes()))), 2));
            // BigInteger fields use this instead:
            // value.put(field.getName(), Base64.decodeInteger(Base64.encodeInteger(fieldValue.asBigInteger())));
        }
    }

Which form is used depends on the field's data type (int or BigInteger).
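You can check the encode-then-decode step on its own. The following is a minimal sketch. It uses java.util.Base64 in place of the Apache Commons Codec Base64 class above, purely so the example has no dependencies. binaryTextToInt and bigIntegerRoundTrip are hypothetical helper names. The sketch shows that decoding an encoded value returns the original bytes, so the int path gives the same result as calling Integer.parseInt(text, 2) on the field's text.

```java
import java.math.BigInteger;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch only: java.util.Base64 stands in for org.apache.commons.codec.binary.Base64.
class Base64RoundTrip {

    // int path: encode the field's binary text, decode it again, then parse it as base 2.
    static int binaryTextToInt(String bits) {
        byte[] raw = bits.getBytes(StandardCharsets.US_ASCII);
        byte[] decoded = Base64.getDecoder().decode(Base64.getEncoder().encode(raw));
        // decode(encode(raw)) returns raw unchanged, so this equals Integer.parseInt(bits, 2).
        return Integer.parseInt(new String(decoded, StandardCharsets.US_ASCII), 2);
    }

    // BigInteger path: round-trip the value's two's-complement bytes through Base64.
    static BigInteger bigIntegerRoundTrip(BigInteger value) {
        byte[] encoded = Base64.getEncoder().encode(value.toByteArray());
        return new BigInteger(Base64.getDecoder().decode(encoded));
    }

    public static void main(String[] args) {
        System.out.println(binaryTextToInt("00001011")); // 11
        System.out.println(bigIntegerRoundTrip(new BigInteger("12345678901234567890")));
    }
}
```

Because the round trip is lossless, the problem with decoding directly is that the raw field value is not Base64 text in the first place.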

 

JRecord AbstractFieldValue type to Java data type

  • char, char just right, char null terminated, char null padded
    Java type: java.lang.String

  • num left justified, num right justified, num zero padded
    Java type: int

  • binary int, binary int positive, positive binary int
    Java type: int
    Description: Base64-encode the value, decode it again, then parse the result as base 2:
    Integer.parseInt(new String(Base64.decodeBase64(Base64.encodeBase64(value.asString().getBytes()))), 2)
    Comments: Base64.decodeBase64() accepts either binary or String data, so the value is first encoded and then decoded. Calling decodeBase64() directly on the raw value resulted in improper values.

  • decimal, Mainframe Packed Decimal, Mainframe Zoned Numeric, Mainframe Packed Decimal (+ve)
    Java type: java.math.BigDecimal
    Comments: CDAP Schema.Type has no BigDecimal type, so all of these are converted to DOUBLE.

  • Binary Integer Big Endian (Mainframe, AIX, etc.), Binary Integer Big Endian (only +ve), Positive Integer Big Endian
    Java type: java.math.BigInteger
    Description: Base64-encode the value, decode it again, then retrieve it:
    Base64.decodeInteger(Base64.encodeInteger(value.asBigInteger()))
    Comments: As with binary int, decoding the raw value directly resulted in improper values. CDAP Schema.Type has no BigInteger type, so this is converted to LONG.

  • num any decimal, positive num any decimal, assumed decimal
    Java type: double

  • Hex
    Java type: long
    Description: Long.parseLong(value.asHex(), 16)
    Comments: CDAP Schema.Type has no hex type, so this is converted to LONG.

  • Boolean / (Y/N)
    Java type: java.lang.Boolean

  • Default (all other types)
    Java type: java.lang.String
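The narrowing conversions in this mapping can be shown without JRecord. This is a sketch, not the plugin's implementation. unpackComp3, bigEndianToLong and hexToLong are hypothetical helpers. The sign handling is also an assumption of the sketch: a low nibble of 0xD in the last byte means negative, and any other sign nibble means positive. The code decodes a packed-decimal value to BigDecimal before narrowing it to double. It also uses BigInteger.longValueExact(), so a big-endian value too large for LONG fails with an exception instead of being silently truncated.

```java
import java.math.BigDecimal;
import java.math.BigInteger;

// Sketch only, independent of JRecord: hypothetical helpers for the conversions above.
class FieldConversions {

    // COMP-3 (packed decimal): two BCD digits per byte; the last byte's low nibble
    // is the sign. Assumption: 0xD means negative, any other sign nibble positive.
    static BigDecimal unpackComp3(byte[] packed, int scale) {
        StringBuilder digits = new StringBuilder();
        boolean negative = false;
        for (int i = 0; i < packed.length; i++) {
            int high = (packed[i] >> 4) & 0x0F;
            int low = packed[i] & 0x0F;
            if (high > 9) {
                throw new IllegalArgumentException("invalid digit nibble at byte " + i);
            }
            digits.append(high);
            if (i < packed.length - 1) {
                if (low > 9) {
                    throw new IllegalArgumentException("invalid digit nibble at byte " + i);
                }
                digits.append(low);
            } else {
                negative = (low == 0x0D);
            }
        }
        BigInteger unscaled = new BigInteger(digits.toString());
        return new BigDecimal(negative ? unscaled.negate() : unscaled, scale);
    }

    // Big-endian binary to LONG: longValueExact() throws ArithmeticException
    // rather than silently truncating when the value does not fit in a long.
    static long bigEndianToLong(byte[] bytes, boolean signed) {
        BigInteger value = signed ? new BigInteger(bytes) : new BigInteger(1, bytes);
        return value.longValueExact();
    }

    // Hex text to LONG, as in Long.parseLong(value.asHex(), 16).
    static long hexToLong(String hex) {
        return Long.parseLong(hex, 16);
    }

    public static void main(String[] args) {
        // PIC S9(9)V99 COMP-3 holding -1234567.89: 11 digits plus a sign nibble = 6 bytes.
        byte[] salePrice = {0x00, 0x12, 0x34, 0x56, 0x78, (byte) 0x9D};
        BigDecimal price = unpackComp3(salePrice, 2);
        System.out.println(price); // -1234567.89
        double asDouble = price.doubleValue(); // value that would be stored in a DOUBLE field
        System.out.println(asDouble);
        System.out.println(bigEndianToLong(new byte[] {(byte) 0xFF, (byte) 0xFE}, true)); // -2
        System.out.println(hexToLong("1F")); // 31
    }
}
```

A double holds roughly 15 to 17 significant decimal digits, so very wide packed-decimal fields can lose precision when converted to DOUBLE. Likewise, Long.parseLong(hex, 16) only accepts values up to 7FFFFFFFFFFFFFFF; larger hex values throw NumberFormatException.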

Examples

Properties:

referenceName     :  This will be used to uniquely identify this source for lineage, annotating metadata, etc.
copybookContents  :  Contents of the COBOL copybook file, which defines the data structure.
binaryFilePath    :  Complete path of the .bin file to be read. This is a fixed-length binary file that matches the copybook.
drop              :  Comma-separated list of fields to drop. For example: 'field1,field2,field3'.
maxSplitSize      :  Maximum split size for each mapper in the MapReduce job. Defaults to 128MB.
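Here is one possible way to apply the drop property, sketched under assumptions. parseDrop and keptFields are hypothetical helper names. Trimming whitespace and ignoring empty entries are choices made for this sketch, not documented behavior. The field names in main are the elementary fields of the DTAR020 copybook used in the example on this page.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: splits the comma-separated "drop" value and filters the schema fields.
class DropFields {

    // Assumption: entries are trimmed and empty entries are ignored.
    static Set<String> parseDrop(String drop) {
        Set<String> names = new LinkedHashSet<>();
        if (drop == null) {
            return names;
        }
        for (String name : drop.split(",")) {
            String trimmed = name.trim();
            if (!trimmed.isEmpty()) {
                names.add(trimmed);
            }
        }
        return names;
    }

    // Keeps the copybook fields, in their original order, that are not listed in "drop".
    static List<String> keptFields(List<String> copybookFields, String drop) {
        Set<String> dropped = parseDrop(drop);
        List<String> kept = new ArrayList<>();
        for (String field : copybookFields) {
            if (!dropped.contains(field)) {
                kept.add(field);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList(
            "DTAR020-KEYCODE-NO", "DTAR020-STORE-NO", "DTAR020-DATE",
            "DTAR020-DEPT-NO", "DTAR020-QTY-SOLD", "DTAR020-SALE-PRICE");
        System.out.println(keptFields(fields, "DTAR020-DATE"));
    }
}
```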

Example:

This example reads data from a local binary file "file:///home/cdap/DTAR020_FB.bin" and parses it using the schema given in the "COBOL Copybook" text area. It drops the field "DTAR020-DATE" and generates structured records with the schema specified in the text area.

{
  "name": "CopybookReader",
  "plugin": {
    "name": "CopybookReader",
    "type": "batchsource",
    "properties": {
      "drop": "DTAR020-DATE",
      "referenceName": "Copybook",
      "copybookContents": "000100* \n
000200* DTAR020 IS THE OUTPUT FROM DTAB020 FROM THE IML \n
000300* CENTRAL REPORTING SYSTEM \n
000400* \n
000500* CREATED BY BRUCE ARTHUR 19/12/90 \n
000600* \n
000700* RECORD LENGTH IS 27. \n
000800* \n
000900 03 DTAR020-KCODE-STORE-KEY. \n
001000 05 DTAR020-KEYCODE-NO PIC X(08). \n
001100 05 DTAR020-STORE-NO PIC S9(03) COMP-3. \n
001200 03 DTAR020-DATE PIC S9(07) COMP-3. \n
001300 03 DTAR020-DEPT-NO PIC S9(03) COMP-3. \n
001400 03 DTAR020-QTY-SOLD PIC S9(9) COMP-3. \n
001500 03 DTAR020-SALE-PRICE PIC S9(9)V99 COMP-3. ",
      "binaryFilePath": "file:///home/cdap/DTAR020_FB.bin",
      "maxSplitSize": "5"
    }
  }
}
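As a worked check of the copybook comment "RECORD LENGTH IS 27", the byte sizes of the fields can be summed. This sketch assumes the standard COBOL storage rules for these clauses. PIC X(n) takes n bytes. A COMP-3 (packed decimal) field with d digits takes d / 2 + 1 bytes using integer division, because each byte holds two digits and the sign takes one nibble. The implied decimal point V takes no storage, and the 03 group item DTAR020-KCODE-STORE-KEY adds no bytes beyond its children. The class and method names are hypothetical.

```java
// Worked check of "RECORD LENGTH IS 27" for the DTAR020 copybook.
class Dtar020RecordLength {

    // PIC X(n): one byte per character.
    static int alphanumericBytes(int chars) {
        return chars;
    }

    // COMP-3 with d digits: two digits per byte plus one sign nibble, so d / 2 + 1 bytes.
    static int comp3Bytes(int digits) {
        return digits / 2 + 1;
    }

    static int dtar020Length() {
        return alphanumericBytes(8) // DTAR020-KEYCODE-NO  PIC X(08)
            + comp3Bytes(3)         // DTAR020-STORE-NO    PIC S9(03) COMP-3
            + comp3Bytes(7)         // DTAR020-DATE        PIC S9(07) COMP-3
            + comp3Bytes(3)         // DTAR020-DEPT-NO     PIC S9(03) COMP-3
            + comp3Bytes(9)         // DTAR020-QTY-SOLD    PIC S9(9) COMP-3
            + comp3Bytes(9 + 2);    // DTAR020-SALE-PRICE  PIC S9(9)V99 COMP-3 (V takes no space)
    }

    public static void main(String[] args) {
        System.out.println(dtar020Length()); // 27
    }
}
```

The sum is 8 + 2 + 4 + 2 + 5 + 6 = 27 bytes. Each fixed-length record in DTAR020_FB.bin therefore occupies 27 bytes, even when DTAR020-DATE is dropped from the output schema.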

...

...

 

...