...
This how-to does not cover how to export data from mainframes. There are various ways to bring datasets over from DB2 or IMS. This article assumes that you have used a DBMS export or another utility (such as FTP) to bring the datasets into flat files.
...
Datasets (or dataset partitions) can be exported from DB2 or IMS in various ways. It’s important to understand:
Whether the file is text or binary,
What RECFM was specified when the data was transferred via FTP or copied from the mainframe,
What the code page is,
Whether the file exported from the mainframe is Big-endian (IBM mainframes) or Little-endian.
Info |
---|
Note that this article only covers parsing mainframe files whose records are described by a single copybook. |
Steps
...
In the General section, there are a few important configurations that need to be set up correctly. If the configuration does not match the attributes of the file being processed, processing will fail. Such failures are generally hard to debug because of the nature of the input file.
Record Format (RECFM) - The record format specifies the type of the records in the file. Records from a mainframe can be Fixed Length, Variable Length, or Variable Block. Select the right configuration based on what you know about the file or group of files you are processing.
(RECFM=F) Fixed Length record files have records that are all the same size (in bytes). There are no EOL or CTRL-M characters indicating the end of a line; the file is just a stream of bytes.
(RECFM=V) Variable Length record files have records of varying sizes. Typically, different sizes indicate that there are different copybooks. Each copybook could be associated with one record size.
(RECFM=VB) Variable Block record files also contain variable-length records, but the records are grouped into blocks. Such files are easy to process in parallel.
Info |
---|
If you do not know the record format of the file, start with RECFM=V. If there is a mismatch, processing will fail. It is very difficult to detect whether the records within a file are variable length or fixed length, because everything in the file is just a stream of bytes. |
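To make the difference between the record formats concrete, here is a minimal Python sketch of how fixed-length and variable-length records could be read from a byte stream. It is not the plugin's implementation. The function names `read_fixed` and `read_variable` are hypothetical, and the variable-length reader assumes that each record still carries its 4-byte Record Descriptor Word (RDW), which depends on how the file was transferred from the mainframe.

```python
import io
import struct
from typing import BinaryIO, Iterator


def read_fixed(stream: BinaryIO, record_length: int) -> Iterator[bytes]:
    """Yield RECFM=F records: every record is exactly record_length bytes."""
    while True:
        record = stream.read(record_length)
        if not record:
            return  # clean end of file
        if len(record) != record_length:
            raise ValueError(f"trailing partial record of {len(record)} bytes")
        yield record


def read_variable(stream: BinaryIO) -> Iterator[bytes]:
    """Yield RECFM=V record payloads, assuming each record keeps its RDW."""
    while True:
        rdw = stream.read(4)
        if not rdw:
            return  # clean end of file
        if len(rdw) != 4:
            raise ValueError("truncated RDW")
        # RDW: 2-byte big-endian length (which includes the 4 RDW bytes),
        # followed by 2 reserved bytes.
        length, _reserved = struct.unpack(">HH", rdw)
        if length < 4:
            raise ValueError(f"invalid RDW length {length}")
        payload = stream.read(length - 4)
        if len(payload) != length - 4:
            raise ValueError("truncated record payload")
        yield payload


# Three fixed-length records of 4 bytes each.
fixed_records = list(read_fixed(io.BytesIO(b"AAAABBBBCCCC"), record_length=4))

# Two variable-length records: a 2-byte payload and a 3-byte payload.
variable_data = b"\x00\x06\x00\x00AB" + b"\x00\x07\x00\x00CDE"
variable_records = list(read_variable(io.BytesIO(variable_data)))
```

Reading a variable-length file with `read_fixed` (or the reverse) either raises an error or silently yields misaligned records, which is why a RECFM mismatch is hard to diagnose.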
Code Page - The code page defines the character encoding that associates a unique number with each printable and control character. Mainframes define different code pages for different regions, so choose the code page based on the origin or character set of the mainframe. For example, mainframes in the US typically use code page cp037.
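As a small illustration of why the code page matters, the following Python sketch decodes the same bytes with the `cp037` codec from Python's standard library and with a non-EBCDIC codec. The helper name `decode_text` is hypothetical and exists only for this example.

```python
def decode_text(raw: bytes, code_page: str = "cp037") -> str:
    """Decode mainframe text bytes using the given code page."""
    return raw.decode(code_page)


# "HELLO" as it is stored in EBCDIC (code page 037).
raw = bytes([0xC8, 0xC5, 0xD3, 0xD3, 0xD6])

correct = decode_text(raw, "cp037")    # decoded with the matching code page
garbled = decode_text(raw, "latin-1")  # wrong code page: no error, wrong text
```

Decoding with the wrong code page usually does not raise an error. It produces unreadable text, so a wrong Code Page setting tends to show up only when you inspect the output.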
Dialect - Specifies the endianness of the mainframe. The default is Mainframe (Big-Endian), which applies to IBM mainframes; there are also Intel and Fujitsu mainframes that have different endianness.
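The effect of endianness on binary fields can be sketched in a few lines of Python. The helper `decode_binary` is a hypothetical name used only here; it interprets the same four bytes once as big-endian and once as little-endian.

```python
def decode_binary(raw: bytes, byteorder: str) -> int:
    """Interpret raw bytes as a signed integer with the given byte order."""
    return int.from_bytes(raw, byteorder=byteorder, signed=True)


raw = bytes([0x00, 0x00, 0x01, 0x2C])

as_big_endian = decode_binary(raw, "big")        # 300
as_little_endian = decode_binary(raw, "little")  # 738263040
```

Both calls succeed, so a wrong Dialect setting yields plausible-looking but incorrect numbers rather than an immediate failure.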
Copybook - Specifies the COBOL copybook that describes the structure of the data files. A copybook contains only the fields and data types used in the COBOL file. The plugin can directly import COBOL copybooks (.cpy files) as definitions for generating the target schema. The schema definition is based on analyzing the entire copybook, including REDEFINES and OCCURS clauses. The schema can be simple or complex. Various types of copybooks are currently supported.
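To show how a copybook maps onto the bytes of a record, here is a hand-written Python sketch for a small, hypothetical copybook (shown in the comments). It is only an illustration: the plugin derives the layout from the copybook automatically, and the names `unpack_comp3` and `parse_customer` are assumptions made for this example.

```python
from decimal import Decimal

# Hypothetical copybook used for this sketch:
#   01 CUSTOMER-RECORD.
#      05 CUST-ID     PIC 9(4) COMP.         (2 bytes, big-endian binary)
#      05 CUST-NAME   PIC X(10).             (10 bytes, EBCDIC text)
#      05 BALANCE     PIC S9(5)V99 COMP-3.   (4 bytes, packed decimal)


def unpack_comp3(raw: bytes, scale: int) -> Decimal:
    """Decode a packed-decimal (COMP-3) field with `scale` implied decimals."""
    digits = []
    for byte in raw[:-1]:
        digits.append(byte >> 4)
        digits.append(byte & 0x0F)
    digits.append(raw[-1] >> 4)    # last byte: one digit ...
    sign_nibble = raw[-1] & 0x0F   # ... followed by the sign nibble
    if any(d > 9 for d in digits):
        raise ValueError("invalid packed-decimal digit")
    value = int("".join(str(d) for d in digits))
    if sign_nibble in (0x0B, 0x0D):  # B and D mark negative values
        value = -value
    return Decimal(value).scaleb(-scale)


def parse_customer(record: bytes) -> dict:
    """Slice a 16-byte CUSTOMER-RECORD into its three fields."""
    if len(record) != 16:
        raise ValueError(f"expected 16 bytes, got {len(record)}")
    return {
        "CUST_ID": int.from_bytes(record[0:2], "big"),
        "CUST_NAME": record[2:12].decode("cp037").rstrip(),
        "BALANCE": unpack_comp3(record[12:16], scale=2),
    }


record = (
    (42).to_bytes(2, "big")
    + "ALICE".ljust(10).encode("cp037")
    + bytes([0x12, 0x34, 0x56, 0x7C])  # 12345.67, positive sign nibble C
)
customer = parse_customer(record)
```

The field offsets follow directly from the PIC clauses, which is why the copybook must match the file exactly: an offset that is off by one byte shifts every following field.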
How to split copybook -
...
Depending on how you want to interpret the records within the file, you can either keep each record as a single record (Do not split) or choose to split the record at REDEFINE.
If you have selected ‘Do not split’ and the file has various record types, you should configure Record Selector. If the entire file is of a single record type, Record Selector is not required.
If you have selected ‘Split on REDEFINE’, you should configure Record Association. This allows the different records that are split at REDEFINE to be associated with record types.
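The idea behind associating redefined layouts with record types can be sketched conceptually in Python. This is not how the plugin is configured or implemented. The copybook in the comments, the type codes `N` and `A`, and the function `interpret_record` are all hypothetical, and the sketch only shows how the same bytes are read differently depending on a record-type field.

```python
# Hypothetical copybook used for this sketch:
#   01 SAMPLE-RECORD.
#      05 RECORD-TYPE   PIC X(1).
#      05 NAME-DATA     PIC X(8).
#      05 AMOUNT-DATA   REDEFINES NAME-DATA PIC S9(18) COMP.


def interpret_record(record: bytes) -> dict:
    """Pick the layout for the redefined area based on the record type."""
    if len(record) != 9:
        raise ValueError(f"expected 9 bytes, got {len(record)}")
    record_type = record[:1].decode("cp037")
    body = record[1:]  # the 8 bytes shared by NAME-DATA and AMOUNT-DATA
    if record_type == "N":
        return {"RECORD_TYPE": "N", "NAME": body.decode("cp037").rstrip()}
    if record_type == "A":
        amount = int.from_bytes(body, "big", signed=True)
        return {"RECORD_TYPE": "A", "AMOUNT": amount}
    raise ValueError(f"unknown record type {record_type!r}")


name_record = "N".encode("cp037") + "ACME".ljust(8).encode("cp037")
amount_record = "A".encode("cp037") + (1500).to_bytes(8, "big", signed=True)

parsed_name = interpret_record(name_record)
parsed_amount = interpret_record(amount_record)
```

Without a mapping from record type to layout, there is no way to tell from the bytes alone whether the redefined area holds text or a binary number.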
Output Schema - Specifies whether the output schema represented in the target is a flat structure or a nested structure as defined in the copybook. See the Field name and Output Schema types section for how the fields from the copybook are translated into the target schema.
Info |
---|
If you have specified the Output Schema type as ‘Flat’, then the copybook split has no effect on the record being read. The split option is important when the Output Schema type is ‘Hierarchical’. |
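The difference between a hierarchical and a flat output can be pictured with a short Python sketch. The nested record, the `flatten` helper, and the underscore-joined field names are assumptions made for illustration only; the naming that the plugin actually applies is described in the Field name and Output Schema types section.

```python
def flatten(record: dict, prefix: str = "") -> dict:
    """Collapse a nested record into a single level of fields."""
    flat = {}
    for name, value in record.items():
        key = f"{prefix}_{name}" if prefix else name
        if isinstance(value, dict):
            flat.update(flatten(value, key))  # descend into the group item
        else:
            flat[key] = value
    return flat


# Hierarchical form: group items from the copybook become nested records.
hierarchical = {
    "CUST_ID": 42,
    "ADDRESS": {"CITY": "AUSTIN", "ZIP": "73301"},
}

# Flat form: every elementary field becomes a top-level column.
flat = flatten(hierarchical)
```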
Record Associations
Record Selector
When to split Copybook?
There are various scenarios
Field name and Output Schema types
...