...

Info

Note that this article only covers how to handle single-copybook record parsing for mainframe files.

Steps

Use the following steps to set up a pipeline that will process EBCDIC mainframe files.

...

  • Record Format (RECFM) - The record format specifies the type of the records in the file. Records from a mainframe can be Fixed Length, Variable Length, or Variable Block. Select the right configuration based on your knowledge of the file or group of files you are processing.

    • (RECFM=F) Fixed Length record files have records that are all the same size (in bytes). There are no EOL or CTRL-M characters indicating the end of a record; the file is just a stream of bytes.

    • (RECFM=V) Variable Length record files have records of varying sizes. Typically, different sizes indicate that there are different copybooks; each copybook could be associated with one record size.

    • (RECFM=VB) Variable Block record files also contain variable-length records, but the records are grouped in blocks. Such files are easy to process in parallel.

  • Code Page - The code page defines the character encoding that associates a unique number with each printable and control character. Mainframes defined different code pages for different regions, so the code page should be chosen based on the origin or character set of the mainframe. For example, most mainframes in the US use code page cp037.

  • Dialect - Specifies the endianness of the mainframe. IBM mainframes default to Mainframe (Big-Endian), while Intel and Fujitsu mainframes may have a different endianness.

  • Copybook - Specifies the COBOL copybook that contains the structure of the data files. A copybook contains only the fields and datatypes used in the COBOL file. The plugin can directly import COBOL copybooks (.cpy files) as definitions for generating the target schema. The schema definition is based on analyzing the entire copybook, including REDEFINES and OCCURS clauses. The schema can be simple or complex, and various types of copybooks are currently supported.

  • How to split copybook - You have an option

  • Output Schema - Specifies whether the output schema represented in the target is a flat structure or the nested structure defined in the copybook. See the Copybook, Field names and Output Schema types section for how the fields from the copybook are translated into the target schema.
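
As an illustration of the fixed-length format and code page settings above, the following Python sketch decodes a RECFM=F file. The record length and the cp037 code page are assumptions for the example, not values the plugin requires, and the helper name is hypothetical:

```python
import io

def read_fixed_records(stream, record_length, codepage="cp037"):
    """Yield each fixed-length (RECFM=F) record decoded from EBCDIC.

    There are no line delimiters; the file is consumed as a raw byte
    stream in chunks of exactly record_length bytes."""
    while True:
        chunk = stream.read(record_length)
        if not chunk:
            break
        if len(chunk) < record_length:
            raise ValueError("truncated final record")
        yield chunk.decode(codepage)

# Example: two 5-byte records containing "HELLO" and "WORLD" in cp037.
data = "HELLO".encode("cp037") + "WORLD".encode("cp037")
records = list(read_fixed_records(io.BytesIO(data), record_length=5))
```

Python ships the cp037 codec in its standard codec registry, so no extra dependency is needed for US EBCDIC data.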

Record Associations

Record Selector

When to split Copybook

...

Copybook, Field names and Output Schema types

Field Names

As modern target storage systems do not allow schema field names to include hyphens (-), the 'Mainframe Record Reader' translates the copybook field names into a PascalCase representation. For example, if the name of the field in the copybook is CL-MED-PRE-AUTHO-DT-CC, it is translated to ClMedPreAuthoDtCc in both the target system and the pipeline schema.
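
The translation described above can be sketched as follows (the helper name to_target_name is hypothetical; the plugin's actual implementation may differ):

```python
def to_target_name(copybook_name: str) -> str:
    """Translate a hyphenated COBOL field name into the PascalCase
    form used in the target schema, e.g.
    CL-MED-PRE-AUTHO-DT-CC -> ClMedPreAuthoDtCc."""
    return "".join(part.capitalize() for part in copybook_name.split("-"))
```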

Flat Structure

In a flat structure, OCCURS are expanded and each field name is suffixed with _<n> (or _<n>_<m>) at the end. For example, if the copybook has a field WS-FIN-AMT OCCURS 5 PIC S9(9)V99 COMP-3, then in the output schema you will find WS-FIN-AMT_1, WS-FIN-AMT_2, WS-FIN-AMT_3, WS-FIN-AMT_4, and WS-FIN-AMT_5. With names translated to be compatible with the target system, the field names are as follows: WsFinAmt_1, WsFinAmt_2, WsFinAmt_3, WsFinAmt_4, and WsFinAmt_5.

Nested structures (such as two-dimensional arrays) are represented with two indexes separated by an underscore (_).
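
A hypothetical helper illustrating how OCCURS expansion produces the suffixed, translated field names; the function name and the exact expansion order are assumptions for illustration:

```python
def expand_occurs(name: str, *counts: int) -> list:
    """Generate flat-schema field names for an OCCURS clause.

    expand_occurs("WS-FIN-AMT", 5) yields WsFinAmt_1 .. WsFinAmt_5;
    passing two counts models a nested (two-dimensional) OCCURS,
    producing names with two indexes separated by underscores."""
    base = "".join(part.capitalize() for part in name.split("-"))
    names = [base]
    for count in counts:
        names = [f"{n}_{i}" for n in names for i in range(1, count + 1)]
    return names
```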

Nested Schema

A nested schema maps the nested structure of the copybook into the target system. The target system should support the following types in order to use a nested schema: RECORD or STRUCT, and ARRAY or UNION. If you are not sure, we recommend using the flat structure.

Partitioning for parallel processing

Doing another task

...

This is an experimental feature. It allows a single variable-length file to be split into multiple smaller files so that they can be processed in parallel. While it is advantageous to split a variable-length file, doing so adds initial time to ensure that the file is split at correct record boundaries.
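
On IBM mainframes, RECFM=V records are typically prefixed by a 4-byte Record Descriptor Word (RDW) whose first two bytes give the record length, including the RDW itself, in big-endian order. The following is a minimal sketch of finding record boundaries under that assumption (the function name is hypothetical, and this is not the plugin's actual splitting code):

```python
import struct

def split_variable_records(data: bytes):
    """Split a RECFM=V byte stream at record boundaries using the
    4-byte RDW: bytes 0-1 hold the record length *including* the
    RDW, big-endian; bytes 2-3 are reserved."""
    offset, records = 0, []
    while offset < len(data):
        (length,) = struct.unpack(">H", data[offset:offset + 2])
        if length < 4 or offset + length > len(data):
            raise ValueError(f"bad RDW at offset {offset}")
        records.append(data[offset + 4:offset + length])
        offset += length
    return records

# Two records: a 3-byte payload b"ABC" and a 2-byte payload b"DE".
blob = struct.pack(">HH", 7, 0) + b"ABC" + struct.pack(">HH", 6, 0) + b"DE"
parts = split_variable_records(blob)
```

Because record lengths are only discoverable by walking the RDW chain sequentially, a splitter must scan the file once before parallel processing can begin, which is the source of the startup cost noted below.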

Info

The initial startup time of the pipeline execution will increase by an additional few seconds to tens of minutes, depending on the number of files and the number of records in each file.
