Parsing Prettified JSON

Info
This article is posted on the CDAP doc wiki and will be maintained here: https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1162707281/Parsing+Prettified+JSON

This article describes how to build a pipeline to parse a prettified JSON file.

Background

By default, sources are configured with the Text format, which splits the input file at newline boundaries (“\n” or “\r\n”). When a prettified JSON file is read with the default Text format, Wrangler does not receive the entire file as a single record, so it cannot parse it correctly and fails with a parsing error such as:

Code Block
java.io.EOFException: End of input at line 1 column 2
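
The failure mode can be reproduced outside CDAP. The following is a minimal Python sketch, not CDAP or Wrangler code. It assumes only that a Text-format source emits one record per line; the `pretty` sample and all variable names are illustrative.

```python
import json

# A small prettified JSON document (illustrative sample, not the pipeline's file).
pretty = '{\n    "quiz": {\n        "answer": "12"\n    }\n}'

# Text-style reading: one record per line, each parsed on its own.
line_errors = []
for line in pretty.splitlines():
    try:
        json.loads(line)
    except json.JSONDecodeError as exc:
        line_errors.append(str(exc))

# Blob-style reading: the whole document is handed to the parser as one record.
whole = json.loads(pretty)

print(len(line_errors))         # how many individual lines failed to parse
print(whole["quiz"]["answer"])  # value reachable only when the file is parsed whole
```

No single line of a prettified document is a complete JSON value, which is why a line-oriented format cannot feed the parser usable records.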

Pipeline Setup

Here are the steps to process the sample file provided below:

Code Block

{
    "quiz": {
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        },
        "maths": {
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12"
            },
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            }
        }
    }
}

Step 1. Source Configuration

Configure the source format to be blob. The blob format reads the entire content of the file as a single record, which allows the prettified JSON file to be parsed.

Configure the output schema to be bytes.
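
As a rough analogy for what a blob-format source with a bytes output schema produces, the Python sketch below reads a whole file into one record holding a single bytes field. It is not CDAP code; the field name `body`, the temporary file, and its content are assumptions made for illustration.

```python
import os
import tempfile

# Write a small prettified JSON file to a temporary location (illustrative data).
content = '{\n    "quiz": {\n        "answer": "12"\n    }\n}\n'
with tempfile.NamedTemporaryFile(
    "w", suffix=".json", delete=False, encoding="utf-8", newline=""
) as handle:
    handle.write(content)
    path = handle.name

# Blob-style read: the entire file becomes one record with a single bytes field.
# The field name "body" is an assumption for this sketch.
with open(path, "rb") as source:
    record = {"body": source.read()}

os.remove(path)

print(type(record["body"]).__name__, len(record["body"]))
```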

Step 2. Convert the records read to String

Use a Projection transform to convert the input bytes to a string.
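
Conceptually, this conversion is a decode of the bytes field. The Python sketch below shows the idea only; it is not the Projection transform itself, and both the field name `body` and the UTF-8 encoding are assumptions.

```python
# A record as a blob-format source might emit it: one bytes field (name assumed).
record = {"body": b'{\n    "answer": "12"\n}'}

# Bytes-to-string conversion; UTF-8 is an assumed encoding for this sketch.
converted = {"body": record["body"].decode("utf-8")}

print(type(converted["body"]).__name__)
```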

Step 3. Parse using Wrangler
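
Once the whole document is available as one string, parsing and flattening it is straightforward. The Python sketch below illustrates that outcome on an abbreviated copy of the sample file. It is not a Wrangler recipe, and the output column names (`category`, `id`, `question`, `answer`) are hypothetical.

```python
import json

# String body as produced by the bytes-to-string step (abbreviated sample).
body = """
{
    "quiz": {
        "maths": {
            "q1": {"question": "5 + 7 = ?", "options": ["10", "11", "12", "13"], "answer": "12"},
            "q2": {"question": "12 - 8 = ?", "options": ["1", "2", "3", "4"], "answer": "4"}
        }
    }
}
"""

parsed = json.loads(body)

# Flatten the nested object into one row per question.
rows = [
    {"category": category, "id": qid, "question": q["question"], "answer": q["answer"]}
    for category, questions in parsed["quiz"].items()
    for qid, q in questions.items()
]

for row in rows:
    print(row)
```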
