...

  1. Make sure you have an instance of Data Fusion.

  2. Download the sample Avro file attached below, and upload it to a directory on GCS.

Attached file: avro_adventureworks_person.avro

Parsing the Avro file using Wrangler

  1. Open Wrangler.

  2. Using the GCS connection, navigate to the directory where you have stored the sample Avro file. Select the file.
    The file should be shown in Wrangler with a single column body of type byte[].

  3. Now apply the directive Parse → Avro on the body column.
    The data should be split into multiple columns.

  4. Click the More link towards the top right, and select View Schema.

  5. In the Schema modal that appears, click the download button on the title bar to download the schema of the Avro file.

    [Image: Schema modal]

  6. Store this file at a known location on your computer.
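Before importing the downloaded schema into the pipeline, it can help to check that it lists the columns you saw in Wrangler. The following is a minimal sketch, assuming the downloaded file is a plain Avro record schema in JSON. The schema text, the field names, and the helper `list_fields` are hypothetical and only illustrate the idea; if your export wraps the schema differently, adapt the lookup accordingly.

```python
import json

# Hypothetical schema text standing in for the downloaded file.
# The real field names come from your own Avro file.
SAMPLE_SCHEMA = """
{
  "type": "record",
  "name": "person",
  "fields": [
    {"name": "id", "type": "int"},
    {"name": "first_name", "type": ["null", "string"]}
  ]
}
"""


def list_fields(schema_text):
    """Return (name, type) pairs for each field of an Avro record schema."""
    schema = json.loads(schema_text)
    # Assumption: the top-level JSON value is an object describing a record.
    if not isinstance(schema, dict) or schema.get("type") != "record":
        raise ValueError("expected an Avro record schema")
    return [(field["name"], field["type"]) for field in schema["fields"]]


for name, avro_type in list_fields(SAMPLE_SCHEMA):
    print(name, avro_type)
```

To check your own file, read its contents with `open(path).read()` and pass the text to `list_fields` in place of `SAMPLE_SCHEMA`.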

...

Applying transformations

  1. You can continue to apply more transformations to this data as needed. For reference, you can use the transformations in the attached file:

Attached file: avro_transformations.txt

...

  1. Once you’ve applied your transformations, click Create Pipeline. This opens the Pipeline Studio, where you can see the GCS source and Wrangler nodes and build the rest of your pipeline.

  2. Now we need to perform some manual steps:

    1. Firstly, open the GCS source. Click the Actions button in the Output Schema section, and choose Import.

    2. Select the schema file that you downloaded earlier from Wrangler. This is the schema of your Avro file.

    3. Change the Format property of the GCS source from text to avro.

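    4. Now, open the Wrangler node, and remove the first directive (parse-as-avro-file) in the Recipe property. With the source format set to avro, the GCS source already emits parsed records, so this directive is no longer needed.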

  3. Now you can build the rest of the pipeline. You can add additional transformations, aggregations, sinks, and error collectors as required. Attached is a sample pipeline for your reference.
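The recipe edit in the manual steps above can also be pictured as a small text operation. The following is a minimal sketch, assuming the recipe is available as plain text with one directive per line. The helper `drop_avro_parse_directive` and the sample recipe, including its column names, are hypothetical; in Data Fusion itself the edit is made in the Recipe property of the Wrangler node.

```python
def drop_avro_parse_directive(recipe_text):
    """Remove parse-as-avro-file if it is the first directive in the recipe."""
    lines = recipe_text.splitlines()
    for index, line in enumerate(lines):
        if not line.strip():
            continue  # skip blank lines before the first directive
        # Only the first non-blank directive is examined.
        if line.split()[0] == "parse-as-avro-file":
            del lines[index]
        break
    return "\n".join(lines)


# Hypothetical recipe as it might look after parsing in Wrangler.
recipe = "parse-as-avro-file :body\nuppercase :first_name\nrename :id :person_id"
print(drop_avro_parse_directive(recipe))
```

The sketch leaves the recipe unchanged when the first directive is anything else, so later directives are never dropped by accident.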

...