Parsing CSV files

This article is posted on the CDAP wiki and will be maintained there: Parsing CSV Files (archived).

Overview

This document describes the best practices for parsing CSV files.

General Tips

  • As of release 6.2.x, parsing CSV with header extraction from the file does not work on large files or on multiple smaller files.

  • Recommendation: parse the file as CSV in Wrangler and skip the header row explicitly. This involves the following steps:

    • Add a filter condition to skip the header row

      • filter-row-if-true offset == 0

    • Drop offset

      • drop offset

    • Parse as CSV without treating the first row as a header

      • parse-as-csv :COLUMN 'SEPARATOR' false

    • Rename the parsed columns in Wrangler (using set-headers)

      • set-headers :COL1,:COL2,…,:COLN
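
The effect of the four steps above can be illustrated outside Wrangler with a short Python sketch. This is not Wrangler code and does not reflect how CDAP implements these directives. It only mimics the logic under stated assumptions: each input record is an (offset, body) pair, the header line is the record at offset 0, and the function name parse_records, the sample data, and the column names are hypothetical.

```python
import csv
from typing import Dict, Iterable, List, Tuple


def parse_records(
    records: Iterable[Tuple[int, str]],
    separator: str,
    headers: List[str],
) -> List[Dict[str, str]]:
    """Mimic the recipe: skip offset 0, drop offset, split the line, name the columns."""
    rows = []
    for offset, body in records:
        # Step 1: skip the header line (like: filter-row-if-true offset == 0).
        if offset == 0:
            continue
        # Step 2: the offset is not carried forward (like: drop offset).
        # Step 3: split the line on the separator, honoring quoted fields.
        values = next(csv.reader([body], delimiter=separator))
        # Step 4: assign explicit column names (like: set-headers).
        rows.append(dict(zip(headers, values)))
    return rows


# Hypothetical sample input: byte offsets paired with raw lines.
records = [
    (0, "id,name,city"),
    (13, "1,Ada,London"),
    (26, '2,"Lovelace, A.",Paris'),
]
rows = parse_records(records, ",", ["id", "name", "city"])
```

Because the header names are supplied explicitly rather than read from the data, the result does not depend on which record happens to be processed first, which is the point of the recommended recipe.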



 
