Info |
---|
This article is posted on the CDAP wiki and will be maintained here: https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1165393956/Parsing+CSV+Files. |
Overview
This document describes best practices for parsing CSV files.
General Tips
As of release 6.2.x, parsing CSV with header extraction from the file does not work on large files and on multiple smaller files.
Recommendation: parse as CSV in Wrangler and skip the header. This entails the following steps:
Add a filter condition to skip the header
filter-row-if-true offset == 0
Drop the offset column
drop offset
Parse as CSV, skipping the header
parse-as-csv :COLUMN 'SEPARATOR' false
Rename the parsed columns in Wrangler (using set-headers)
set-headers :COL1,:COL2,…,:COLN
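To illustrate why filtering on offset handles several files at once, the steps above can be mimicked outside CDAP. The following is a minimal Python sketch, not Wrangler code. It assumes each input line arrives as a record with the line's byte offset within its file and the raw line text, so every file's header sits at offset 0. The helper names read_records and parse_files and the sample data are hypothetical.

```python
import csv


def read_records(text):
    """Yield (offset, body) pairs, one per line, where offset is the
    byte position of the line within its own file (assumed layout)."""
    offset = 0
    for line in text.splitlines(keepends=True):
        yield offset, line.rstrip("\r\n")
        offset += len(line.encode("utf-8"))


def parse_files(files, separator, headers):
    """Apply the recipe's logic to a list of file contents."""
    rows = []
    for text in files:
        for offset, body in read_records(text):
            # Mirrors: filter-row-if-true offset == 0
            if offset == 0:
                continue
            # Mirrors: parse-as-csv :COLUMN 'SEPARATOR' false
            fields = next(csv.reader([body], delimiter=separator))
            # Mirrors: set-headers :COL1,:COL2,...
            rows.append(dict(zip(headers, fields)))
    return rows


# Hypothetical sample: two small files, each with its own header line.
files = ["id,name\n1,Ada\n2,Grace\n", "id,name\n3,Linus\n"]
rows = parse_files(files, ",", ["id", "name"])
print(rows)
```

Because the header of each file is the record at offset 0, a single filter removes every header regardless of how many files are read, and the column names are then supplied explicitly rather than extracted from the data.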