This article is posted on the CDAP Doc wiki and will be maintained here: https://cdap.atlassian.net/wiki/spaces/DOCS/pages/1165393956/Parsing+CSV+Files
Overview
This document collects best practices for parsing and cleansing CSV files with Wrangler.
General Tips
Parse CSV
Avoid using automatic header detection with parse-as-csv
You should avoid using the automatic header determination directive (parse-as-csv :col '\t' false). Large files are distributed across multiple partitions, and the header line, which is the first line of the CSV file, is not present in every partition, so the header is not available to be set in the other partitions. This either causes a failure or results in lost records. If you have to use the parse-as-csv directive, make sure the files are smaller than 128 MB (a single data block), so that each file is not split across partitions.
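The failure mode described above can be illustrated with a small simulation. This is a hypothetical Python sketch, not Wrangler's implementation: it assumes the file is split into line-aligned partitions (the 3-line partition size and the sample data are arbitrary), and that each partition is parsed independently with its own first line treated as the header. Under those assumptions, every partition after the first consumes one data record as a bogus header.

```python
import csv
import io

# Hypothetical sample data: one header line followed by six data records.
CSV_TEXT = "id,name\n1,a\n2,b\n3,c\n4,d\n5,e\n6,f\n"
HEADER = ["id", "name"]


def split_into_partitions(text, lines_per_partition):
    """Split the file into line-aligned chunks, mimicking how a large file
    is distributed across partitions (assumption: splits fall on line boundaries)."""
    lines = text.splitlines(keepends=True)
    return ["".join(lines[i:i + lines_per_partition])
            for i in range(0, len(lines), lines_per_partition)]


def parse_with_header_detection(partition):
    """Parse one partition on its own, treating its first line as the header."""
    return list(csv.DictReader(io.StringIO(partition)))


def parse_without_header_detection(partition):
    """Parse one partition as plain rows; the header is handled separately."""
    return list(csv.reader(io.StringIO(partition)))


partitions = split_into_partitions(CSV_TEXT, lines_per_partition=3)

# Header detection per partition: "3,c" and "6,f" are consumed as headers,
# and the records in the second partition are keyed by "3" and "c".
with_header = [row for p in partitions for row in parse_with_header_detection(p)]

# No header detection: every line survives; the known header row is dropped once.
all_rows = [row for p in partitions for row in parse_without_header_detection(p)]
records = [dict(zip(HEADER, row)) for row in all_rows if row != HEADER]

print(len(with_header), len(records))
```

With header detection only 4 of the 6 records come through, and two of them carry the wrong column names; parsing without header detection and handling the header once keeps all 6. Keeping files under a single block, as recommended above, avoids the split altogether.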