Overview
This document collects best practices for parsing and cleansing CSV files with Wrangler.
General Tips
Parse CSV
When using the `parse-as-csv` directive, avoid automatic header determination by passing false as the header flag (`parse-as-csv :col '\t' false`). Large files are distributed across multiple partitions, and the header row is not available in every partition, so it cannot be set consistently.
Sometimes a file looks fine but contains non-printable ASCII characters that usually don't belong in CSV files. These can be hard to track down. Use the `find-and-replace` directive to strip them: `find-and-replace :col 's/[\000-\007\013-\037\177-\377]//g'`.
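To see which characters the octal ranges in that expression cover, they can be tried outside Wrangler. The following is a minimal Python sketch, not part of Wrangler itself; the function name `strip_non_printable` is hypothetical, and it assumes the ranges are meant as a single character class applied to raw bytes.

```python
import re

# Assumption: the octal ranges from the find-and-replace expression form one
# character class. It covers NUL-BEL (\000-\007), VT-US (\013-\037) and
# DEL through 0xFF (\177-\377). Backspace, tab and line feed are not covered.
_NON_PRINTABLE = re.compile(rb"[\000-\007\013-\037\177-\377]")


def strip_non_printable(raw: bytes) -> bytes:
    """Remove every byte that falls inside the ranges above (hypothetical helper)."""
    return _NON_PRINTABLE.sub(b"", raw)


if __name__ == "__main__":
    sample = b"id\tname\x00\r\n1\tAl\x7fice\r\n"
    print(strip_non_printable(sample))
```

Note that carriage return (`\015`) lies inside the second range and is removed, while tab and line feed are kept. The last range also removes every byte above `\177`, so UTF-8 encoded non-ASCII text would be stripped as well; narrow that range if such characters are legitimate in the data.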