I am importing data from large files (fwf and csv) that were once stored on tapes, so they may contain errors introduced when the tapes were written or read.
The old files are in fixed-width file (fwf) format. The newer files are in .csv format (with ";" as the separator).
The errors can include:
for fwf files:
- corrupted characters that shift the whole sequence of data sideways, making every cell from that point on mismatch its column
- missing end of line characters
for csv files:
- corrupted characters
- corrupted characters or letters in otherwise numeric columns
- unintended separators, leading to more separators than expected (number of cols - 1)
Is there a way to import these into R, skipping the error lines but keeping a log of the errors so that they can be checked manually afterwards?
Or should I use another tool, external to R? If so, which tool?
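For the csv case, one possible approach I can imagine (whether it fits may depend on the files) is a preprocessing pass with an external tool like awk: lines whose field count differs from the expected number of columns go to an error log, and the rest to a clean file that can then be read normally. The file names, sample data, and column count below are placeholders, just to sketch the idea:

```shell
# Tiny sample: three good rows and one row with an unintended extra ';'
printf 'a;b;c\n1;2;3\nx;y;z;extra\n4;5;6\n' > input.csv

# EXPECTED is the number of columns for this file (3 is a placeholder).
EXPECTED=3
awk -F';' -v n="$EXPECTED" '
  NF == n { print > "clean.csv"; next }                     # keep good rows
  { printf "line %d: %d fields\n", NR, NF > "errors.log" }  # log bad rows
' input.csv
```

The resulting clean.csv could then be read with something fast like data.table::fread, while errors.log lists the line numbers to check manually.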
I have about 100 very large files (~90 GB each), so I would prefer a data.table::fread-based solution, or some other fast approach.