
I am importing data from large files (fwf and csv) that were once stored on tapes, so they may contain errors introduced while writing to or reading from the tape.

The old files are in fixed-width file (fwf) format. The newer files are in .csv format (with ";" as the separator).

The errors could be something like:

for fwf files:

  • corrupted characters that shift the whole sequence of data sideways, so that every cell from that point on mismatches its intended content
  • missing end-of-line characters
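To illustrate the kind of line-level check I have in mind for the fwf files, here is a minimal pre-filter sketch (the record width of 8 and the file names are made up for the example): lines whose length differs from the expected record width go to a log, the rest to a clean file.

```shell
# Create a tiny sample fwf file; the middle record is too short.
printf 'AAAA1234\nBB1234\nCCCC5678\n' > sample.fwf

# Valid records are exactly 8 characters wide (placeholder width).
# Bad lines are logged with their line number; good lines pass through.
awk 'length($0) != 8 { print NR": "$0 > "fwf_errors.log"; next }
     { print }' sample.fwf > sample_clean.fwf
```

This only catches length errors; a corrupted character that leaves the record width intact would still slip through and shift the columns.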

for csv files:

  • corrupted characters
  • corrupted characters or letters in otherwise numeric columns
  • unintended separators, leading to more separators per line than the expected number of cols - 1
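For the csv files, the separator-count check could be sketched the same way (again with made-up file names, and assuming 3 columns, so a valid line has exactly 2 ";" separators): awk's NF gives the field count per line, which catches both missing and extra separators. Note this assumes fields are never quoted, so a legitimate ";" inside a quoted field would be miscounted.

```shell
# Create a tiny sample: lines 2 and 4 have the wrong number of fields.
printf 'a;b;c\nd;e\nf;g;h;i\nj;k;l\n' > sample.csv

# With -F';' awk splits on the separator; valid lines have NF == 3.
# Bad lines are logged with their line number; good lines pass through.
awk -F';' 'NF != 3 { print NR": "$0 > "csv_errors.log"; next }
           { print }' sample.csv > sample_clean.csv
```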

Is there a way to import this into R, skipping the error lines but keeping a log of the errors so that they can be checked manually afterwards?

Or should I use another tool, external to R? In that case, which tool?

I have about 100 very large files (~90 GB each), so I would prefer a data.table::fread-based solution, or some other fast approach.

LucasMation
