I am trying to use readr::read_fwf to read in a .txt file. I know all of the column widths, but I receive this parsing error, which I do not know how to resolve:

 fwf_widths <- c(12, 2, 6, ...) 
 fwf_names <- c("V1", "V2", "V3", ...)
 col_types <- c("ccc...")

 df <- read_fwf(file = file, fwf_widths(fwf_widths, fwf_names), 
                         col_types = col_types)
Warning: 1 parsing failure.
row    col         expected actual        file
372722 description          embedded null /path/to/my/file.txt

I've tried adding trim_ws = T, which does not get rid of the error. I looked at the actual contents of df[372722, ], and description appears to contain the correct contents. Can someone please help me interpret what "embedded null" means and how I can deal with this issue?

JRR
  • It is difficult to debug the issue without a reproducible example. Can you include one? – Ronak Shah Jan 11 '20 at 05:09
  • Thanks for your response! It's hard to create a reproducible example for a problem regarding the reading in of a fwf file. I think what I'm primarily struggling with is how to understand "embedded null" itself which doesn't feel like it necessitates a toy dataset. "embedded" implies that there is some hard-to-read aspect of the variable's encoding. The "null" makes me think that there might be some sort of white-space surrounding this variable that this parsing error cannot interpret. I am just looking for any documentation on the subject, something I can't seem to find. Thanks again – JRR Jan 11 '20 at 05:18
  • I had similar problems, ended up pre-processing the file (before `readr::` or `read.csv` or whatever) with either `tr` or `sed`, suggestions for both in https://superuser.com/a/287998. – r2evans Jan 11 '20 at 06:12

1 Answer

One of the bytes in your fwf is a zero-value byte (0x00), which R does not allow inside a character string; that is what "embedded null" refers to. If you simply remove it, you will shift every subsequent entry on that line out of alignment, so you need to replace it instead. The following function overwrites every zero byte with a space character by default.

Please back up your .fwf file before using this.

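replace_null <- function(path_to_file, file_size = file.size(path_to_file), replace_with = ' ')
{
  # Read the whole file as raw bytes. Defaulting to the real file size avoids
  # silently truncating files larger than a fixed byte limit.
  file_data <- readBin(path_to_file, "raw", file_size)
  # Overwrite each zero byte with the (single-byte) replacement character
  file_data[file_data == as.raw(0)] <- charToRaw(replace_with)
  # Write the cleaned bytes back over the original file
  writeBin(file_data, path_to_file)
}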

Now you just need to do

replace_null(file_path)

and then your own code should work. If it doesn't, your fwf is probably corrupted in some other way.
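
Before overwriting anything, it can help to confirm that zero bytes are actually present and where they sit. The following is a minimal sketch under that assumption, not part of the function above: `find_nulls` is a hypothetical helper name, and the demo runs on a small temporary file rather than on your data.

```r
# Hypothetical helper: return the 1-based byte offsets of zero bytes in a file
find_nulls <- function(path_to_file) {
  file_data <- readBin(path_to_file, "raw", n = file.size(path_to_file))
  which(file_data == as.raw(0))
}

# Demo: a temporary file holding "ab", one zero byte, then "cd" and a newline
tmp <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x61, 0x62, 0x00, 0x63, 0x64, 0x0a)), tmp)

null_positions <- find_nulls(tmp)
print(null_positions)  # 3
```

An empty result on your real file would suggest the parsing failure has a different cause.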

Allan Cameron
  • Thank you for this response, this was really helpful although I'm still experiencing this issue. If I'm having the same problem in a csv is there any problem with just removing the zero-value byte? – JRR Jan 13 '20 at 22:25
  • No, unless the file is corrupted and the null value is overwriting a comma you should be fine. – Allan Cameron Jan 13 '20 at 22:48
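
For the delimited (csv) case discussed in these comments, dropping the zero bytes outright might look like the sketch below. It is only an illustration: `strip_nulls` is a hypothetical helper name, the data is a made-up two-line file, and the result is written to a separate output path so the original stays untouched.

```r
# Hypothetical helper: copy a file while dropping every zero byte
strip_nulls <- function(path_in, path_out) {
  file_data <- readBin(path_in, "raw", n = file.size(path_in))
  writeBin(file_data[file_data != as.raw(0)], path_out)
}

# Demo: a tiny csv with a zero byte embedded in the description field
tmp_in  <- tempfile(fileext = ".csv")
tmp_out <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("id,description\n1,ab"), as.raw(0), charToRaw("cd\n")), tmp_in)

strip_nulls(tmp_in, tmp_out)
cleaned <- read.csv(tmp_out, stringsAsFactors = FALSE)
print(cleaned$description)  # "abcd"
```

Because fields in a csv are located by delimiters rather than by byte position, removing a byte does not shift the remaining columns, which is why this is safe here but not for a fixed-width file. Base R's `read.csv` also has a `skipNul` argument that may make the pre-processing step unnecessary.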