
fread from the data.table package can often handle irregular tables (in my case, a SAM file) with fill=TRUE, which simply pads the "incomplete" lines with NAs. However, it sometimes fails to detect the correct maximum number of columns when the widest lines appear late in the table, as in this example:

> body = paste0(rep("1 2\n", 1000), collapse="")
> main = paste0(body, "1 2 3\n", body, collapse="")
> fread(main, fill=T)
Warning message:
In fread(main, fill=T) :
  Stopped early on line 1001. Expected 2 fields but found 3.
  Consider fill=TRUE and comment.char=. First discarded non-empty line: <<1 2 3>>

Is there any way to force fread to use the correct number of columns with the fill option (three, in this case)?

Currently, I extract the maximum number of columns, pad the first line (with sed), call fread, and then remove the padding. This negates any benefit of fread's fast loading.
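For illustration, the same "count the columns first" idea can be sketched in base R rather than with sed. This is only a rough sketch, not a fix for fread itself: it swaps in count.fields and read.table from the utils package, so it does not keep fread's speed. The names tmp, n_col and padded are made up for the example.

```r
# Reproduce the input from the question and write it to a temporary file.
body <- paste0(rep("1 2\n", 1000), collapse = "")
main <- paste0(body, "1 2 3\n", body)
tmp <- tempfile(fileext = ".txt")
cat(main, file = tmp)

# First pass: count the whitespace-separated fields on every line
# and keep the maximum.
n_col <- max(count.fields(tmp, sep = ""))

# Second pass: supplying col.names fixes the number of columns up front,
# so fill = TRUE pads the shorter lines with NA.
padded <- read.table(tmp, header = FALSE, fill = TRUE,
                     col.names = paste0("V", seq_len(n_col)))
unlink(tmp)
```

For a real SAM file, which is tab-delimited, one would presumably pass sep = "\t", quote = "" and comment.char = "" to both calls so that quote and # characters in the data are not treated specially.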

This is related to an older question that predates the "fill" option.

loard
  • There are a few outstanding bugs like the one you've uncovered... `fread` shouldn't fail in this case. https://github.com/Rdatatable/data.table/issues/2727 – MichaelChirico Dec 11 '18 at 16:45
  • https://github.com/Rdatatable/data.table/pull/5119 – Wimpel Apr 29 '22 at 07:48
  • Came here because I have the same bug. Do I understand correctly that fill = integer from github.com/Rdatatable/data.table/pull/5119 was never finished? @MichaelChirico – statquant Dec 28 '22 at 09:36
