fread from the data.table package can often handle irregular tables (in my case, a SAM file) with the fill=TRUE switch, simply padding the "incomplete" lines with NAs. However, it sometimes fails to find the correct maximum number of columns when the widest rows appear late in the table, as in this example:
> body = paste0(rep("1 2\n", 1000), collapse="")
> main = paste0(body, "1 2 3\n", body, collapse="")
> fread(main, fill=T)
Warning message:
In fread(main, fill=T) :
Stopped early on line 1001. Expected 2 fields but found 3.
Consider fill=TRUE and comment.char=. First discarded non-empty line: <<1 2 3>>
Is there any way to force fread to use the correct number of columns with the fill option, in this case three?
Currently, I extract the number of columns, pad the first line (with sed), call fread, and then remove the padding. This removes any benefit from fast loading.
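For reference, the column-counting step of that workaround can be done inside R instead of with sed. The following is only a minimal sketch, not a confirmed fix: it uses base R's count.fields() to find the widest row, and then assumes a data.table version (1.16.0 or later, as far as I know) in which fill also accepts an integer upper bound on the number of columns. The temporary file path and the variable names n_cols and dt are just for illustration.

```r
library(data.table)

# Rebuild the example from above and write it to a temporary file.
body <- paste0(rep("1 2\n", 1000), collapse = "")
main <- paste0(body, "1 2 3\n", body)
path <- tempfile(fileext = ".txt")
cat(main, file = path)

# Step 1: scan the file once to find the widest row.
# count.fields() is base R (utils); sep = " " matches the example data.
# quote = "" disables quote handling, which is usually what a SAM file needs.
n_cols <- max(count.fields(path, sep = " ", quote = ""), na.rm = TRUE)

# Step 2: ASSUMPTION: data.table >= 1.16.0, where `fill` may be an integer
# giving an upper bound on the number of columns instead of TRUE/FALSE.
dt <- fread(path, sep = " ", fill = n_cols)
```

This still reads the file twice (once to count fields, once to load), so it does not fully recover the speed of a single fread call; it only avoids the sed padding and the clean-up afterwards.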
Related to this old question, which predates the "fill" option.