I would like to understand why read.table and fread behave differently on the same file, and whether there is a workaround that makes fread work. I have two lines of code with the same goal: to read a file. fread is faster and more efficient than read.table, but read.table produces fewer errors on the same data set.

SUCCESSFUL READ.TABLE approach

    table <- read.table("A.txt", header=FALSE, sep="|", fill=TRUE, quote="", stringsAsFactors=FALSE)

FREAD approach

    table <- fread("A.txt", header=FALSE, sep="|")

fread returns the classic error, which I have already looked into:

    Expected sep ('|') but new line or EOF ends field 44 on line 57193 when reading data

Initially, when fill=TRUE was not included, read.table also refused to read the file and returned what I think is a similar error:

    Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, : line 7 did not have 45 elements

I suspect the two errors are related. According to the read.table documentation, fill does the following: "If TRUE then in case the rows have unequal length, blank fields are implicitly added."
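Both errors complain about a row with an unexpected number of fields, so one way to check that suspicion is to count the fields on each line. The following is a minimal sketch using base R's count.fields. The temporary file and its three lines are hypothetical stand-ins for A.txt, and quote="" mirrors the read.table call above.

```r
# Hypothetical stand-in for A.txt: the second line is one field short.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1|2|3", "4|5", "6|7|8"), tmp)

# Count the "|"-separated fields on each line, with quoting and
# comment handling disabled so that every character is taken literally.
n_fields <- count.fields(tmp, sep = "|", quote = "", comment.char = "")

# Distribution of field counts, then the line numbers that fall short.
print(table(n_fields))
ragged_lines <- which(n_fields != max(n_fields))
print(ragged_lines)
```

Running the same call on the real file (with "A.txt" in place of tmp) should show whether lines such as 7 and 57193 simply have fewer fields than the rest.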

Is there a workaround similar to fill=TRUE that might address the fread problem?

Aaron
  • It appears that the problem is similar to http://stackoverflow.com/questions/25853575/fread-read-certain-row-as-implicitly-ordered-factor where the programmer asked about the same parameter of reading implicitly. – Aaron Oct 14 '14 at 11:46
  • Have you read [this](http://stackoverflow.com/questions/18597488/fill-option-for-fread)? – user20650 Oct 14 '14 at 11:51
  • @user20650 I think you found the answer, and I might have rephrased the question. Guys, please don't mark question as a duplicate because it might help others find this other answer more easily. – Aaron Oct 14 '14 at 12:00
  • @user20650 => Answer... get your cred. – Aaron Oct 14 '14 at 21:10

2 Answers


ANSWER FROM MATT DOWLE: Fill option for fread

UPDATE : Very unlikely to be done. fread is optimized for regular delimited files (where each row has the same number of columns). However, irregular files could be read into list columns (each cell itself a vector) when sep2 is implemented; not filled in separate columns as read.csv can do.

Aaron

The answer linked below points out that fread in data.table now has a fill argument (it appears as fill=FALSE in the signature that follows):

https://stackoverflow.com/a/34197074/1569064

    fread(input, sep="auto", sep2="auto", nrows=-1L, header="auto", na.strings="NA",
         stringsAsFactors=FALSE, verbose=getOption("datatable.verbose"), autostart=1L,
         skip=0L, select=NULL, drop=NULL, colClasses=NULL,
         integer64=getOption("datatable.integer64"),         # default: "integer64"
         dec=if (sep!=".") "." else ",", col.names,
         check.names=FALSE, encoding="unknown", quote="\"",
         strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL,
         showProgress=getOption("datatable.showProgress"),   # default: TRUE
         data.table=getOption("datatable.fread.datatable")   # default: TRUE
         )
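As a minimal sketch of how the fill argument could be applied to the call from the question, assuming an installed data.table version whose fread accepts fill (as in the signature above): the temporary file and its contents are hypothetical stand-ins for A.txt, and quote="" mirrors the original read.table call.

```r
library(data.table)

# Hypothetical stand-in for A.txt: the second line is one field short.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1|2|3", "4|5", "6|7|8"), tmp)

# fill=TRUE asks fread to pad short rows instead of stopping with
# "Expected sep ('|') but new line or EOF ends field ...".
dt <- fread(tmp, header = FALSE, sep = "|", quote = "", fill = TRUE)
print(dt)
```

On the real file the call would be fread("A.txt", header=FALSE, sep="|", quote="", fill=TRUE). It is worth checking the resulting column count against the read.table result, since the two functions may not infer the number of columns in the same way.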
AGS