I am trying to read a CSV file with fread from library(data.table), but it gives me an error. I understand that it gets stuck around line 342637, but I cannot figure out how to read the file or skip this problematic line. I have tried all the options I found online and am still stuck in the same place. The data is huge, so I can't easily check what is wrong around line 342637. Is there any other way to read this CSV file?

data.table ver: 1.10.4.3

user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8")

   Read 13.1% of 1837283 rows
   Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8") : 
   Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.


user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", fill=TRUE)

   Read 13.6% of 1837284 rows
   Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
   Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.

user <- fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", sep=",")
   Read 13.6% of 1837283 rows
   Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
   Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.


user <- fread( "user.csv", stringsAsFactors = FALSE, encoding = "UTF-8", sep=",", fill=TRUE, blank.lines.skip=TRUE)

   Read 14.2% of 1837284 rows
   Error in fread("user.csv", stringsAsFactors = FALSE, encoding = "UTF-8",  : 
   Expecting 77 cols, but line 342637 contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep=',' and/or (unescaped) '\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.
joerna

1 Answer

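One option is to make two fread() calls, one for the rows before the problematic line 342637 and one for the rows after it, and then combine the results. Note that fread reports file line numbers, so with a one-line header the data before the bad line is 342635 rows, not 342636:

library(data.table)
# Assuming line 1 is the header, lines 2..342636 hold the first 342635 data rows
user_start <- fread('user.csv', nrows = 342635)
# Skip through the bad line 342637; no header is left to detect
user_end <- fread('user.csv', skip = 342637, header = FALSE)

# use.names = FALSE binds by position, so user_start's column names are kept
user <- rbindlist(list(user_start, user_end), use.names = FALSE)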
Cole
  • Good answer to the question. I use this solution as well, but I find I lose two lines of data for each skip. fread is meant for regular (n x m) files and, in my experience, has problems with large files that are not well delimited. I think there are a number of issues here. Did an MS SQL product give you the CSV or text export, by any chance? – rferrisx Oct 19 '19 at 21:06
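
Since the question says the file is too large to check by eye, it may help to locate irregular lines programmatically before deciding what to skip. The following is only a minimal sketch using base R rather than fread: it builds a tiny stand-in file (the three-column layout and its values are made up for illustration, not taken from user.csv), counts the fields on each line with count.fields(), prints the lines whose count differs from the header, and parses the rest. It assumes no quoted field spans several lines; count.fields() returns NA for such continued lines, and they are left untouched here.

```r
# Hypothetical stand-in for user.csv: 3 columns, file line 4 has extra fields.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "id,name,city",
  "1,Ann,Oslo",
  "2,Bob,Rome",
  "3,Cid,Lima,EXTRA,FIELDS",
  "4,Dee,Kiev"
), path)

# Number of fields on every line; blank lines are kept so that positions
# in n_fields correspond to file line numbers.
n_fields <- count.fields(path, sep = ",", quote = "\"",
                         blank.lines.skip = FALSE, comment.char = "")
expected <- n_fields[1]  # field count of the header line

# TRUE for lines whose field count differs from the header (NA is not flagged).
is_bad <- !is.na(n_fields) & n_fields != expected

# Inspect only the offending lines instead of opening the whole file.
all_lines <- readLines(path)
print(which(is_bad))
print(all_lines[is_bad])

# Drop the malformed lines and parse what remains.
clean <- read.csv(text = all_lines[!is_bad], stringsAsFactors = FALSE)
```

For the real file, the same idea would apply with path set to "user.csv". Instead of read.csv(), the kept lines could be written back out with writeLines() and that cleaned file passed to fread, which is likely to be faster for a file of this size.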