I have a CSV file like this:

id,name,value
1,peter,5
2,peter\,paul,3

How can I read this file and tell R that "\," does not start a new column, and that only a bare "," does?

I should add that the file is 400 MB.

Thanks

spore234
  • Use read.csv with sep = ",", something like this: df <- read.csv("path to your file /df.csv", sep = ",") –  Apr 14 '16 at 11:11

1 Answer

You can use readLines() to read the file into memory and then pre-process it. If you're willing to convert the escaped commas into something else, you can do something like:

> read.csv(text = gsub("\\\\,", "-", readLines("dat.csv")))
  id       name value
1  1      peter     5
2  2 peter-paul     3

Another option is to use the fact that the fread() function from data.table can run a system command passed as its first argument. You can then apply something like a sed substitution to the file before reading it in (which may or may not be faster):

> data.table::fread("sed -e 's/\\\\\\,/-/g' dat.csv")
   id       name value
1:  1      peter     5
2:  2 peter-paul     3
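Newer data.table releases also accept the shell command through a dedicated `cmd` argument of `fread()`, which makes the intent explicit. The sketch below assumes that data.table is installed, that a `sed` binary is on the PATH, and that the sample file is written to a temporary path purely for illustration. It uses the sed script `s/\\,/-/g`, which matches a literal backslash followed by a comma:

```r
# Write the question's sample data to a temporary file (illustration only)
path <- tempfile(fileext = ".csv")
writeLines(c("id,name,value", "1,peter,5", "2,peter\\,paul,3"), path)

dt <- NULL
# Assumption: only run when both data.table and sed are available
if (requireNamespace("data.table", quietly = TRUE) && nzchar(Sys.which("sed"))) {
  # The shell receives: sed -e 's/\\,/-/g' '<path>'
  # i.e. every backslash-comma pair becomes "-" before fread parses the text
  dt <- data.table::fread(cmd = paste("sed -e 's/\\\\,/-/g'", shQuote(path)))
  print(dt)
}
```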

You can then use gsub() to convert the temporary "-" placeholder back into a comma.
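A minimal base-R sketch of the full round trip, assuming the file fits in memory and that the placeholder string `__COMMA__` (a hypothetical choice) never occurs in the real data. A multi-character placeholder is used instead of "-" so that genuine hyphens are not turned into commas when the substitution is reversed:

```r
# Stand-in for readLines("dat.csv"): the three lines from the question
lines <- c("id,name,value", "1,peter,5", "2,peter\\,paul,3")

# Hypothetical placeholder, assumed absent from the real data
placeholder <- "__COMMA__"

# Swap each escaped comma (backslash + comma) for the placeholder;
# fixed = TRUE matches the two characters literally, so no regex escaping is needed
df <- read.csv(text = gsub("\\,", placeholder, lines, fixed = TRUE),
               stringsAsFactors = FALSE)

# Turn the placeholder back into a literal comma in every character column
chr <- vapply(df, is.character, logical(1))
df[chr] <- lapply(df[chr], gsub, pattern = placeholder,
                  replacement = ",", fixed = TRUE)

print(df)
```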

Thomas
  • Thanks. The first solution is too slow for huge files; I had to abort after 4+ hours. The second one fails when it encounters a line with an observation like this: 123,time=\\,5. The error is `Expected sep (',') but new line or EOF ends field 3 on line 987841 when reading data:` – spore234 Apr 15 '16 at 08:37
  • @spore234 You should be able to modify the sed expression to allow multiple backslashes, perhaps with: `"sed -e 's/\\\\\\+,/-/g' dat.csv"` – Thomas Apr 15 '16 at 08:56
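One way to check the pattern from the last comment is to apply the same regular expression with base R's `gsub()` to the problem observation. This sketch does not run sed. It assumes that GNU sed's `\+` (one or more, a GNU extension to basic regular expressions) behaves like `+` in R's default regex engine, and it shows why the single-backslash pattern leaves a stray backslash behind:

```r
# The observation quoted in the comment; "\\\\" in R source is two literal backslashes
obs <- "123,time=\\\\,5"

# Exactly one backslash before the comma: the first backslash survives
one  <- gsub("\\\\,", "-", obs)
# One or more backslashes before the comma (the regex \\+,)
many <- gsub("\\\\+,", "-", obs)

cat(one, "\n")   # 123,time=\-5
cat(many, "\n")  # 123,time=-5

# The one-or-more pattern still handles the single-backslash rows from the question
lines <- c("id,name,value", "1,peter,5", "2,peter\\,paul,3")
df <- read.csv(text = gsub("\\\\+,", "-", lines), stringsAsFactors = FALSE)
print(df)
```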