
I am having trouble reading a file containing lines like the one below in R.

"_:b5507F4C7x59005","Fabiana D\"atri"

Any idea? How can I make read.table understand that \" is an escaped quote character?
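
In case it helps, this is roughly what I have been trying (the file name is just a placeholder); read.table does not treat \" as an escaped quote, so the second field gets misread:

# "sample.csv" is a placeholder for my actual file
d <- read.table("sample.csv", sep = ",", quote = "\"", header = FALSE)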

Cheers, Alexandre

Alexandre Rademaker
  • It might be good to add an extra line or two to your sample so you don't get answers that only work for your single line... – Tommy Aug 15 '11 at 15:50
  • See [How to read quoted text containing escaped quotes](http://stackoverflow.com/questions/6032296/how-to-read-quoted-text-containing-escaped-quotes) for a solution in R using readLines. – jnas Jun 12 '14 at 11:35

7 Answers


It seems to me that read.table/read.csv cannot handle escaped quotes.

...But I think I have an (ugly) work-around inspired by @nullglob:

  • First read the file WITHOUT a quote character. (This won't handle embedded commas, as @Ben Bolker noted.)
  • Then go through the string columns and remove the quotes:

The test file looks like this (I added a non-string column for good measure):

13,"foo","Fab D\"atri","bar"
21,"foo2","Fab D\"atri2","bar2"

And here is the code:

# Generate test file
writeLines(c("13,\"foo\",\"Fab D\\\"atri\",\"bar\"",
             "21,\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")

# Read ignoring quotes
tbl <- read.table("foo.txt", as.is=TRUE, quote='', sep=',', header=FALSE, row.names=NULL)

# Go through and clean up
for (i in seq_len(NCOL(tbl))) {
    if (is.character(tbl[[i]])) {
        x <- tbl[[i]]
        x <- substr(x, 2, nchar(x)-1) # Remove surrounding quotes
        tbl[[i]] <- gsub('\\\\"', '"', x) # Unescape quotes
    }
}

The output is then correct:

> tbl
  V1   V2          V3   V4
1 13  foo  Fab D"atri  bar
2 21 foo2 Fab D"atri2 bar2
Tommy

On Linux/Unix (or on Windows with Cygwin or GnuWin32), you can use sed to convert the escaped double quotes \" to doubled double quotes "", which read.csv handles well:

p <- pipe(paste0('sed \'s/\\\\"/""/g\' "', FILENAME, '"'))
d <- read.csv(p, ...)
rm(p)

Effectively, the following sed command is used to preprocess the CSV input:

sed 's/\\"/""/g' file.csv

I don't call this beautiful, but at least you don't have to leave the R environment...
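
As a concrete sketch (hypothetical; it assumes the "foo.txt" test file generated in the answer above and that sed is on the PATH):

# sed rewrites \" as "" so that read.csv can handle the quoted fields
p <- pipe('sed \'s/\\\\"/""/g\' "foo.txt"')
d <- read.csv(p, header = FALSE, as.is = TRUE)
rm(p)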

krlmlr

My apologies ahead of time that this isn't more detailed -- I'm right in the middle of a code crunch.

You might consider using the scan() function. I created a simple sample file "sample.csv," which consists of:

V1,V2
"_:b5507F4C7x59005","Fabiana D\"atri"

Two quick possibilities are (with output commented so you can copy-paste to the command line):

test <- scan("sample.csv", sep=",", what='character', allowEscapes=TRUE)
## Read 4 items
test
##[1] "V1"                "V2"                "_:b5507F4C7x59005"
##[4] "Fabiana D\\atri\n"

or

test <- scan("sample.csv", sep=",", what='character', comment.char="\\")
## Read 4 items
test
## [1] "V1"                "V2"                "_:b5507F4C7x59005"
## [4] "Fabiana D\\atri\n"

You'll probably need to play around with it a little more to get what you want. And I see that you've already mentioned writeLines, so you may have already tried this. Either way, good luck!
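
If the flat vector looks right, one rough way to turn it into a data frame (a sketch, assuming the 4-item result above) is:

# reshape the flat character vector into two columns, first row as header;
# the values may still need cleanup (e.g. with gsub) since the escapes survive
m <- matrix(test, ncol = 2, byrow = TRUE)
tbl <- as.data.frame(m[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(tbl) <- m[1, ]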

CompEcon

read_delim from the readr package can handle escaped and doubled double quotes, using the arguments escape_double and escape_backslash.

For example, if our file escapes quotes by doubling them:

"quote""","hello"
1,2

then we use

read_delim(file, delim=',')  # default escape_backslash=FALSE, escape_double=TRUE

If our file escapes quotes with a backslash:

"quote\"","hello"
1,2

we use

read_delim(file, delim=',', escape_double=FALSE, escape_backslash=TRUE)
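
Applied to the two-column file from the question (backslash-escaped quotes, no header row; the file name here is just a placeholder), that would be roughly:

library(readr)
d <- read_delim("sample.csv", delim = ",", col_names = FALSE,
                escape_double = FALSE, escape_backslash = TRUE)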
qwr

I was able to get your example to work by setting the quote argument:

> read.csv('test.csv',quote="'",head=FALSE)
                   V1                  V2
1 "_:b5507F4C7x59005" "Fabiana D\\"atri" 
2 "_:b5507F4C7x59005" "Fabiana D\\"atri" 
nullglob
  • I thought adding `allowEscapes=TRUE` might help but it doesn't seem to. – Ben Bolker Aug 15 '11 at 16:39
  • ...This kind of works unless there are single quotes in the file. But all strings end up with double-quotes around them (and the escaped quotes are still escaped), so some more processing is required... – Tommy Aug 15 '11 at 16:58
  • As @Ben posted, the solution won't work in my case. I have commas inside other fields. – Alexandre Rademaker Aug 20 '11 at 22:06

With newer R setups, readr::read_delim() is the correct answer.

data <- read_delim(filename, delim = "\t", quote = "\"",
    escape_backslash = TRUE, escape_double = FALSE,
    # The columns depend on your data
    col_names = c("timeStart", "posEnd", "added", "removed"),
    col_types = "nncc"
)
BurninLeo

This should be fine with read.csv(). Take a look at the help for ?read.csv - the option for specifying the quote is quote = "....". In this case, though, there may be a problem: it seems that read.csv() prefers to see matching quotes.

I tried the same with read.table("sample.txt", header = FALSE, as.is = TRUE), with your text in sample.txt, and it seems to work. When all else fails with read.csv(), I tend to back up to read.table() and specify the parameters carefully.

Iterator
  • No, it doesn't work, try something slightly more complex, like: `writeLines(c("\"foo\",\"Fab D\\\"atri\",\"bar2\"","\"foo2\",\"Fab D\\\"atri2\",\"bar2\"" ), "foo.txt")`... – Tommy Aug 15 '11 at 15:46
  • Keep in mind that your original output is, from R's perspective, just 1 string per line, rather than a bunch of separately quoted strings. If you want the entire thing quoted in a file, then you need surrounding quotes. – Iterator Aug 15 '11 at 16:57
  • After `read.table`, `a$V2` outputs `[1] ",\"Fab" ",\"Fab"`, which isn't correct. ...And if you open "foo.txt" in a text editor, you'll see that it contains strings like the ones the OP posted. – Tommy Aug 15 '11 at 17:36