1

I'm trying to read some data from the Facebook Graph API Explorer into R to do some text analysis. However, it looks like there are unescaped backslashes in the JSON feed, which is causing rjson to barf. The following is a minimal example of the kind of input that's causing problems.

library(rjson)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
fromJSON(txt)

(Note that R's string parser converts the double backslashes at \\" and \\video to single backslashes, which is what's in my actual data.)

I also tried the RJSONIO package, which gave errors too and even crashed R at times.

Has anyone come across this problem before? Is there a way to fix this short of manually hunting down every error that crops up? There's potentially megabytes of JSON being parsed, and the error messages aren't very informative about where exactly the problematic input is.

Hong Ooi

2 Answers

0

Just replace backslashes that aren't escaping double quotes, tabs or newlines with double backslashes.

In the regular expression, '\\\\' is converted to one backslash (two levels of escaping are needed, one for R, one for the regular expression engine). We need the perl regex engine in order to use lookahead.

library(stringr)
txt2 <- str_replace_all(txt, perl('\\\\(?![tn"])'), '\\\\\\\\')
fromJSON(txt2)
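
The same replacement can be sketched in base R, which avoids stringr's `perl()` helper (later stringr releases deprecated it in favour of `regex()`). This is only a sketch of the approach above, applied to the question's sample string: it leaves `\t`, `\n` and `\"` alone and doubles every other backslash, so it inherits the same limits. It cannot repair a stray backslash that sits directly before a closing quote, and it does not account for the other escapes JSON allows.

```r
library(rjson)

# Sample input from the question: \" is a valid escape, \v is not.
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'

# Double any backslash that is NOT followed by t, n or a double quote.
# perl = TRUE is needed for the negative lookahead (?!...).
# In the replacement, each pair of backslashes yields one literal backslash.
txt2 <- gsub('\\\\(?![tn"])', '\\\\\\\\', txt, perl = TRUE)

parsed <- fromJSON(txt2)
parsed$data[[1]]$value
```
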
Richie Cotton
  • Thanks. That doesn't work though, since there are also characters like `\"` to denote escaped quote literals. IOW, sometimes the backslashes are correct, and sometimes they need to be modified. – Hong Ooi Nov 19 '13 at 09:20
  • I've modified my example to clarify. – Hong Ooi Nov 19 '13 at 09:22
  • @HongOoi OK, I've updated my answer. The best solution depends upon how consistently wrong the JSON is. If the backslashes are randomly single or double, you'll probably need to do some manual correction. – Richie Cotton Nov 19 '13 at 10:03
  • Yeah, I just found the following snippet: `"message":"Ok thank you :)\"` Note the unescaped \ right before the ending quote. What a mess. Looks like there's no getting around manual correction. – Hong Ooi Nov 19 '13 at 11:04
  • Have you tried using `unexpected.escape = "keep"`, to at least prevent errors and get something read into R? – Richie Cotton Nov 19 '13 at 11:39
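
The `unexpected.escape` option mentioned in the last comment is an argument of rjson's `fromJSON()`; it defaults to `"error"` and also accepts `"skip"` and `"keep"`. A minimal sketch of that suggestion, using the question's sample string, might look like the following. It only gets the data into R without an error; check the returned string to see how the stray escape was handled before relying on it.

```r
library(rjson)

# Sample input from the question, with the invalid \v escape.
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'

# Ask the parser to keep going on unexpected escapes instead of raising an error.
res <- fromJSON(txt, unexpected.escape = "keep")

# Inspect what the parser made of the problematic field.
res$data[[1]]$value
```
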
0

The problem is that you are trying to parse invalid JSON:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'
validate(txt)

The problem is the picture\\video part, because \v is not a valid JSON escape sequence, even though it is a valid escape sequence in R and some other languages. JSON only allows \", \\, \/, \b, \f, \n, \r, \t and \uXXXX. Perhaps you mean:

library(jsonlite)
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\/video"}]}'
validate(txt)
fromJSON(txt)

Either way, the problem is at the JSON data source that is generating invalid JSON. If this data really comes from Facebook, you have found a bug in their API. But more likely you are not retrieving it correctly.
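
On the question of finding where the bad input is: when `validate()` returns `FALSE`, the parser's message is attached to the result as the `err` attribute. A small sketch, using the invalid sample string from the question, shows how to print that message rather than just the logical value.

```r
library(jsonlite)

# Invalid sample input from the question (\v is not a JSON escape).
txt <- '{"data":[{"id":2, "value":"I want to \\"post\\" a picture\\video"}]}'

ok <- validate(txt)

# On failure, the parser's error message is stored in the "err" attribute.
if (!ok) {
  message(attr(ok, "err"))
}
```
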

Jeroen Ooms