
I'm trying to stream and save public posts through R. I already got the token and made the search. This is what I've done:

require(RCurl)

require(rjson)

data <- getURL("https://graph.facebook.com/search?q=multishow&type=post&access_token=my_token")

That works: "data" is a character string, and the search found something. Now, how can I convert this "data" string into a data frame? And is it possible to keep streaming this search for a specific timeout?

Thanks.

UPDATE:

OK guys, now I can parse the JSON results from Facebook, but I'm still stuck on converting them to a data.frame properly and on streaming to get new posts. Code below:

library(RCurl)
library(rjson)
library(plyr)

data <- getURL("https://graph.facebook.com/search?q=my_query&type=post&access_token=my_token", cainfo = "cacert.pem")

#JSON parser
parser <- newJSONParser()
parser$addData(data)
fb.data <- parser$getObject()

#JSON to data.frame
#sometimes it works directly from rjson (fromJSON expects the raw JSON string)
df <- data.frame(fromJSON(data))

#sometimes it works only with plyr package
df <- do.call("rbind.fill", lapply(fb.data, as.data.frame))

Either way, I get a data.frame with one or two observations and hundreds of variables. In my last search, the first observation had 42 variables, the second had 13 variables, and so on. Any clue how I can handle this?
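The lopsided column counts happen because each post in the parsed response carries a different set of fields, and `as.data.frame` flattens every nested field into its own column. One way around that is to pull a fixed set of fields out of each post before binding rows. The sketch below uses a small hand-written list standing in for the parsed Graph API response (the field names `id`, `from`, `message`, `created_time` are assumptions about the response shape), and sticks to base R:

```r
# Hypothetical stand-in for fb.data after JSON parsing: a list whose
# $data element holds one list per post, with varying fields.
fb.data <- list(data = list(
  list(id           = "1_100",
       from         = list(name = "Alice", id = "1"),
       message      = "multishow tonight!",
       created_time = "2013-03-26T00:00:00+0000"),
  list(id           = "2_200",
       from         = list(name = "Bob", id = "2"),
       created_time = "2013-03-26T01:00:00+0000")   # no 'message' field
))

# Extract a fixed set of fields from each post; a missing field becomes NA,
# so every row has the same columns and plain rbind() works.
post.to.row <- function(p) data.frame(
  id           = p$id,
  from.name    = p$from$name,
  message      = if (is.null(p$message)) NA_character_ else p$message,
  created_time = p$created_time,
  stringsAsFactors = FALSE
)

df <- do.call(rbind, lapply(fb.data$data, post.to.row))
```

This yields a rectangular data frame (here 2 rows, 4 columns) regardless of which optional fields each post happens to include.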

  • Try, for example, `library(XML); xmlParse(data)` – agstudy Mar 26 '13 at 03:20
  • Thanks @agstudy. I got "Error: XML content does not seem to be XML", but parsing with rjson worked: `parser <- newJSONParser(); parser$addData(data); facebook <- parser$getObject(); print(facebook)` – Luiz Felipe Freitas Mar 26 '13 at 04:04

1 Answer


To set the timeout, you can search for a timeout option among curl's general options like this:

 names(getCurlOptionsConstants())[grep('timeout', names(getCurlOptionsConstants()))]

This gives me six options:

"timeout"              "connecttimeout"      
 "dns.cache.timeout"    "ftp.response.timeout" 
 "timeout.ms"           "connecttimeout.ms"   

I don't know which one you're after, but I guess you can try something like this:

getURL(url,.opts = list(timeout = 10))
– agstudy
  • Thanks @agstudy, but it didn't work. I need something like the `filterStream` function from the streamR package. When I do `filterStream(file="tweets.json", track="my_query", timeout=3600, oauth=my_oauth)`, it keeps streaming every tweet matching "my_query" for an hour (3600 seconds). – Luiz Felipe Freitas Mar 26 '13 at 15:39
  • @LuizFelipeFreitas Looking at the code of `filterStream`, I get this: `getURL("https://stream.twitter.com/1/statuses/filter.json",..., .opts = list(verbose = FALSE, timeout=timeout))`. Clearly it is the same code as the solution proposed. – agstudy Mar 26 '13 at 15:55
  • Right now I'm running filterStream... it shows a "Capturing tweets" message and ends after 3600 seconds. With getURL, I set timeout=1000000000000000000000000000000000000000000000000000000000000000000000 and it ends in a second. – Luiz Felipe Freitas Mar 26 '13 at 16:05
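The difference the comments circle around is that Twitter's filter endpoint is a long-lived streaming connection, while the Graph API search endpoint is plain request/response, so curl's `timeout` option only caps a single request. One way to mimic `filterStream`'s `timeout=` behaviour against a non-streaming endpoint is to re-issue the search in a loop until a deadline. In this sketch, `fetch.posts()` is a hypothetical stub standing in for the `getURL()` + JSON-parsing step from the question:

```r
# Stub standing in for getURL() + JSON parsing; in real use it would
# return the list of posts from one Graph API search request.
fetch.posts <- function() list(list(id = "1_100", message = "multishow"))

# Poll the search repeatedly until 'timeout' seconds have elapsed,
# sleeping 'interval' seconds between requests, and accumulate results.
poll.search <- function(timeout, interval = 1) {
  deadline <- Sys.time() + timeout
  posts <- list()
  while (Sys.time() < deadline) {
    posts <- c(posts, fetch.posts())
    Sys.sleep(interval)
  }
  posts
}

got <- poll.search(timeout = 2, interval = 1)
```

A real version would also de-duplicate by post `id` (or use the paging cursors in the response) so repeated polls don't store the same post twice.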