
I'm having a heck of a time trying to convert a JSON file to a data frame. I have searched and tried to apply others' code to my example, but none of it seems to fit. The output always ends up as a list instead of a data frame.

library(jsonlite)
library(RCurl)   # getURL() comes from RCurl
URL <- getURL("http://scores.nbcsports.msnbc.com/ticker/data/gamesMSNBC.js.asp?xml=true&sport=NBA&period=20160104")
URLP <- fromJSON(URL, simplifyDataFrame = TRUE, flatten = FALSE)
URLP

Here is the format the output always ends up in.

$games
 [1] "<ticker-entry gamecode=\"2016010405\" gametype=\"Regular Season\"><visiting-team display_name=\"Toronto\" alias=\"Tor\" nickname=\"Raptors\" id=\"28\" division=\"ECA\" conference=\"EC\" score=\"\"><score heading=\"\" value=\"0\" team-fouls=\"0\"></score><team-record wins=\"21\" losses=\"14\"></team-record><team-logo link=\"http://hosted.stats.com/nba/logos/nba_50x33/Toronto_Raptors.png\" gz-image=\"http://hosted.stats.com/GZ/images/NBAlogos/TorontoRaptors.png\"></team-logo></visiting-team><home-team display_name=\"Cleveland\" alias=\"Cle\" nickname=\"Cavaliers\" id=\"5\" division=\"ECC\" conference=\"EC\" score=\"\"><score heading=\"\" value=\"0\" team-fouls=\"0\"></score><team-record wins=\"22\" losses=\"9\" ties=\"\"></team-record><team-logo link=\"http://hosted.stats.com/nba/logos/nba_50x33/Cleveland_Cavaliers.png\" gz-image=\"http://hosted.stats.com/GZ/images/NBAlogos/ClevelandCavaliers.png\"></team-logo></home-team><gamestate status=\"Pre-Game\" display_status1=\"7:00 PM\" display_status2=\"\" href=\"http://scores.nbcsports.msnbc.com/nba/preview.asp?g=2016010405\" tv=\"FSOH/SNT\" gametime=\"7:00 PM\" gamedate=\"1/4\" is-dst=\"0\" is-world-dst=\"0\"></gamestate></ticker-entry>" 
David Gold
  • Looks like elements of `games` are xml. Take a look at `XML::xmlParse(z$games[1])`. – jbaums Jan 05 '16 at 00:20
  • That's really really ugly. XML wrapped in JSON. Next they'll put it in a Word document just to add a layer of ridiculous. – Brandon Bertelsen Jan 05 '16 at 01:33
  • I thought I was crazy with how much trouble I was having; now I know why. Thanks! – David Gold Jan 05 '16 at 03:00
  • _"You may also not use any software robots, spider, crawlers, or other data gathering or extraction tools, whether automated or manual, to access, acquire, copy, monitor, scrape or aggregate Content or any portion of the online services."_ in the [ToS](http://www.nbcsports.com/node/191181) – hrbrmstr Jan 05 '16 at 03:57
  • @hrbrmstr - Wow, that's an over-the-top ToS - I'd like to see their lawyers defend a ban on manual compilation of 'any portion' of sports scores. – thelatemail Jan 05 '16 at 04:07
  • Draconian or not, Mr Gold & the answerers who have posted code that hit the URL are in violation of them. As I've said before in SO comments, I actually know ppl who have had letters served (and more). I point it out solely b/c many nascent scrapers have no idea that sites use ToS like these (though others blatantly ignore them fully knowing they're there…such _rebels_). – hrbrmstr Jan 05 '16 at 04:12
  • @hrbrmstr No no noes... I didn't inhale, just saved the URL from a perfectly legal browser session. *cough* Good point - added it to the answer that imho should remain for the sake of demonstration unless, of course, the SO team receives mail from nbcsports. – lukeA Jan 05 '16 at 09:49
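
Following jbaums' hint above, a quick check (reusing the `URLP` object from the question's code) confirms that each element of `games` is itself an XML fragment, which is why `fromJSON()` can only hand it back as a character vector rather than a data frame:

library(XML)
class(URLP$games)                 # a plain character vector of XML strings
doc <- xmlParse(URLP$games[1])    # parse the first <ticker-entry>
str(xmlToList(doc))               # nested list of the game's attributes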

1 Answer


With regard to @jbaums' comment, you could try

library(jsonlite)
library(RCurl)
library(dplyr)
library(XML)
URL <- getURL("http://scores.nbcsports.msnbc.com/ticker/data/gamesMSNBC.js.asp?xml=true&sport=NBA&period=20160104")
# each element of $games is an XML string: parse it, convert it to a nested list,
# flatten the attributes with unlist(), and make a one-row data frame out of it
lst <- lapply(fromJSON(URL)$games, function(x) as.data.frame(t(unlist(xmlToList(xmlParse(x)))), stringsAsFactors=FALSE))
# stack the one-row data frames, filling missing columns with NA
df <- bind_rows(lst)
View(df)

... in theory. In practice, however, as @hrbrmstr pointed out, this would violate the website owner's terms of service.
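
For what it's worth, the reason the `unlist(xmlToList(...))` step works is that `xmlToList()` turns each `<ticker-entry>` into a nested list (attributes end up under `.attrs`), and `unlist()` flattens that into a single named character vector, so every attribute becomes its own column after `bind_rows()`. Everything arrives as character; an optional clean-up sketch (base R type guessing, not part of the original answer) might look like:

names(df)                                  # inspect the flattened, dot-separated column names
df[] <- lapply(df, type.convert, as.is = TRUE)   # guess column types; numeric-looking columns become numeric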

lukeA