
I am trying to use R to download a table of data from the website https://sites.google.com/a/slu.edu/swartwout/home/cubesat-database. I have not found an approach that works; so far I can get at the data only by copying and pasting it into Excel. This is my attempt, which does not work:

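    library(rvest)

    # read_html() returns a parsed document, so name it accordingly
    page <- read_html("https://sites.google.com/a/slu.edu/swartwout/home/cubesat-database")
    # This XPath selects script text rather than a <table>, so no table is parsed
    tbl <- page %>%
      html_nodes(xpath = '/html/body/script[2]/text()') %>%
      html_table(header = TRUE, fill = TRUE)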

I have tried a number of selectors with html_nodes and none of them work. I may be doing this incorrectly, or I may need a different approach. The data seems to be generated by JavaScript: the values shown in the table do not appear anywhere in the HTML source, even though they are visible when viewing the website. SelectorGadget, as used in Hadley Wickham's rvest tutorial, works very well on the IMDb page for The Lego Movie, but not at all on this website.
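One way to confirm that suspicion is to count the `<table>` nodes in the static HTML. The sketch below is only an illustration: it uses a small inline HTML string (a made-up stand-in, not the real page) that mimics a page whose table is built by JavaScript, and assumes the rvest package is installed.

```r
library(rvest)

# Hypothetical stand-in for a JavaScript-driven page: the static HTML holds
# an empty container and a script, but no <table> element.
static_html <- '<html><body>
  <div id="data"></div>
  <script>var rows = [{"name":"SatA","year":2003}];</script>
</body></html>'

page <- read_html(static_html)

# Count the nodes that html_table() could work with
tables <- html_nodes(page, "table")
scripts <- html_nodes(page, "script")

length(tables)   # number of <table> nodes in the static HTML
length(scripts)  # number of <script> nodes that might build the table
```

If the same check on the real page also finds no `<table>` nodes, then html_table() has nothing to parse, whichever selector is used.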

As I write this, StackOverflow has suggested a similar question, "stumped on how to scrape the data from this site (using R)", whose answer recommends RSelenium. I have followed that approach without success. I get a number of errors, including "package or namespace load failed for ‘RSelenium’".

Paul M
  • If the table is generated on-the-fly, i.e., with JavaScript and an XMLHttpRequest for a JSON file or whatnot, why not grab the JSON file and manipulate it instead of trying to scrape the page? – royhowie Sep 27 '15 at 21:21
  • The code which generates the table is very complex, and I am still trying to understand it. It now looks like some parts of the table are numbered and some are generated by JavaScript. It would take a great deal of time to figure out which is which. I have yet to discover the source of the data which gets processed. I have also never programmed with JavaScript. Other than that, it is a wonderful idea. I keep thinking that, if I can see the data displayed on the screen, there should be a way of capturing it. Like copy and paste. Maybe I keep doing it that way? – Paul M Sep 27 '15 at 22:33
  • I have continued my searches, and made a discovery. Do a search at StackOverflow. Try the search terms: "scrape web Python". I think this will tell you where the solution is. O'Reilly actually has a book on this subject, written by Ryan Mitchell. Now, I have not learned Python any more than I have learned JavaScript, but if these are solutions, they are good paths to follow. If better than R, then so be it. – Paul M Sep 29 '15 at 01:17
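
The idea in royhowie's comment can be sketched in R. This is a minimal illustration under stated assumptions, not a solution for the site in question: it assumes the data is embedded as a JSON array inside a `<script>` tag, and the inline HTML, the variable name `rows`, and the satellite names are all made up for the example. It needs the rvest and jsonlite packages.

```r
library(rvest)
library(jsonlite)

# Hypothetical page: the table data lives in a script as a JSON array
html <- '<html><body><script>var rows = [{"name":"SatA","year":2003},{"name":"SatB","year":2005}];</script></body></html>'

# Pull the raw text of the first <script> node
script_text <- html_text(html_node(read_html(html), "script"))

# Keep only the part from the first "[" to the last "]" (the JSON array)
json_text <- regmatches(script_text, regexpr("\\[.*\\]", script_text))

# fromJSON() turns an array of objects into a data frame
rows <- fromJSON(json_text)
print(rows)
```

If the page instead fetches its data from a separate JSON URL (visible in the browser's network inspector), passing that URL straight to `fromJSON()` would skip the HTML parsing step entirely.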

0 Answers