
I am trying to figure out a way of controlling a browser (preferably Firefox) from R scripts in order to retrieve information that websites generate via AJAX/JavaScript. For example, how could I retrieve the values of the "Modell" field at http://www.mobile.de/home/index.html?

AFAIU, Gabe Becker's package "RFirefox" does provide some sort of link between R and Firefox. But being a Windows kid (not by conviction, but due to longstanding network effects ;-)), I haven't been able to try it myself yet, so I'm not sure whether it can do what I'm after.

So: does anyone out there have experience with either RFirefox or handling AJAX from R? I don't want you to do my homework, but before I plunge into the Linux world I'd like to assess whether it's worth it.

Nevertheless, any code examples would be greatly appreciated. ;-)

Rappster
  • Not exactly what you need, but similar: http://stackoverflow.com/questions/7867105/parsing-html-and-following-a-javascript-link/7905861#7905861 – Dieter Menne Oct 26 '11 at 17:12

1 Answer


I'm not clear on why you need a browser to do this. It's just web scraping: it will certainly require some kind of parser, but not necessarily a browser. I think RFirefox may be barking up the wrong tree. If you want to play with JavaScript+R connections, take a look at Duncan Temple Lang's SpiderMonkey.
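To illustrate the "parser, not browser" point, here is a minimal sketch in Python (the language suggested later in this thread) using only the standard library's `html.parser`. It collects the option labels of a `<select>` element identified by its `name` attribute. The HTML snippet and the field names `make`/`model` are hypothetical, not taken from mobile.de. One caveat: if a site fills a field via AJAX, the values are not in the initial HTML; they arrive in a separate HTTP response (visible in the browser's developer tools), and that response is what you would fetch and parse instead.

```python
from html.parser import HTMLParser


class SelectOptionParser(HTMLParser):
    """Collect the text of every <option> inside a <select> with a given name attribute."""

    def __init__(self, select_name):
        super().__init__()
        self.select_name = select_name
        self.in_target_select = False
        self.in_option = False
        self.options = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "select" and attrs.get("name") == self.select_name:
            self.in_target_select = True
        elif tag == "option" and self.in_target_select:
            # Start a new entry; </option> is optional in HTML, so a new
            # <option> tag is enough to begin the next label.
            self.in_option = True
            self.options.append("")

    def handle_endtag(self, tag):
        if tag == "select":
            # Leaving any <select> ends both the target select and any open option.
            self.in_target_select = False
            self.in_option = False
        elif tag == "option":
            self.in_option = False

    def handle_data(self, data):
        if self.in_option:
            self.options[-1] += data.strip()


def extract_options(html, select_name):
    """Return the option labels of the <select name=select_name> found in html."""
    parser = SelectOptionParser(select_name)
    parser.feed(html)
    parser.close()
    return parser.options


# Hypothetical page fragment standing in for a downloaded HTML (or AJAX) response.
SAMPLE_HTML = """
<form>
  <select name="make"><option>Audi</option><option>BMW</option></select>
  <select name="model">
    <option>A3</option>
    <option>A4</option>
  </select>
</form>
"""

if __name__ == "__main__":
    print(extract_options(SAMPLE_HTML, "model"))
```

For real pages, a dedicated HTML library would be more forgiving of messy markup; the point here is only that no browser is involved.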

Even so, I think it may be better to collect the data with a more serious crawling/scraping facility suited to working with JavaScript. This question on SO seems particularly aligned with that. My recommendation would be to get a tool that does what you need, and then interface it with R at the simplest level possible. There are WebKit bindings for several languages, although R doesn't seem to be one of them.

This question addresses your situation even more closely: it is also on Windows, and it doesn't use WebKit. The three suggestions in the accepted answer refer to accessing the tools, written in C/C++, from Python. R has interfaces to both, so you may find it easier to write some code that works with these tools and passes objects and instructions back and forth between R and Python or C/C++.
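One simple way to pass results from a Python scraper back to R, sketched under the assumption that a file-based handoff is acceptable: write the scraped rows as a JSON array of objects, which R can read, for example with the jsonlite package's `fromJSON()` (it turns such an array into a data frame by default). The function names `export_for_r`/`load_exported`, the file name `scraped_models.json`, and the row fields are hypothetical.

```python
import json
import os
import tempfile


def export_for_r(records, path):
    """Write a list of flat dicts as a JSON array of objects (one object per row)."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)


def load_exported(path):
    """Read the file back, e.g. to check what the R side will receive."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


if __name__ == "__main__":
    # Hypothetical scraped rows; field names are illustrative only.
    rows = [{"make": "Audi", "model": "A3"}, {"make": "Audi", "model": "A4"}]
    path = os.path.join(tempfile.gettempdir(), "scraped_models.json")
    export_for_r(rows, path)
    print(load_exported(path) == rows)
```

A file keeps the two processes fully decoupled; tighter couplings (calling Python from R directly) are possible but add setup, especially on Windows.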

Iterator
  • Thanks for the advice! I'll check it out in more detail to figure out the best way for me. – Rappster Oct 26 '11 at 16:48
  • Okay, scanned through your links: 1) I always thought the easiest way in the long run would be to just "simulate" a real user, hence my idea of remote-controlling a browser. 2) SpiderMonkey: must have missed it, thanks for the pointer! 3) What do you think of Ruby in that respect? Worth a try, or am I better off plunging into Python for this task? – Rappster Oct 26 '11 at 16:57
  • I have very limited familiarity with Ruby, so my preference would be Python, though that's a personal decision. As the question is somewhat solved with Python and the Python community has a lot of R sympathizers (and vice versa), that might be a guide, though it's up to you. – Iterator Oct 31 '11 at 18:42