
My ultimate goal is to build a web crawler capable of downloading all of the images on a webpage. My understanding from the reading I've done is that I need to embed a rendering/layout engine such as Gecko or WebKit.

Unfortunately, I'm running Windows, so PyWebkit is out, and short of learning C++ for Gecko or Java to use Rhino, I'm not sure where to turn.

Is there a reliable rendering engine with Python bindings that will work on Windows (64-bit, Windows 7)? Is there an easy way to execute JavaScript within a Python script on Windows?

dschafer
  • If this is just for personal use, you might consider installing Linux in VirtualBox and using pyWebkit from there. – SpliFF Feb 14 '11 at 23:43
  • Ideally it would run on windows so it could run on someone else's computer so the data can live on their system. That said, by the time I showed them how to install python and whatever library I wind up using, I might as well just do it on my laptop and carry a jump drive over. – dschafer Feb 15 '11 at 02:38

1 Answer


You don't need WebKit to do that. All you need is an engine to run JavaScript code, so take a look at Google V8 or Mozilla SpiderMonkey.

If you prefer Python for building your crawler, you may want to use PyV8, as it provides all the necessary bindings.
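For images that appear as plain `<img>` tags in the page's HTML, no rendering or JavaScript engine is needed at all. The following is a minimal sketch assuming Python 3 and only the standard library (`html.parser`, `urllib.parse`); the names `ImageCollector` and `find_images` are hypothetical. Images that a page inserts via JavaScript will not show up here, which is the case where an engine such as PyV8 would come in.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Hypothetical helper: collects absolute URLs of <img src="..."> tags in static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        # Self-closing <img ... /> tags also end up here via handle_startendtag.
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    # Resolve relative paths against the page's own URL.
                    self.image_urls.append(urljoin(self.base_url, value))


def find_images(html, base_url):
    """Return image URLs in document order (scripted images are NOT found)."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.image_urls


if __name__ == "__main__":
    page = '<p><img src="/a.png"><img src="http://cdn.example.com/b.jpg"/></p>'
    for url in find_images(page, "http://example.com/gallery/"):
        print(url)
```

Each collected URL could then be downloaded with `urllib.request.urlretrieve`; the page HTML itself would be fetched the same way before calling `find_images`.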

yurymik
  • Splitting tiny hairs: [PyV8 appears to only have a version for 32-bit Windows](http://code.google.com/p/pyv8/downloads/list). Also, for others reading, installing SpiderMonkey on Windows appears to be nontrivial, but I found [this article on how to do it](http://www.mailsend-online.com/blog/extending-spidermonkey-javascript-on-windows.html). – dschafer Feb 15 '11 at 02:51
  • 32-bit shouldn't stop you. Just install the matching (32-bit) version of Python and you're good to go. – yurymik Feb 16 '11 at 01:45
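A 32-bit-only binary extension generally needs a 32-bit Python build even on 64-bit Windows, so it can help to confirm which build is installed before trying it. A small sketch, assuming Python 3; the function name `python_bitness` is hypothetical. It relies on `struct.calcsize("P")`, the size of a C pointer in the running interpreter.

```python
import platform
import struct


def python_bitness():
    """Return 32 or 64: the pointer width of the running Python interpreter."""
    # A C pointer ("P") is 4 bytes on a 32-bit build and 8 bytes on a 64-bit build.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print(f"Python {platform.python_version()} is a {python_bitness()}-bit build")
```

This reports the interpreter's bitness, not the operating system's, which is what matters for loading a binary extension.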