
Does anyone know of a small, fast JavaScript emulator with DOM support, written in C or C++?

The problem: I need rudimentary JavaScript support in a crawler application, and I'm wondering whether there are any options other than:

a) Integrating WebKit (headless), which slows down crawling tremendously.
b) Integrating SpiderMonkey and writing the DOM layer myself (not looking forward to this option, and not sure it's even worth it, speed-wise).

Any other options?

Thanks!

please delete me
  • [Web crawler that can interpret javascript](http://stackoverflow.com/questions/2670082/web-crawler-that-can-interpret-javascript) and [Building a web crawler - using Webkit packages](http://stackoverflow.com/questions/162181/building-a-web-crawler-using-webkit-packages) are similar questions. But none of the answers on either are particularly detailed. – Matthew Flaschen Nov 21 '10 at 04:41

2 Answers


Throw in my vote for WebKit (or some other existing code). Why bother reinventing the wheel, especially when the wheel is really fancy, complicated, and has had years of development?

If you really wanted to, you could write some code that checks for JavaScript first, so you only pass off the jobs that need it. Then, write filters for common ad networks and analytics packages so they are ignored. If it were me, though, I'd rather be consistent in how I crawl.
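The pre-check described above might look something like the following C++ sketch. It is only a crude heuristic, assuming the crawler already has the raw HTML of each page as a `std::string`: it scans opening `<script` tags and treats the page as needing JavaScript unless every script tag references a host on a hypothetical ignore list (`kIgnoredScriptHosts`, whose ad/analytics entries are illustrative, not a curated blocklist). Inline scripts always count as needing JavaScript, and event-handler attributes such as `onclick` are not detected at all.

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <vector>

// Hypothetical ignore list of ad-network / analytics hosts (illustrative only).
const std::vector<std::string> kIgnoredScriptHosts = {
    "google-analytics.com",
    "googletagmanager.com",
    "doubleclick.net",
};

// ASCII lower-casing so "<SCRIPT" and "<script" are treated alike.
std::string to_lower(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(), [](unsigned char c) {
        return static_cast<char>(std::tolower(c));
    });
    return s;
}

// True if the opening tag text (from "<script" up to '>') mentions an ignored host.
bool is_ignored_script(const std::string& tag) {
    for (const auto& host : kIgnoredScriptHosts) {
        if (tag.find(host) != std::string::npos) return true;
    }
    return false;
}

// Heuristic: the page needs a JavaScript-capable engine if it has at least one
// <script> tag that is not on the ignore list. Inline scripts have no src,
// so they are never ignored and always count.
bool needs_javascript(const std::string& html) {
    const std::string lower = to_lower(html);
    std::size_t pos = 0;
    while ((pos = lower.find("<script", pos)) != std::string::npos) {
        std::size_t end = lower.find('>', pos);
        if (end == std::string::npos) end = lower.size();
        const std::string tag = lower.substr(pos, end - pos);
        if (!is_ignored_script(tag)) return true;
        pos = end;  // continue scanning after this tag
    }
    return false;
}
```

Pages for which `needs_javascript` returns true would be handed to the (slower) headless engine; the rest could be parsed directly. The trade-off is the one noted above: the crawler's behavior then depends on how good this heuristic is.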

Also, don't assume you only need rudimentary support: there are some really funky websites out there that do a ton of DOM manipulation. If you expect your crawling to be reliable, be prepared to support what browsers support. The easiest way to do that is to use the same code that the browsers are using.

Brad
  • Except that the engines themselves don't provide the DOM; they rely on the browser to do so. – Ignacio Vazquez-Abrams Nov 21 '10 at 04:26
  • @Ignacio, WebKit is not just the JavaScript engine (that's JavaScriptCore). It includes WebCore and JavaScriptCore. WebCore has the DOM functionality. – Matthew Flaschen Nov 21 '10 at 04:31
  • @Matthew: Sure, but he says "The easiest way to do that is use one of the engines browsers use". This is false, since the engine itself typically does not provide DOM support. – Ignacio Vazquez-Abrams Nov 21 '10 at 04:36
  • @Ignacio, I updated my post using less-specific language, so as not to confuse. – Brad Nov 21 '10 at 04:42
  • @Ignacio, the *JavaScript* engine doesn't provide DOM support. But the layout engine (which WebKit includes) does, and it is used by browsers. Gecko is another candidate layout engine, also used by browsers. – Matthew Flaschen Nov 21 '10 at 04:44

Correction: V8 does not support the DOM, only JavaScript, so it's not what you were looking for.

V8:

icyrock.com