
While using htmlunit to scrape a webpage, I occasionally notice warnings like these that flood the console output.

Jul 24, 2011 5:12:59 PM com.gargoylesoftware.htmlunit.javascript.StrictErrorReporter warning
WARNING: warning: message=[Calling eval() with anything other than a primitive string value 
will simply return the value. Is this what you intended?] sourceName=[http://ad.doubleclick.net/adj/N5762.morningstar.com/B5553006.25;sz=728x90;click0=http://ads.morningstar.com/RealMedia/ads/click_lx.ads/www.morningstar.com/quicktake/fund/L34/648978540/TopLeft/Morningstar/JPM_FRpt_728x90_Jul_3827448/Fund_Reports_728x90_content.html/656d5477595534723465554144664a2b?;ord=648978540?] line=[356] lineSource=[null] lineOffset=[0]

Is there a way that I can have htmlunit ignore javascript from

or even just

Likewise, is there a way to have htmlunit only interpret the javascript on a webpage containing a particular substring or matching a regex?

DannyTree
  • I don't believe you can do this, though it does sound potentially useful. A quick dig into the source didn't provide any hooks. An alternative might be to just tell Log4j to not log these warnings. – Rodney Gitzel Aug 04 '11 at 19:03
  • @Rodney, thanks for the tip. Though I didn't mention this in the original post, filtering javascript should also improve performance. htmlunit, which can be dog slow, would have fewer js files to download and less javascript to execute. – DannyTree Aug 30 '11 at 03:46
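
For the logging route suggested in the comment above, here is a minimal sketch. It assumes the warnings go through java.util.logging rather than Log4j, which is what the timestamp format of the warning in the question suggests, but I have not verified this against a running htmlunit setup. The class name `QuietHtmlUnitLogging` is made up for illustration; the logger name is just the package prefix taken from the warning.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical helper, not part of htmlunit.
final class QuietHtmlUnitLogging {
    // Keep a strong reference: java.util.logging holds loggers weakly,
    // so a level set on an otherwise unreferenced logger can be lost after GC.
    private static final Logger HTMLUNIT_LOGGER =
            Logger.getLogger("com.gargoylesoftware.htmlunit");

    // Turns off all log output for the htmlunit package and its sub-packages
    // (child loggers without their own level inherit this one).
    static Logger silence() {
        HTMLUNIT_LOGGER.setLevel(Level.OFF);
        return HTMLUNIT_LOGGER;
    }
}
```

Calling `QuietHtmlUnitLogging.silence()` once before creating the web client should be enough under that assumption. This only hides the messages; the scripts are still downloaded and executed.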

1 Answer


You might be able to remove the unwanted javascript by implementing your own ScriptPreProcessor. Your ScriptPreProcessor could detect the javascript you do not want to execute and then remove it from the page.

I have not tried it yet, but it might work.
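
A rough sketch of the filtering part, in plain Java so it can be tried without htmlunit. `ScriptSourceFilter` is a made-up helper, and the regex is only an example based on the ad URL in the question. The wiring shown in the trailing comment assumes that `ScriptPreProcessor.preProcess` takes the page, the source code, the source name, the line number and the element, and that `WebClient.setScriptPreProcessor` installs it; treat that part as untested.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of htmlunit: decides per script whether to keep its source.
final class ScriptSourceFilter {
    private final Pattern blocked;

    ScriptSourceFilter(String blockedRegex) {
        this.blocked = Pattern.compile(blockedRegex);
    }

    // Returns the source unchanged, or "" when sourceName matches the blocked pattern.
    String filter(String sourceCode, String sourceName) {
        if (sourceName != null && blocked.matcher(sourceName).find()) {
            return ""; // empty script: nothing left to execute
        }
        return sourceCode;
    }
}

// Assumed wiring (needs htmlunit on the classpath, not tested here):
//
//   ScriptSourceFilter f = new ScriptSourceFilter("doubleclick\\.net");
//   webClient.setScriptPreProcessor(
//       (page, source, name, line, element) -> f.filter(source, name));
```

Inverting the condition in `filter` would give the "only run scripts matching a regex" variant from the question. Since the pre-processor is handed the source text, the script has presumably already been downloaded at that point, so this would likely save execution time but not the download.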

MrSmith42