
I am working on an ASP.NET and C# project. We are trying to make our ASP.NET portal friendly to the Google search engine (https://developers.google.com/webmasters/ajax-crawling/). Pages on our site are generated dynamically and the DOM is modified with JavaScript, so we use NHtmlUnit to generate an HTML snapshot on the server side when the Google crawler sends a request. The snapshot is generated, but when a script on the page throws an error, NHtmlUnit returns a partially rendered page (the content that the page's JavaScript is supposed to modify is only partly rendered). The same pages work perfectly in browsers.
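For context, here is a minimal sketch of the request-mapping side of Google's AJAX crawling scheme. It assumes the `_escaped_fragment_` mapping described on the linked page: the crawler requests `page.aspx?_escaped_fragment_=key=value` instead of `page.aspx#!key=value`, so the server has to rebuild the `#!` URL before loading it in the headless browser. `EscapedFragment` and `ToPrettyUrl` are hypothetical names, not part of ASP.NET or NHtmlUnit.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper (not part of ASP.NET or NHtmlUnit): turns the crawler's
// "?_escaped_fragment_=" URL back into the "#!" URL a browser would load.
public static class EscapedFragment
{
    private const string Key = "_escaped_fragment_=";

    // Returns the "#!" URL for a crawler request, or null when the URL
    // carries no _escaped_fragment_ parameter (an ordinary request).
    public static string ToPrettyUrl(string uglyUrl)
    {
        int q = uglyUrl.IndexOf('?');
        if (q < 0) return null;

        string path = uglyUrl.Substring(0, q);
        string fragment = null;
        var others = new List<string>();

        // Split on '&' before unescaping: an '&' inside the fragment arrives as %26.
        foreach (string pair in uglyUrl.Substring(q + 1).Split('&'))
        {
            if (pair.StartsWith(Key, StringComparison.Ordinal))
                fragment = Uri.UnescapeDataString(pair.Substring(Key.Length));
            else if (pair.Length > 0)
                others.Add(pair); // keep any other query parameters as-is
        }

        if (fragment == null) return null;

        string query = others.Count > 0 ? "?" + string.Join("&", others) : "";

        // Assumption: an empty value comes from a page that opted in via
        // <meta name="fragment" content="!">, so it is loaded without a hash.
        return fragment.Length == 0 ? path + query : path + query + "#!" + fragment;
    }
}
```

In an ASP.NET handler, one could pass `Request.Url.AbsoluteUri` to `ToPrettyUrl` and, when it returns a non-null URL, hand that URL to `webClient.GetHtmlPage(...)` to produce the snapshot; a null result means the request came from a normal visitor and the page can be served as usual.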

I tried the following options:

    ThrowExceptionOnScriptError = false,
    ThrowExceptionOnFailingStatusCode = false

But no luck.

Is there a way to force NHtmlUnit to ignore page errors and continue execution?

Here is the code:

    // Create a webclient.
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17)
        {
            ThrowExceptionOnScriptError = false,
            ThrowExceptionOnFailingStatusCode = false
        };

    webClient.WaitForBackgroundJavaScript(5000);

    // Load the Page with the given URL.
    HtmlPage htmlPage = webClient.GetHtmlPage(url);

    // Return the page for the given URL as Text.
    return htmlPage.WebResponse.ContentAsString;
Soner Gönül
RAM

1 Answer

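    // Create a web client with JavaScript enabled and script/status errors ignored.
    WebClient webClient = new WebClient(BrowserVersion.FIREFOX_17)
        {
            JavaScriptEnabled = true,
            ThrowExceptionOnScriptError = false,
            ThrowExceptionOnFailingStatusCode = false
        };

    // Load the page with the given URL.
    HtmlPage htmlPage = webClient.GetHtmlPage(url);

    // Give background JavaScript (timers, AJAX callbacks) up to 5 seconds to finish.
    // This must come after the page is loaded; called before, there is nothing to wait for.
    webClient.WaitForBackgroundJavaScript(5000);

    // Return the page for the given URL as text.
    // Note: WebResponse.ContentAsString is the raw HTTP response body, not the DOM as modified by scripts.
    return htmlPage.WebResponse.ContentAsString;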

I noticed you didn't enable JavaScript; sorry if I'm wrong.

vfioox
  • I don't know about @RAM's case, but I have mine enabled. The thing is, I'm not sure it's an [N]HtmlUnit problem rather than a jQuery problem, since it runs inside a headless browser. It may depend on something HtmlUnit doesn't provide (like the window object, or something else the browser itself would provide). – Allov Apr 24 '13 at 14:15
  • Yes, I enabled JavaScript. JavaScript code in the page throws an exception that causes NHtmlUnit to stop execution, so the page is only partially rendered. The same JavaScript works in browsers and renders the complete page. – RAM Apr 26 '13 at 15:37