The aim is to retrieve data from a website after it has finished its Ajax calls. Currently the data is retrieved when the page first loads, but the required data sits inside a div that is only populated after an Ajax call.

To summarize, the scenario is as follows:

A web page is requested from C# code with some query parameters (currently using CsQuery). When the request is sent, the page opens and shows a "Loading" picture, and after a few seconds the required data appears. The CsQuery code, however, retrieves the initial page contents, with the "Loading" picture still in place.

The code is as follows:

UrlBuilder ub = new UrlBuilder("<url>")
    .AddQuery("departure", "KHI")
    .AddQuery("arrival", "DXB")
    .AddQuery("queryDate", "2013-03-28")
    .AddQuery("queryType", "D");

CQ dom = CQ.CreateFromUrl(ub.ToString());
CQ availableFlights = dom.Select("div#availFlightsDiv");

string RenderedDiv = availableFlights["#availFlightsDiv"].RenderSelection();
Benjamin Gruenbaum
Abdul Ali

2 Answers

When you "scrape" a site, you make a request to the web server and get back what it serves up. If the DOM of the target site is modified by JavaScript (Ajax or otherwise), you will never get that content unless you load the page into some kind of browser engine, on the machine doing the scraping, that is capable of executing the JavaScript calls.
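
As a rough illustration of the browser-engine approach, here is a minimal sketch using Selenium WebDriver with headless Chrome, so no browser window appears. It is not part of the original answer. It assumes the Selenium.WebDriver NuGet package (plus Selenium.Support on older versions) and a local Chrome install; `<url>` is the same placeholder as in the question, and `LooksLoaded` is a hypothetical readiness check that simply guesses the div is ready once it has content that no longer mentions "Loading".

```csharp
using System;
using OpenQA.Selenium;
using OpenQA.Selenium.Chrome;
using OpenQA.Selenium.Support.UI;

public static class AjaxScrapeSketch
{
    // Hypothetical readiness test: non-empty markup without the "Loading" placeholder.
    // The real page may need a different signal (e.g. a specific child element).
    public static bool LooksLoaded(string html)
    {
        return !string.IsNullOrWhiteSpace(html)
            && html.IndexOf("Loading", StringComparison.OrdinalIgnoreCase) < 0;
    }

    public static void Main()
    {
        // Placeholder URL; the query string mirrors the parameters from the question.
        string url = "<url>?departure=KHI&arrival=DXB&queryDate=2013-03-28&queryType=D";

        var options = new ChromeOptions();
        options.AddArgument("--headless"); // run Chrome without a visible window

        using (IWebDriver driver = new ChromeDriver(options))
        {
            driver.Navigate().GoToUrl(url);

            // Poll for up to 30 seconds until the div exists and looks populated.
            var wait = new WebDriverWait(driver, TimeSpan.FromSeconds(30));
            IWebElement div = wait.Until<IWebElement>(d =>
            {
                var found = d.FindElements(By.Id("availFlightsDiv"));
                if (found.Count == 0) return null;
                return LooksLoaded(found[0].GetAttribute("innerHTML")) ? found[0] : null;
            });

            // The markup of the div after the page's JavaScript has run.
            string renderedDiv = div.GetAttribute("outerHTML");
            Console.WriteLine(renderedDiv);
        }
    }
}
```

If the rest of the pipeline is already written against CsQuery, the rendered markup (or `driver.PageSource`) could presumably be handed to `CQ.Create(...)` and queried as before.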

Ben Robinson
  • I'd add that if you're trying to scrape a *very specific* ajax-driven web site then it's entirely possible (often even easy) just to look at the source code and target their internal API directly. How much work that is depends on how obfuscated and/or well written the code is. Other than that, yup, selenium or the like. – Jamie Treworgy Mar 14 '13 at 14:08
  • Thank you for the comment. Is there any way of achieving this silently (i.e. no browser window) and with minimal resource consumption? The specific Ajax call also seems to send a SessionId to generate the results, so a direct call may not be possible. – Abdul Ali Mar 18 '13 at 12:51
  • Sending the session id with the Ajax call is a common way of preventing you from doing what you are trying to do. – Ben Robinson Mar 18 '13 at 13:20
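
To make the "target the internal API directly" idea from the comments concrete, here is a hedged sketch using only `HttpClient`. It is not from the thread and rests on several assumptions: that the session id travels in a cookie set when the landing page is first loaded, and that the results come from a separate endpoint, shown here as the made-up `ajaxUrl` parameter (the real one would have to be found in the browser's network tab). If the site embeds the session id in the page markup or a request header instead, this will not work as written.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;
using System.Threading.Tasks;

public static class DirectAjaxSketch
{
    // Builds "base?k1=v1&k2=v2" with URL-escaped keys and values.
    public static string BuildUrl(string baseUrl, params (string Key, string Value)[] query)
    {
        string qs = string.Join("&", query.Select(p =>
            Uri.EscapeDataString(p.Key) + "=" + Uri.EscapeDataString(p.Value)));
        return qs.Length == 0 ? baseUrl : baseUrl + "?" + qs;
    }

    // pageUrl: the landing page that issues the session cookie (assumption).
    // ajaxUrl: hypothetical endpoint the page's JavaScript calls for the results.
    public static async Task<string> FetchResultsAsync(string pageUrl, string ajaxUrl)
    {
        var cookies = new CookieContainer();
        using (var handler = new HttpClientHandler { CookieContainer = cookies })
        using (var client = new HttpClient(handler))
        {
            // Step 1: load the page so the server can set its session cookie.
            using (HttpResponseMessage first = await client.GetAsync(pageUrl))
            {
                first.EnsureSuccessStatusCode();
            }

            // Step 2: call the endpoint; the handler resends the stored cookies.
            return await client.GetStringAsync(ajaxUrl);
        }
    }

    public static void Main()
    {
        // example.invalid is a stand-in host; no request is made here.
        string url = BuildUrl("https://example.invalid/search",
            ("departure", "KHI"), ("arrival", "DXB"),
            ("queryDate", "2013-03-28"), ("queryType", "D"));
        Console.WriteLine(url);
    }
}
```

The returned string could then be parsed with `CQ.Create(...)` if the endpoint serves HTML. This avoids running a browser at all, which addresses the resource concern, but it breaks whenever the site changes its internal calls.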

This question is almost a year old, so you may already have your answer, but I would like to mention this awesome project here: SimpleBrowser.

https://github.com/axefrog/SimpleBrowser

It keeps your DOM updated.

lame_coder