
When I try to get all ad headers from autoscout24 using this XPath rule in Google Spreadsheet:

//div[@id="listOutput"]//div[@class="headcar"]/a/text()

The result is #N/A: no data was returned by the XPath query.

However, when I try to get another element from the same page, for example "Kryteria wyszukiwania:" ("Search criteria:"), using the XPath rule:

//li/span

The output is correct.

What could be the problem?

Rubén
mpietrewicz

1 Answer


In the raw HTML source as viewed in Chrome (that is, "view-source:http://www.autoscout24.pl/ListGN.aspx?...", not through Firebug or Chrome's Inspect tool), div#listOutput contains only this:

<div id="listOutput">
    <div id="listoutput_part_one">
    </div>
    <div id="divSuperAdPlaceHolder">
        </div>
    <div id="listoutput_part_two">
    </div>
</div>

The source code does, however, contain li/span elements, for example:

        <li class="breadcrumb-item breadcrumb-first">
            <span>Kryteria wyszukiwania:</Span>
        </li>

The rest of the elements must be built by JavaScript that the browser runs, and I doubt Google Spreadsheet interprets and executes the JavaScript on the page.
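To illustrate the difference, here is a small sketch in Python (standard library only) that runs equivalents of the two queries against the static markup quoted above. It is an assumption-laden stand-in, not what Google Spreadsheet does internally: it uses ElementTree's limited XPath subset, adds wrapper elements and lowercases </Span> so the snippet parses as XML, and RENDERED_HTML is a hypothetical example of the JavaScript-built list, inferred only from the question's XPath.

```python
import xml.etree.ElementTree as ET

# Raw markup as served to a client that does not run JavaScript, abridged
# from the view-source excerpts above. The <html>/<body>/<ul> wrappers and
# the lowercase </span> are simplifications so the snippet parses as XML.
STATIC_HTML = """<html><body>
  <ul>
    <li class="breadcrumb-item breadcrumb-first">
      <span>Kryteria wyszukiwania:</span>
    </li>
  </ul>
  <div id="listOutput">
    <div id="listoutput_part_one"></div>
    <div id="divSuperAdPlaceHolder"></div>
    <div id="listoutput_part_two"></div>
  </div>
</body></html>"""

# HYPOTHETICAL markup after the page's JavaScript has run. Its shape is
# inferred only from the question's XPath; the real rendered DOM may differ.
RENDERED_HTML = """<html><body>
  <div id="listOutput">
    <div id="listoutput_part_one">
      <div class="headcar"><a href="#">Example ad 1</a></div>
      <div class="headcar"><a href="#">Example ad 2</a></div>
    </div>
  </div>
</body></html>"""


def ad_headers(markup):
    """Text of each ad header link, mirroring the question's first XPath.

    ElementTree has no text() step, so the <a> elements are selected
    and their .text attribute is read instead.
    """
    root = ET.fromstring(markup)
    path = ".//div[@id='listOutput']//div[@class='headcar']/a"
    return [a.text for a in root.findall(path)]


def breadcrumb_labels(markup):
    """Equivalent of the question's //li/span query."""
    root = ET.fromstring(markup)
    return [span.text for span in root.findall(".//li/span")]


print(ad_headers(STATIC_HTML))         # []
print(breadcrumb_labels(STATIC_HTML))  # ['Kryteria wyszukiwania:']
print(ad_headers(RENDERED_HTML))       # ['Example ad 1', 'Example ad 2']
```

The query for the ad headers is not wrong as such: it finds the links in the (hypothetical) rendered markup but nothing in the static source, while the li/span query succeeds on the static source. That matches the explanation above that the ad list only exists after the page's JavaScript has run.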

paul trmbrth
  • Is there any way to get data that was generated by JavaScript? – mpietrewicz Jul 22 '13 at 11:51
  • Depends on your environment. One way is to use PhantomJS and see http://stefaanlippens.net/spider-javascript-manipulated-html-with-phantomjs or http://stackoverflow.com/questions/5490438/phantomjs-and-getting-modified-dom for example. See also https://groups.google.com/d/topic/phantomjs/r6cfwW6YkB4/discussion – paul trmbrth Jul 22 '13 at 12:15