I'm trying to get all titles from http://www.112.ru/services/wanted/people/index.shtml?roztype=1 using the Yahoo Pipes Xfetch module.

My query //span[@class='uchbold'] selects all titles successfully in FirePath, but in Yahoo Pipes and Hpple it returns no results.

Shmidt
  • There may be people who are prepared to click on a link in www.112.ru. Others (including me) are less trustful, and will therefore not answer your question. It's much better to copy a sufficient extract of the XML into the question. – Michael Kay Jun 24 '13 at 13:15
  • @MichaelKay As Jens found out, the problem was caused by parts of the data being loaded asynchronously. If I had posted only the HTML source here, it would have been impossible to answer the question. – Shmidt Jun 24 '13 at 13:38

1 Answer

These class attributes are inserted by JavaScript, which neither Yahoo Pipes nor Hpple executes.

The contents are also loaded via Ajax, so you will have to trace the Ajax calls and develop against that interface.

Using Firebug I could trace it loading

http://www.112.ru/publish/00/01/0508.01/2012/08//contents.xml

and many other "contents.xml" files, which returned 404 errors. The file that did load lists its contents as elements like

<view file="0901156380089d71_0508.01_00_01.full.shtml" format="full" indexed="true"/>

which in turn seem to point to HTML snippets containing the actual data.
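As a rough illustration of working against that interface, here is a minimal Python sketch that pulls the file names out of a contents.xml document and turns them into candidate snippet URLs. Several things here are assumptions rather than facts about the site: the <contents> wrapper element in the sample is invented (only the <view> element comes from the trace above), the function names are hypothetical, and resolving the file names relative to the location of contents.xml is a guess that would need to be checked against the site's JavaScript.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# URL traced with Firebug, as quoted in the answer.
CONTENTS_URL = "http://www.112.ru/publish/00/01/0508.01/2012/08//contents.xml"

# Hypothetical sample: only the <view> element is taken from the answer;
# the <contents> wrapper is an assumption about the real file.
SAMPLE_XML = """<contents>
  <view file="0901156380089d71_0508.01_00_01.full.shtml" format="full" indexed="true"/>
</contents>"""


def extract_view_files(xml_text, fmt="full"):
    """Return the file attribute of every <view> element with the given format."""
    root = ET.fromstring(xml_text)
    return [
        view.get("file")
        for view in root.iter("view")
        if view.get("format") == fmt and view.get("file")
    ]


def resolve_snippet_urls(contents_url, files):
    """Resolve file names relative to contents.xml (an assumed layout)."""
    return [urljoin(contents_url, name) for name in files]


if __name__ == "__main__":
    files = extract_view_files(SAMPLE_XML)
    for url in resolve_snippet_urls(CONTENTS_URL, files):
        print(url)
```

Each resulting URL would then have to be fetched and parsed separately. Whether the snippets really live next to contents.xml can only be confirmed by reading the page's JavaScript.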

Jens Erat
  • Can you please recommend any tutorial/documentation for further reading? – Shmidt Jun 24 '13 at 12:10
  • You will have to read and analyze the JavaScript that loads the files to work out how it fetches the data and how to do the same without JavaScript. Once you know how to determine the URLs to load, you need to fetch and combine them all. I'm sorry I can't help any further, as I don't know Yahoo Pipes in detail. Alternatively, you could build some kind of proxy using Node.js or a similar tool to evaluate the JavaScript before parsing with Yahoo Pipes, but I guess that's even more difficult (and requires a server). – Jens Erat Jun 24 '13 at 12:17