
Hey, I'd like to scrape some data from my blog using YQL:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']"

How can I use several XPath expressions in one query? For example, can I do something like:

SELECT * FROM html WHERE url="http://site.com/blog" AND xpath="//div[@class='post']" AND xpath="//div[@class='title']"

to get both the post and the title? I could fetch all of the HTML, but I'd rather take only what I need, because speed is an issue here.

Once I have the HTML, I want to extract the text from the markup. Is it OK to use PHP regular expressions for this?
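A hedged sketch of the usual alternative to regular expressions: hand the markup to a real HTML parser and collect only the text nodes. It is written in Python with the standard library's html.parser rather than in PHP (in PHP, strip_tags() or DOMDocument play a similar role). The class and function names are illustrative, and the parser takes care of nested tags and entities such as &amp;, which are easy to get wrong with a regex.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects the text content of an HTML fragment, skipping <script>/<style>."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as &
        self.parts = []
        self._skip = 0  # nesting depth inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        # Keep non-empty text chunks that are not inside script/style.
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def html_to_text(fragment):
    """Return the visible text of an HTML fragment, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(fragment)
    parser.close()
    return " ".join(parser.parts)


print(html_to_text("<div class='post'><h2>Hello</h2><p>First &amp; <b>only</b> post</p></div>"))
# Hello First & only post
```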

I also understand that you can use CSS syntax. If you have experience using it with YQL and could show me how to write a query like the one above in CSS rather than XPath, I'd be grateful!

Thanks.

Umar Hansa

3 Answers


Regarding CSS:

See the YQL website itself for this: search Google for "YQL CSS". (I can only post one link here, and the second one is more useful.)

The example on that page no longer works, but you can try this one instead, which scrapes the questions from the Stack Overflow front page:

YQL example
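One detail worth knowing when moving between the two syntaxes: the CSS selector div.post is not exactly the same as the XPath //div[@class='post']. The XPath compares the whole class attribute, so it misses <div class="post featured">, while the CSS class selector matches any element whose class list contains the token post. The hypothetical helper below (an illustration, not a YQL feature) shows the standard token-matching XPath translation of a simple tag.class selector; it handles only one tag and one class.

```python
def css_class_to_xpath(selector):
    """Translate a simple 'tag.class' CSS selector into an equivalent XPath.

    Hypothetical helper: supports an optional tag name plus at most one
    class name. The class test pads the normalized attribute with spaces
    so that ' post ' matches the whole token, not a substring like 'poster'.
    """
    tag, _, cls = selector.partition(".")
    tag = tag or "*"  # '.post' means any element with that class
    if not cls:
        return f"//{tag}"
    return f"//{tag}[contains(concat(' ', normalize-space(@class), ' '), ' {cls} ')]"


print(css_class_to_xpath("div.post"))
# //div[contains(concat(' ', normalize-space(@class), ' '), ' post ')]
```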

Multiple selects with one XPath:

You CAN do this directly in XPath by combining the expressions with the union operator (|), e.g.:

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title']|//head/meta[@name='description']|//head/meta[@name='keywords']"
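In XPath 1.0 the union operator returns a single node-set in document order, not in the order the branches were written. Python's xml.etree.ElementTree supports only a subset of XPath with no |, so the sketch below emulates the query above over a small made-up head section, then splits the combined result back into one value per meta name. Keying on the name attribute is one way to separate the values afterwards without a separator string, assuming each matched node comes back as its own item. The markup and helper names are illustrative only.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup standing in for the fetched page.
PAGE = """
<html><head>
  <meta name="keywords" content="Example keywords"/>
  <meta name="title" content="Example title"/>
  <meta name="robots" content="index"/>
  <meta name="description" content="Example description"/>
</head><body/></html>
"""

WANTED = ("title", "description", "keywords")


def select_union(root, names):
    """Emulate //head/meta[@name='a']|//head/meta[@name='b']|...

    Walks the meta elements once and keeps the matching ones, so the
    result is in document order, like an XPath union.
    """
    return [m for m in root.iterfind("./head/meta") if m.get("name") in names]


def group_by_name(nodes):
    """Split the combined result back into one content value per name."""
    return {m.get("name"): m.get("content") for m in nodes}


root = ET.fromstring(PAGE.strip())
hits = select_union(root, WANTED)
print([m.get("name") for m in hits])  # ['keywords', 'title', 'description']
print(group_by_name(hits)["title"])   # Example title
```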
spier
  • Thanks, wasn't sure about the syntax but that's cleared it up. – Umar Hansa Oct 28 '10 at 03:23
  • Upvoted. I figured this out myself, but I wanted to know whether I can put a space or some other separator between the results of the two XPaths, so that I can later parse the result into two separate values. – Neil Jul 12 '13 at 10:58
  • Any idea how to fetch the image and meta description from amazon.in/Seiko-Premier-Analog-Blue-Watch/dp/…? – Jun 29 '17 at 12:00

You can also write multiple XPath selects like this, using or inside a single predicate:

SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title' or @name='description']"
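The or form only works when every branch shares the same location path (here //head/meta); the | union is the general form. When the list of names varies, the predicate can be generated. The sketch below uses two hypothetical helpers that just assemble the query text shown above; they do no quote escaping, so names must not contain a single quote.

```python
def meta_xpath(names):
    """Build //head/meta[@name='a' or @name='b' ...] for a list of names."""
    predicate = " or ".join(f"@name='{n}'" for n in names)
    return f"//head/meta[{predicate}]"


def yql_html_query(url, xpath):
    """Assemble a YQL html-table query string in the style used in this thread."""
    return f'SELECT * FROM html WHERE url="{url}" and xpath="{xpath}"'


print(yql_html_query("www.asscompact.de", meta_xpath(["title", "description"])))
# SELECT * FROM html WHERE url="www.asscompact.de" and xpath="//head/meta[@name='title' or @name='description']"
```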
NFpeter

It is not possible. You need to execute the query twice: once for the first XPath and once for the second. Alternatively, you can write your own Open Data Table definition that supports this kind of query.

Skarab