
I'm using Scrapy and lxml to process HTML trees.

I noticed that there is a difference between these two XPath expressions, although I was under the impression that they were interchangeable. Could someone please explain the difference to me?

response.xpath('/html/body/div/table/tr/td/table/tr/td/table/tr/td/table/tr/td/table/tr/td/a/img/..//text()').extract()

response.xpath('/html/body/div/table/tr/td/table/tr/td/table/tr/td/table/tr/td/table/tr/td/a//text()').extract()
dirkk
NST

1 Answer


The difference between a/img/..//text() and a//text() is that the first returns text nodes ONLY from a elements that have an img element as a direct child (it steps down to the img and then back up to its parent a), whereas the second returns text nodes from every a element, whether or not it has img children.
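A minimal sketch of the effect, using Python's standard-library xml.etree.ElementTree on a small hypothetical fragment (not the asker's page). ElementTree implements only a subset of XPath and has no text() step, so the hypothetical helper link_texts uses itertext() as a stand-in for //text():

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: one link wraps an image, the other is text-only.
DOC = """<td>
  <a href="/with-img"><img src="x.png"/>Caption</a>
  <a href="/text-only">Plain link</a>
</td>"""


def link_texts(root, path):
    """Collect all descendant text (roughly what //text() returns) of the elements matched by path."""
    texts = []
    for elem in root.findall(path):
        # itertext() yields elem.text plus the text and tails of all descendants
        texts.extend(t.strip() for t in elem.itertext() if t.strip())
    return texts


root = ET.fromstring(DOC)
with_img = link_texts(root, "./a/img/..")  # parent of an img child: only links containing an img
all_links = link_texts(root, "./a")        # every link, with or without an img
print(with_img)   # ['Caption']
print(all_links)  # ['Caption', 'Plain link']
```

The text-only link drops out of the first result because it has no img child to step down to, so there is no parent to step back up to.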

Put another way, a/img/..//text() could equally be written as a[img]//text(); compare this with a//text().
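The equivalence of a/img/.. and a[img] can be checked on a toy tree. This sketch again assumes hypothetical markup and uses ElementTree, which supports both the `..` step and the `[tag]` child predicate. The first link holds two images to show that each matching a is returned once, and the third nests its img inside a span to show that both forms test for a direct child only:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup for comparing the two path forms.
DOC = """<td>
  <a id="1"><img/><img/>Two images</a>
  <a id="2">No image</a>
  <a id="3"><span><img/></span>Nested image</a>
</td>"""

root = ET.fromstring(DOC)

via_parent = root.findall("./a/img/..")   # step down to an img child, then back up
via_predicate = root.findall("./a[img]")  # keep only links that have an img child
every_link = root.findall("./a")          # no filtering at all

print([a.get("id") for a in via_parent])     # ['1']
print([a.get("id") for a in via_predicate])  # ['1']
print([a.get("id") for a in every_link])     # ['1', '2', '3']
```

Both filtered forms select the same single element, while the unfiltered path keeps all three links.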

Mathias Müller
user52889