XPath expression returning empty list in scrapy

Question

I was scraping http://stats.espncricinfo.com/ci/engine/records/index.html?id=2;type=team

What I need is the link (the href) of the element matched by this XPath expression:

/html/body/div[1]/div[3]/div[4]/table/tbody/tr/td[1]/div[2]/table[1]/tbody/tr/td/ul[2]/li/a[2]

On the page, this is the link labelled "One-day Internationals" in the list of match results by year. I obtained the expression above with the Firefox extension Firebug.

However, it returns an empty list. I have tried alternative XPath expressions such as

//div[@id="ciHomeContentlhs"]/table/tbody/tr/td[1]/div/table[2]/tbody/tr/td/ul/li/a[2]/@href

with the same result.

The XPath expression

//div[@id="ciHomeContentlhs"]/table

gives me the table. However,

//div[@id="ciHomeContentlhs"]/table/tbody

returns an empty list. I tested the XPath expressions on http://videlibri.sourceforge.net/cgi-bin/xidelcgi, and there they return the required href or node. I just can't get them to work in Python.

score 3 · Accepted Answer · answered May 26 '17 at 07:32

The <tbody> element is not part of the original HTML source. It is generated by the browser's parser, so you shouldn't use it in your XPath expression.

You can match the exact element by its link text instead:

//a[text()="One-Day Internationals"]
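To illustrate why the tbody step matches nothing, here is a minimal sketch. It uses the standard library's xml.etree.ElementTree on a hand-written snippet that stands in for the raw page source; the markup and the href values are assumptions made up for the demonstration, not copied from the real page. ElementTree supports only a subset of XPath, so the text match is written as [.='...'] rather than text()="...".

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the raw HTML the server sends.
# Note that the table contains <tr> directly, with no <tbody>.
RAW_HTML = """
<html><body>
  <div id="ciHomeContentlhs">
    <table>
      <tr><td>
        <ul>
          <li><a href="/test">Test matches</a> <a href="/odi">One-Day Internationals</a></li>
        </ul>
      </td></tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(RAW_HTML)

# A path that goes through tbody finds nothing, because the source has no tbody.
with_tbody = root.findall(".//div[@id='ciHomeContentlhs']/table/tbody")

# The same path without tbody reaches the links.
without_tbody = root.findall(".//div[@id='ciHomeContentlhs']/table/tr/td/ul/li/a")

# Matching on the link text avoids the long positional path entirely.
by_text = root.findall(".//a[.='One-Day Internationals']")
hrefs = [a.get("href") for a in by_text]

print(with_tbody)          # empty list
print(len(without_tbody))  # number of links found
print(hrefs)
```

In a Scrapy callback the equivalent lookup would be response.xpath('//a[text()="One-Day Internationals"]/@href').extract(), assuming the link text on the page is spelled exactly like that.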

Andersson

Ah! Thanks. I figured out a workaround, but had no idea why I was unable to. – Vishnu May 26 '17 at 12:06

score 1 · Answer 2 · answered May 26 '17 at 07:40

Just remove every <tbody> step from your XPath expression, as Andersson says. The following expression gives me a list (as you want) containing only this element:

response.xpath('/html/body/div[1]/div[3]/div[4]/table/tr/td[1]/div[2]/table[1]/tr/td/ul[2]/li/a[2]/text()').extract()
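If you copy many expressions from browser tools, removing the tbody steps by hand gets tedious. The helper below is a small sketch of one way to automate it; strip_tbody is a hypothetical name, not part of Scrapy. It assumes the raw HTML really contains no <tbody> tags; if the page author wrote them explicitly, stripping them would break the expression.

```python
import re


def strip_tbody(xpath: str) -> str:
    """Remove every tbody location step (with an optional [n] index) from an XPath."""
    return re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)


# The expression from the question, as copied from Firebug.
firebug_xpath = (
    "/html/body/div[1]/div[3]/div[4]/table/tbody/tr/td[1]"
    "/div[2]/table[1]/tbody/tr/td/ul[2]/li/a[2]"
)

cleaned = strip_tbody(firebug_xpath)
print(cleaned)
```

The cleaned string can then be passed to Scrapy, for example response.xpath(cleaned + '/@href').extract() to get the link instead of the text.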

Alberto

This works too. Thanks. – Vishnu May 26 '17 at 12:06
