
I was scraping http://stats.espncricinfo.com/ci/engine/records/index.html?id=2;type=team

What I need is the link (href) selected by the XPath expression

/html/body/div[1]/div[3]/div[4]/table/tbody/tr/td[1]/div[2]/table[1]/tbody/tr/td/ul[2]/li/a[2]

On the page, this is the link labelled "One-day Internationals" under the list of match results by year. I obtained the expression above with the Firefox extension Firebug.

However, it returns an empty list. I have tried alternative XPath expressions such as

//div[@id="ciHomeContentlhs"]/table/tbody/tr/td[1]/div/table[2]/tbody/tr/td/ul/li/a[2]/@href

with the same result.

The XPath expression

//div[@id="ciHomeContentlhs"]/table

gives me the table. However,

//div[@id="ciHomeContentlhs"]/table/tbody

returns an empty list. I've tested the XPath expressions on http://videlibri.sourceforge.net/cgi-bin/xidelcgi, and it shows the required href or node as the output, but I can't get them to work in Python.

Andersson
Vishnu

2 Answers


The <tbody> element is not part of the original HTML source; the browser's parser generates it, so you shouldn't use it in your XPath expression.
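
To illustrate the point about <tbody>, here is a minimal sketch using only the standard library's xml.etree.ElementTree, which supports a limited XPath subset. The markup in RAW_SOURCE is a made-up, simplified stand-in for the raw page source (an assumption, not the real ESPNcricinfo HTML), and the href values are hypothetical. In Scrapy the same paths would go through response.xpath instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the HTML as the server sends it:
# the table rows sit directly under <table>, with no <tbody> in between.
RAW_SOURCE = """
<html><body>
  <div id="ciHomeContentlhs">
    <table>
      <tr><td>
        <ul><li>
          <a href="/tests.html">Tests</a>
          <a href="/odi.html">One-Day Internationals</a>
        </li></ul>
      </td></tr>
    </table>
  </div>
</body></html>
""".strip()

root = ET.fromstring(RAW_SOURCE)

# Path as copied from a browser inspector, which shows a generated <tbody>.
with_tbody = root.findall(
    ".//div[@id='ciHomeContentlhs']/table/tbody/tr/td/ul/li/a[2]"
)

# Same path with the <tbody> step removed, matching the raw source.
without_tbody = root.findall(
    ".//div[@id='ciHomeContentlhs']/table/tr/td/ul/li/a[2]"
)

hrefs = [a.get("href") for a in without_tbody]

print(with_tbody)  # nothing matches, because the source has no <tbody>
print(hrefs)       # href of the second link in the list item
```

The first query comes back empty for the same reason the expressions in the question do: the parser in Python sees the source as sent, not the tree the browser builds from it.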

You can use the link text to match the exact element:

//a[text()="One-Day Internationals"]
Andersson

Just remove every <tbody> from your XPath expression, as Andersson says. The following expression gives me a list (as you want) containing only this element:

response.xpath('/html/body/div[1]/div[3]/div[4]/table/tr/td[1]/div[2]/table[1]/tr/td/ul[2]/li/a[2]/text()').extract()
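
If you copy many expressions from Firebug, removing the <tbody> steps by hand gets tedious. The helper below is a rough sketch of one way to automate it. The name strip_tbody is hypothetical (it is not part of Scrapy), and it assumes the raw page source really contains no <tbody>, which is worth checking with "view source" first.

```python
import re


def strip_tbody(xpath: str) -> str:
    """Remove every /tbody step (optionally indexed, e.g. /tbody[1])."""
    # The lookahead ensures only a complete "tbody" step is removed,
    # i.e. one followed by another "/" or by the end of the expression.
    return re.sub(r"/tbody(?:\[\d+\])?(?=/|$)", "", xpath)


# The expression from the question, as produced by Firebug.
firebug_xpath = (
    "/html/body/div[1]/div[3]/div[4]/table/tbody/tr/td[1]"
    "/div[2]/table[1]/tbody/tr/td/ul[2]/li/a[2]"
)

cleaned = strip_tbody(firebug_xpath)
print(cleaned)
```

The cleaned string can then be passed to response.xpath, with /text() or /@href appended as needed.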
Alberto