scrapy, invalid xpath, starting position

Question

I'm trying to use Scrapy on some ridiculous HTML. According to the XPath Checker Firefox plugin, this is the XPath of the first row in the table:

id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody/x:tr/x:td/x:table/x:tbody/x:tr[1]

I get an error if I copy that xpath in:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    data = hxs.select("id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody/x:tr/x:td/x:table/x:tbody/x:tr[1]")

raise ValueError("Invalid XPath: %s" % xpath)

Why does Scrapy not recognize this XPath?

Also, is there a way for Scrapy to grab all data from the third row onwards? The first two rows are just the title and the legend.

If you could provide a link to the page you're trying to scrape, we might be able to help out. :) — Talvalin, Mar 14 '13 at 23:23
Did you register the [namespace](http://stackoverflow.com/questions/4817112/xpath-query-for-xml-node-with-colon-in-node-name)? — Steven Almeroth, Mar 17 '13 at 22:51
Only one advice - use Firebug or Chrome Developer not Firefox Xpath Checker — Sunita Venkatachalam, Oct 22 '13 at 06:12

score 1 · Answer 1 · answered Mar 14 '13 at 23:53

Firefox inserts a "tbody" element into tables when it builds the page's DOM, even if the raw HTML does not contain one. Fetch the page with your program and check whether the "tbody" tag is actually present in the source. I ran into the same problem with an XPath copied from Firefox.
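As a rough illustration of that cleanup, the sketch below strips the browser-specific parts from the copied expression using only the standard library. `clean_browser_xpath` is a hypothetical helper, not part of Scrapy. It assumes the raw HTML really has no "tbody" elements, and that the "x:" prefix added by XPath Checker is also part of the problem, since that prefix is not registered with the selector.

```python
import re

def clean_browser_xpath(xpath):
    """Hypothetical helper: drop the 'x:' prefix and browser-added tbody steps."""
    # Remove the 'x:' namespace prefix that XPath Checker puts before element names.
    xpath = re.sub(r"\bx:", "", xpath)
    # Remove every 'tbody' location step, with or without a numeric predicate.
    # Assumption: the raw HTML has no tbody; check the page source first.
    xpath = re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)
    return xpath

copied = ("id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody"
          "/x:tr/x:td/x:table/x:tbody/x:tr[1]")
cleaned = clean_browser_xpath(copied)
print(cleaned)
```

The cleaned string is what you would then pass to `hxs.select(...)` in place of the copied one. If the page source does contain "tbody" in some tables, remove only the steps that are missing from the source rather than all of them.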

Alexander Zh


I had the same issue with Chrome adding "tbody" to the path, which Scrapy does not recognize. The solution is to just remove the "tbody". http://doc.scrapy.org/en/latest/topics/firefox.html – Miguel Febres Nov 26 '14 at 10:54
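On the second part of the question (everything from the third row onwards), one option is to select all the rows and skip the first two. The sketch below shows the idea with the standard library's ElementTree on a made-up table, since the real page was not posted. The sample markup and variable names are assumptions for illustration only.

```python
import xml.etree.ElementTree as ET

# Made-up stand-in for the real table: title row, legend row, then data rows.
SAMPLE = """
<table>
  <tr><td>Title</td></tr>
  <tr><td>Legend</td></tr>
  <tr><td>row 3</td></tr>
  <tr><td>row 4</td></tr>
</table>
"""

table = ET.fromstring(SAMPLE)
# Select every row of the table, then slice off the first two.
data_rows = table.findall("tr")[2:]
# Collect the text of the first cell of each remaining row.
cells = [row.findtext("td") for row in data_rows]
print(cells)
```

In a full XPath 1.0 engine, such as the one behind Scrapy's selectors, the same filter can be written inside the expression by ending the path with `tr[position() > 2]` instead of `tr[1]`.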
