
I'm trying to use Scrapy on a page with some ridiculous HTML. According to the XPath Checker Firefox plugin, this is the XPath of the first row in the table:

id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody/x:tr/x:td/x:table/x:tbody/x:tr[1]

I get an error if I paste that XPath into my spider:

def parse(self, response):
    hxs = HtmlXPathSelector(response)
    # XPath pasted verbatim from XPath Checker
    data = hxs.select("id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody/x:tr/x:td/x:table/x:tbody/x:tr[1]")

raise ValueError("Invalid XPath: %s" % xpath)

Why doesn't Scrapy recognize this XPath?

Also, is there a way for Scrapy to grab all the data from the third row onward? The first two rows are just the title and the legend.

Bak

1 Answer


Firefox adds a "tbody" tag to the tables it displays, but the actual HTML may not contain one. Fetch the page with your program and check whether the "tbody" tags are really there. I ran into the same problem with Firefox.
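As a rough sketch of that cleanup, the helper below strips every "tbody" step from an XPath copied out of the browser. It also drops the "x:" prefix that XPath Checker puts in front of element names, on the assumption that this prefix is not bound to any namespace in the Scrapy selector and so is probably part of the problem as well. The name clean_browser_xpath is hypothetical, and removing "tbody" is only correct if the raw HTML really lacks it, so check the downloaded page first.

```python
import re

def clean_browser_xpath(xpath):
    """Remove browser-added artifacts from a copied XPath (hypothetical helper)."""
    # Drop the "x:" prefix that XPath Checker adds before element names.
    xpath = re.sub(r"\bx:", "", xpath)
    # Drop every "/tbody" step (with an optional numeric predicate).
    # Assumption: the raw HTML has no <tbody> elements at all.
    xpath = re.sub(r"/tbody(\[\d+\])?(?=/|$)", "", xpath)
    return xpath

copied = ("id('page')/x:table/x:tbody/x:tr[1]/x:td[2]/x:table/x:tbody"
          "/x:tr/x:td/x:table/x:tbody/x:tr[1]")
print(clean_browser_xpath(copied))
```

The cleaned string can then be passed to the selector in place of the original one.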

Alexander Zh
  • I had the same issue with Chrome adding "tbody" to the path, which Scrapy does not recognize. The solution is to just remove the "tbody". http://doc.scrapy.org/en/latest/topics/firefox.html – Miguel Febres Nov 26 '14 at 10:54
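On the second part of the question (taking rows from the third one onward), one option is to select every row and slice the resulting list in Python; in XPath 1.0 the predicate tr[position() > 2] expresses the same thing. The sketch below only illustrates the slicing idea using the standard library's ElementTree on a made-up, well-formed table. SAMPLE and data_rows are hypothetical names, and the real page's markup will differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the innermost table; the real page will differ.
SAMPLE = """
<table>
  <tr><td>Title</td></tr>
  <tr><td>Legend</td></tr>
  <tr><td>row 3</td></tr>
  <tr><td>row 4</td></tr>
</table>
"""

def data_rows(table_markup, skip=2):
    """Return the cell texts of every row after the first `skip` rows."""
    table = ET.fromstring(table_markup.strip())
    rows = table.findall("tr")  # all rows, in document order
    # Slice off the title and legend rows, then collect each row's cells.
    return [[td.text for td in row.findall("td")] for row in rows[skip:]]

print(data_rows(SAMPLE))
```

The list returned by a Scrapy selector's select() call can be sliced the same way, for example rows[2:].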