
I am trying to scrape information from an HTML table. The page contains multiple tables, and each one is preceded by a paragraph of text. I want to key off that text (the "CONSOLIDATED" text in the pastebin below) to identify the table, since there are no DIV tags on the page and therefore no other way to identify the table uniquely. How would I do this? What XPath expression would I use? Here's a link to the HTML page: http://pastebin.com/HeapZvPV

Thanks!

the Tin Man
sizzle
  • Don't post a link to the HTML. Links always break, making the question meaningless. Instead, strip the HTML down to only the information necessary to demonstrate your problem, and add that to your question. Also, we need to see what code you've written toward solving this problem. Without it, it looks like you're fishing for code, instead of trying to get help with a problem in what you've written. – the Tin Man Nov 19 '14 at 23:36
  • I love fishing for tuna; code, not so much. Anyway, here's what I've come up with so far, which works but doesn't seem very robust: `doc = Nokogiri::HTML(open(url)); doc.xpath("//p/b").each do |txt| if txt.text.strip.eql? "CONSOLIDATED"; data_table = txt.parent.next_element.next_element.next_element.next_element; data_date = data_table.xpath('tr[3]/td[3]').text; data_total_current = data_table.xpath('tr[7]/td[4]').text; puts data_date; puts data_total_current; end; end` – sizzle Nov 20 '14 at 00:51
  • How about putting that into your question where it'd be readable? – the Tin Man Nov 20 '14 at 23:25

0 Answers