I'm trying to scrape a page with financial data using Nokogiri and Ruby 1.9.3.
I'm having trouble finding the right XPath or CSS selector for the table that holds the data, and then iterating through its rows and assembling them so the output can be written to a CSV file like this:
Date,Company,Symbol,ReportedEPS,Consensus EPS
20130828,CDN WESTERN BANK,CWB.TO,0.60,0.59
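For the output side, Ruby's standard `csv` library can assemble lines like the sample above. This is a minimal sketch that assumes each table row has already been reduced to an array of strings. The `rows` data here is just the sample line typed in by hand, not scraped.

```ruby
require 'csv'

header = ['Date', 'Company', 'Symbol', 'ReportedEPS', 'Consensus EPS']

# Hand-typed stand-in for scraped rows: [date, company, symbol, reported EPS, consensus EPS]
rows = [
  ['20130828', 'CDN WESTERN BANK', 'CWB.TO', '0.60', '0.59']
]

# CSV.generate builds the whole file in memory as a String;
# CSV.open(path, 'w') { |csv| ... } would write it to disk instead.
output = CSV.generate do |csv|
  csv << header
  rows.each { |row| csv << row }
end

puts output
```

Fields are only quoted when they contain a comma, quote or newline, so company names with spaces come out unquoted, matching the sample.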
I used Firebug to get the table's XPath and CSS paths. What XPath or CSS selector will extract the table, and how do I then iterate through its rows to assemble them for output to a file?
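require 'rubygems'   # only needed on Ruby 1.8; harmless on 1.9.3
require 'mechanize'
require 'nokogiri'
# Fetch the page with a browser-like user agent
agent = Mechanize.new do |a|
  a.user_agent_alias = 'Windows IE 6'
end
url  = 'http://biz.yahoo.com/z/20130828.html'
page = agent.get(url)
# Parse the raw HTML (Mechanize's page.parser would return an equivalent Nokogiri document)
doc = Nokogiri::HTML(page.body)
puts doc.inspect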
# Paths reported by Firebug for the table:
# XPath: /html/body/p[3]/table/tbody
# CSS:   html body p table tbody