I'm using XPath selectors to select each item on a page (roughly 24), and then running further XPath selectors on each item to extract its values.
Even though I'm running the XPath selectors on the subnode, they seem to search across all the subnodes, whereas I want each search limited to that one subnode.
Here's the code that searches for each item on the `doc` and then iterates over each `html_listing`, passing it to `get_field_data_from`:
```ruby
def get_listing(doc, field_data = {})
  doc.xpath(get_listing_tag[:path]).each do |html_listing|
    fd = get_field_data_from(html_listing, field_data)
    if !field_data && fd.detect { |_, data| !data }
      set_uri doc.xpath(get_sub_page_tag[:path])
      get
      fd = get_listing(Nokogiri::HTML(body), fd)
    end
    yield fd
  end
end
```
`get_field_data_from` iterates over all the `Fields` I'm looking for and uses each field name to look up its XPath selector with `selector = send("get_%s_tag" % field)`.
If the selector exists and the data hasn't already been found, it runs the XPath selector on the HTML item, stores the text using `res[field] = item.xpath(selector[:path]).inner_text`, and then returns the resulting hash to be used in the next iteration:
```ruby
def get_field_data_from(item, data)
  Fields.inject(data) do |res, field|
    selector = send("get_%s_tag" % field)
    unless !selector || res[field]
      begin
        res[field] = item.xpath(selector[:path]).inner_text
      rescue Exception => e
        puts "Error for field: %s" % field
        raise e
      end
    end
    res
  end
end
```
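To make the lookup-and-fill step concrete, here is a pure-Ruby sketch of the same `send`/`inject` pattern with no page involved. `ListingScraper`, `FIELDS`, and the `get_*_tag` methods are hypothetical stand-ins for the real `Fields` constant and tag methods, and the selector path is stored in place of the scraped text so it runs on its own.

```ruby
# Hypothetical scraper used only to illustrate the lookup-and-fill loop.
class ListingScraper
  FIELDS = [:title, :price, :url]

  def get_title_tag
    { path: ".//span[@class='title']" }
  end

  def get_price_tag
    { path: ".//span[@class='price']" }
  end

  # No selector is defined for this field, so it is skipped.
  def get_url_tag
    nil
  end

  # Same shape as the inject loop in get_field_data_from: "get_%s_tag" % field
  # builds the method name (e.g. "get_title_tag"), send calls it, and a field is
  # filled only when a selector exists and the field has no value yet.
  # The selector path stands in for the scraped text.
  def fill(data)
    FIELDS.inject(data) do |res, field|
      selector = send("get_%s_tag" % field)
      res[field] = selector[:path] if selector && !res[field]
      res
    end
  end
end

result = ListingScraper.new.fill({ price: "already found" })
p result
```

Here `:price` keeps its existing value, `:title` is filled from its selector, and `:url` is skipped because `get_url_tag` returns `nil`.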
Somehow, `res[field] = item.xpath(selector[:path]).inner_text` seems to search over all the items rather than just the given item listing. I know it's doing that because `puts item.xpath(selector[:path]).inner_text` returns more than one result.
I'm not actually looping over all the `html_listing`s: in the block that receives `fd` from the `yield fd` in `get_listing`, I do a `break`, so it only runs once.
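In Ruby, `break` inside the block passed to a method ends that method call immediately, so only the first yielded hash is processed. A small pure-Ruby sketch of that behavior, where `each_listing` is a hypothetical stand-in for `get_listing`:

```ruby
# Hypothetical stand-in for get_listing: yields one hash per listing.
def each_listing(listings)
  listings.each do |listing|
    yield({ title: listing })
  end
  :finished # only reached if the caller's block never breaks
end

seen = []
result = each_listing(%w[a b c]) do |fd|
  seen << fd
  break fd # ends the each_listing call right away; fd becomes its return value
end

p seen
p result
```

Only the first hash reaches the block, and the method never gets to `:finished`.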
I can't figure out what's going on. Can anyone see the problem?