Trying to find image url via xpath using Mechanize

Question

I am trying to find the image xpath for the following page: http://www.spoonsisters.com/product/1032000/38710.html

I can see the image URL in my browser, but when I try to find it with Mechanize:

page = Agent.get("http://www.spoonsisters.com/product/1032000/38710.html")
page.parser.xpath('//*[@id="main_image"]')
 => [#<Nokogiri::XML::Element:0x80484c7c name="img" attributes=[#<Nokogiri::XML::Attr:0x80484bdc name="id" value="main_image">, #<Nokogiri::XML::Attr:0x80484bc8 name="src">, #<Nokogiri::XML::Attr:0x80484b8c name="alt" value="Paper Cocktail Napkins - What happens tonight goes on Facebook tomorrow">]>]

The `src` attribute comes back blank. How do I find the image URL?

score 0 · Accepted Answer · answered Jul 21 '12 at 03:52

That's because the image's src is set by JavaScript when the page loads. If you view the page source and search for "main_image", you'll see the following:

<img id="main_image" src="" alt="Bar Towel - Wine Varietals" />

Mechanize can't run JavaScript, so the src it sees will always be an empty string.

Peter Brown

Side note, a better way to get the image object is with `page.image_with(id: 'main_image')`. Then you have access to all the methods at http://mechanize.rubyforge.org/Mechanize/Page/Image.html – Peter Brown Jul 21 '12 at 03:53
Do you have any suggestions on ways to parse JS heavy sites? – Yogzzz Jul 21 '12 at 06:31
