
I am trying to find the image xpath for the following page: http://www.spoonsisters.com/product/1032000/38710.html

I can see the image URL in my browser, but when I try to find it with Mechanize:

require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.spoonsisters.com/product/1032000/38710.html")
page.parser.xpath('//*[@id="main_image"]')  # page.parser is the underlying Nokogiri document
 => [#<Nokogiri::XML::Element:0x80484c7c name="img" attributes=[#<Nokogiri::XML::Attr:0x80484bdc name="id" value="main_image">, #<Nokogiri::XML::Attr:0x80484bc8 name="src">, #<Nokogiri::XML::Attr:0x80484b8c name="alt" value="Paper Cocktail Napkins - What happens tonight goes on Facebook tomorrow">]>] 

The src attribute comes back blank. How do I get the image URL?

Yogzzz

1 Answer


That's because the image's src attribute is set by JavaScript when the page loads. If you view the page source and search for "main_image", you'll see this:

<img id="main_image" src="" alt="Bar Towel - Wine Varietals" />

Mechanize can't execute JavaScript, so the src attribute it sees will always be an empty string.
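To see exactly what Mechanize is working with, you can run the question's XPath against the raw tag quoted above. This sketch uses Ruby's bundled REXML instead of Nokogiri only so it runs without extra gems (this particular tag happens to be well-formed XML); the point is that the src in the served HTML is already empty before any JavaScript would run.

```ruby
require 'rexml/document'
require 'rexml/xpath'

# The <img> tag exactly as it appears in the served page source.
raw_html = '<img id="main_image" src="" alt="Bar Towel - Wine Varietals" />'

doc = REXML::Document.new(raw_html)

# Same XPath as in the question.
img = REXML::XPath.first(doc, '//*[@id="main_image"]')

# Attributes#[] returns the attribute's string value.
src = img.attributes['src']
puts src.inspect  # prints ""
```

The element is found and the alt text is there, but src is an empty string because nothing has filled it in yet.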

Peter Brown
  • Side note: a better way to get the image object is `page.image_with(id: 'main_image')`. That gives you access to all the methods at http://mechanize.rubyforge.org/Mechanize/Page/Image.html – Peter Brown Jul 21 '12 at 03:53
  • Do you have any suggestions on ways to parse JS-heavy sites? – Yogzzz Jul 21 '12 at 06:31
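One workaround that avoids running JavaScript at all, assuming the page's inline script contains the image URL as a plain string literal: read the raw HTML and pull the URL out with a regex. The script content, the image path, and the helper name `main_image_url_from_script` below are all hypothetical; check the real page source for the actual pattern before relying on this.

```ruby
require 'uri'

# HYPOTHETICAL page source: assumes the site sets the image via an inline
# script that contains the URL as a string literal. The real page may differ.
html = <<~HTML
  <img id="main_image" src="" alt="Bar Towel - Wine Varietals" />
  <script type="text/javascript">
    document.getElementById("main_image").src = "/images/products/example.jpg";
  </script>
HTML

# Hypothetical helper: finds an assignment to main_image's src inside the raw
# HTML and resolves the captured (possibly relative) path against base_url.
# Returns nil when no such assignment is present.
def main_image_url_from_script(html, base_url)
  pattern = /getElementById\(\s*["']main_image["']\s*\)\.src\s*=\s*["']([^"']+)["']/
  match = html.match(pattern)
  return nil unless match

  URI.join(base_url, match[1]).to_s
end

url = main_image_url_from_script(html, "http://www.spoonsisters.com/product/1032000/38710.html")
puts url  # prints http://www.spoonsisters.com/images/products/example.jpg
```

With Mechanize you would pass `page.body` and `page.uri.to_s` instead of the hard-coded strings. If the script computes the URL at runtime (from variables, an AJAX response, etc.), a regex can't recover it and you would need a tool that actually executes JavaScript.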