
I want to scrape the list of offers for a given product from amazon.com, along with the quantity in stock for each offer. To find the quantity, I need to add the offer to the cart, then edit the cart to set the quantity to 999, and then read the available quantity from the next page.
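A minimal sketch of the "set the quantity to 999" step, assuming the cart form exposes a quantity field whose name starts with `quantity.` (as the `quantity.1` field in the form dump further down does). The helper name `set_quantity` and the default pattern are hypothetical; it only relies on a form object that responds to `fields`, with each field responding to `name` and `value=`, which is how Mechanize forms behave.

```ruby
# Hypothetical helper: set the first field whose name matches `pattern`
# to the requested quantity and return that field.
# `form` is expected to behave like a Mechanize::Form (responds to #fields).
def set_quantity(form, quantity, pattern = /\Aquantity\./)
  field = form.fields.find { |f| pattern.match?(f.name.to_s) }
  raise ArgumentError, "no field matching #{pattern.inspect}" unless field

  field.value = quantity.to_s # form values are sent as strings
  field
end

# Usage with a real Mechanize form (needs the mechanize gem and a live page):
#   set_quantity(cart_form, 999)
#   next_page = cart_form.submit
```

Whether the cart page really uses the same field naming as the offer form is an assumption that has to be checked against the actual page.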

Take this product for example (http://www.amazon.com/gp/offer-listing/B00DW58ENU/ref=olp_tab_all&startIndex=1). The Add to Cart button is a form with a single submit button. I can find the form with this code:

offer_form = agent.page.forms_with(action: /item-dispatch/)[0]
##<Mechanize::Form
# {name nil}
# {method "POST"}
# {action "/gp/item-dispatch/ref=olp_atc_new_1/181-7026511-7466349"}
# {fields
#  [hidden:0x1717018 type: hidden name: session-id value: 181-7026511-7466349]
#  [hidden:0x1716a8c type: hidden name: qid value: ]
#  [hidden:0x1716514 type: hidden name: sr value: ]
#  [hidden:0x1715eac type: hidden name: signInToHUC value: 0]
#  [hidden:0x17155d8 type: hidden name: metric-asin.B00NHQFA1I value: 1]
#  [hidden:0x1714e80 type: hidden name: registryItemID.1 value: ]
#  [hidden:0x1714a0c type: hidden name: registryID.1 value: ]
#  [hidden:0x1714598 type: hidden name: quantity.1 value: 1]
#  [hidden:0x1714138 type: hidden name: offeringID.1 value: RVm%2FgzxznRorTyxf%2F8fiGjVFjfScvgO1JJBElusLb7hLttElaCwmvhKe7NSGkE1LBMGmkM3oodMhTTBnKT%2FCP%2FnFeT7SBoLZdnRfmVwRFa0N7AHRTVnphw%3D%3D]
#  [hidden:0x1707938 type: hidden name: isAddon value: 0]}
# {radiobuttons}
# {checkboxes}
# {file_uploads}
# {buttons
#  [submit:0x17072e4 type: submit name: submit.addToCart value: Add to cart]}>

page = offer_form.submit
##<Mechanize::Page
# {url #<URI::HTTP http://www.amazon.com/gp/item-dispatch/ref=olp_atc_new_1>}
# {meta_refresh}
# {title nil}
# {iframes}
# {frames}
# {links}
# {forms}>

I am wondering why I got this empty page as a result.
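One thing that may be worth ruling out: Mechanize's `Form#submit` only includes a submit button's name and value in the request when the button is passed explicitly, so the plain `offer_form.submit` above does not send `submit.addToCart`. Whether Amazon requires that parameter is an assumption, not something confirmed here. A minimal sketch, with the hypothetical helper name `submit_with_button`:

```ruby
# Submit a Mechanize-style form together with one of its submit buttons.
# `form` is expected to respond to #buttons and #submit(button),
# as Mechanize::Form does.
def submit_with_button(form, button_name)
  button = form.buttons.find { |b| b.name == button_name }
  raise ArgumentError, "no button named #{button_name.inspect}" unless button

  # Passing the button makes its name/value part of the POST body.
  form.submit(button)
end

# Usage with the form from the question (needs the mechanize gem and a live page):
#   page = submit_with_button(offer_form, 'submit.addToCart')
```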

I thought this might be because the form's action differs from the one I see when I open the page in a browser (Chrome or Firefox). But even if I change offer_form.action to match the one in the browser, the result does not change and I still get an empty page.
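To narrow down what "empty" means here, it can help to look at the raw response rather than the parsed page. The sketch below, with the hypothetical helper name `summarize_response`, collects the status code, content type and body size from any object that responds to `code`, `header` and `body`, as a Mechanize page does. It is only a diagnostic aid and says nothing about the actual cause.

```ruby
# Collect basic facts about a response so an "empty" page can be told apart
# from an error status, a non-HTML body, or a truly empty body.
# `page` is expected to behave like a Mechanize::Page.
def summarize_response(page)
  {
    code: page.code,                            # HTTP status, e.g. "200"
    content_type: page.header['content-type'],  # nil if the header is absent
    body_bytes: page.body.to_s.bytesize         # 0 means nothing came back
  }
end

# Usage with the page from the question (needs the mechanize gem and a live page):
#   p summarize_response(page)
```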

Nafaa Boutefer
  • Odds are good you are running into DHTML, which Mechanize can't handle. Turn off JavaScript in your browser and try navigating like you want Mechanize to do, and watch the source of the page and see if it's what Mechanize is seeing. If so, you'll need to use something like Watir, which can drive a browser and allow you to interact with a DHTML site. In the long run you're better off not using scraping on Amazon though. You could get banned as you're most likely violating their TOS. Instead try using their API which will require less maintenance and offer higher bandwidth. – the Tin Man May 14 '15 at 18:05
  • @theTinMan I found what the problem is, I'll put the answer later on. Thank you for your help. – Nafaa Boutefer May 14 '15 at 18:18
