I'm trying to scrape some information from this ASP page, http://laredoute.fr/ppdp/prod-350007615.aspx, mainly the first 4 high-resolution images that load in the image carousel. The product comes in several colors, and each color has a button that must be selected. This is the code I have now; it collects the ids of those buttons in a list so they can be clicked later.
from selenium import webdriver
from scrapy.http import HtmlResponse
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.selector import HtmlXPathSelector
from scrapy.linkextractors import LinkExtractor
import urllib
import urllib2
from bs4 import BeautifulSoup
class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'
myopener = MyOpener()
url = 'http://www.laredoute.fr/ppdp/prod-350007615.aspx'
f = myopener.open(url)
soup = BeautifulSoup(f)
colour_fieldset = soup.find("fieldset", class_="set set-colour")
# Each label's "for" attribute holds the id of the matching colour radio button.
colour_ids = []
for elem in colour_fieldset.findAll('label'):
    #print elem['title']
    #print elem['for']
    colour_ids.append(elem['for'].strip('\n\t ,'))
#print colour_ids
driver = webdriver.Chrome('/Users/vasquez/Documents/crawler/chromedriver')
driver.maximize_window()
driver.get(url)
# Click the first colour's radio button through JavaScript.
radio = driver.find_element_by_id(colour_ids[0])
driver.execute_script("arguments[0].click();", radio)
This is the part I'm having problems with. The image carousel lives in this part of the HTML:
<div class="divProds jcarousel-clip">
    <ul class="divAddScroller">
    </ul>
</div>
If I open the Developer Tools in Chrome and click on it, the whole markup appears. But if I parse the whole HTML with Scrapy as I did before, that markup is not there and I can't retrieve the img link that I need. This is the part of the HTML that I want to parse:
<li><a href="javascript:void(0)">
<img src="//media.laredoute.com/products1/72by72/d/e/6/350007615_0_PR_1_11970785_350007615-1fca06aa-305f-4b3f-92da-80e8e21cb43a_1200.jpg" data-src="http://media.laredoute.com/|Dimension|/d/e/6/350007615_0_PR_1_11970785_350007615-1fca06aa-305f-4b3f-92da-80e8e21cb43a_1200.jpg" title="Blouse manches longues, transparences, dentelle VERO MODA" alt="Blouse manches longues, transparences, dentelle VERO MODA image 1" width="72" height="72" data-cerberus="img_pdp_thumbnails1" class="">
</a>
</li>
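To make the goal concrete, here is a minimal sketch of what I would like to do once that markup is available. It assumes the rendered HTML can be obtained as a string (for example from Selenium's `driver.page_source` after the page has finished loading) and that the `|Dimension|` placeholder in `data-src` is meant to be swapped for a size name. The only size name I have actually seen is `products1/72by72` from the thumbnail `src`; the name of the high-resolution size is unknown to me, so `dimension` is left as a parameter. `CarouselImageParser` and `carousel_image_urls` are names made up for this sketch, and unlike the code above it is written for Python 3 and uses only the standard library.

```python
from html.parser import HTMLParser


class CarouselImageParser(HTMLParser):
    """Collects data-src values of <img> tags inside <ul class="divAddScroller">."""

    def __init__(self):
        super().__init__()
        self.templates = []
        self._ul_depth = 0  # > 0 while we are inside the carousel list

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            # Enter the carousel list, or track a nested list inside it.
            if self._ul_depth > 0 or "divAddScroller" in classes:
                self._ul_depth += 1
        elif tag == "img" and self._ul_depth > 0:
            data_src = attrs.get("data-src")
            if data_src:
                self.templates.append(data_src)

    def handle_endtag(self, tag):
        if tag == "ul" and self._ul_depth > 0:
            self._ul_depth -= 1


def carousel_image_urls(rendered_html, dimension, limit=4):
    """Return up to `limit` image URLs with |Dimension| replaced by `dimension`."""
    parser = CarouselImageParser()
    parser.feed(rendered_html)
    return [t.replace("|Dimension|", dimension) for t in parser.templates[:limit]]
```

With Selenium this would be called as something like `carousel_image_urls(driver.page_source, "products1/72by72")`, which would only reproduce the thumbnail size; the missing piece is still getting the carousel markup to load in the first place.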
So, as a final question: how do I make Scrapy load the img inside that javascript:void(0) link shown above? Thanks.