Scrapy and Javascript

Question

I'm trying to scrape some information from this ASP page, http://laredoute.fr/ppdp/prod-350007615.aspx, mainly the first 4 high-resolution images that load in the image carousel. Depending on the color of the product, there are some buttons that you must select. This is the code I have now; it retrieves the buttons and adds them to a list to be clicked on later.

from selenium import webdriver
from scrapy.http import HtmlResponse
from scrapy.spiders import Spider
from scrapy.selector import Selector
from scrapy.selector import HtmlXPathSelector
from scrapy.linkextractors import LinkExtractor
import urllib
import urllib2
from bs4 import BeautifulSoup



class MyOpener(urllib.FancyURLopener):
    version = 'Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.17 (KHTML, like Gecko) Chrome/24.0.1312.57 Safari/537.17'

myopener = MyOpener()
url = 'http://www.laredoute.fr/ppdp/prod-350007615.aspx'

f = myopener.open(url)
soup = BeautifulSoup(f)

viewstate = soup.find("fieldset", class_="set set-colour")


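# Collect the radio-button ids that the colour labels point to.
# (Named button_ids rather than "list" to avoid shadowing the built-in.)
button_ids = []

for elem in viewstate.findAll('label'):
    #print elem['title']
    #print elem['for']
    button_ids.append(elem['for'].strip('\n\t ,'))

#print button_ids


driver = webdriver.Chrome('/Users/vasquez/Documents/crawler/chromedriver')

driver.maximize_window()
driver.get(url)

# Select the first colour option.
radio = driver.find_element_by_id(button_ids[0])

# Trigger the click through JavaScript instead of radio.click().
driver.execute_script("arguments[0].click();", radio)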

Now, the part I'm having problems with is this one. The image carousel lives in this part of the HTML:

<div class="divProds jcarousel-clip">

    <ul class="divAddScroller">

    </ul>

</div>

If I open the Developer Tools in Chrome and click on it, the whole code appears. But if I parse the whole HTML with Scrapy as I did before, the code is not there and I can't retrieve the img link that I need. This is the part of the HTML that I want to parse:

<li><a href="javascript:void(0)">

<img src="//media.laredoute.com/products1/72by72/d/e/6/350007615_0_PR_1_11970785_350007615-1fca06aa-305f-4b3f-92da-80e8e21cb43a_1200.jpg" data-src="http://media.laredoute.com/|Dimension|/d/e/6/350007615_0_PR_1_11970785_350007615-1fca06aa-305f-4b3f-92da-80e8e21cb43a_1200.jpg" title="Blouse manches longues, transparences, dentelle VERO MODA" alt="Blouse manches longues, transparences, dentelle VERO MODA image 1" width="72" height="72" data-cerberus="img_pdp_thumbnails1" class="">
</a>
</li>

So, as a final question: how do I make Scrapy load the img element inside the javascript:void(0) link shown above? Thanks.

Check out this post: http://stackoverflow.com/questions/8550114/can-scrapy-be-used-to-scrape-dynamic-content-from-websites-that-are-using-ajax. — Alex, Nov 16 '15 at 14:28
I recommend checking out [Splash](https://github.com/scrapinghub/splash); it plays really well with Scrapy for JavaScript rendering. — eLRuLL, Nov 16 '15 at 14:36

score 0 · Accepted Answer · answered Nov 19 '15 at 01:23

By using this:

from time import sleep

sleep(5)  # wait for the page's JavaScript to populate the carousel
html = driver.execute_script("return document.getElementsByTagName('html')[0].innerHTML")

I was able to retrieve the dynamically rendered page. Now I can process it further and extract the image links I need :)
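
For the "process it further" step, here is a minimal sketch of how the image links might be pulled out of the rendered `html` string. It is written for Python 3 and uses only the standard library, so it differs from the Python 2 / BeautifulSoup code in the question. The names `CarouselImageParser` and `carousel_image_urls` are hypothetical helpers, not part of Scrapy or Selenium. It assumes the carousel images sit inside the `<ul class="divAddScroller">` element and carry a `data-src` template containing the `|Dimension|` placeholder, as in the snippet from the question. The value to substitute for a high-resolution image is not shown in the question, so it is left as a parameter.

```python
from html.parser import HTMLParser


class CarouselImageParser(HTMLParser):
    """Collect data-src values of <img> tags inside <ul class="divAddScroller">."""

    def __init__(self):
        super().__init__()
        self._depth = 0      # > 0 while we are inside the carousel <ul>
        self.templates = []  # data-src values, still containing |Dimension|

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul":
            classes = (attrs.get("class") or "").split()
            # Enter the carousel list, or track a nested <ul> inside it.
            if self._depth or "divAddScroller" in classes:
                self._depth += 1
        elif tag == "img" and self._depth and attrs.get("data-src"):
            self.templates.append(attrs["data-src"])

    def handle_endtag(self, tag):
        if tag == "ul" and self._depth:
            self._depth -= 1


def carousel_image_urls(html, dimension, limit=4):
    """Return up to `limit` carousel image URLs with |Dimension| filled in.

    `dimension` is an assumption supplied by the caller; the question only
    shows the thumbnail value ("products1/72by72").
    """
    parser = CarouselImageParser()
    parser.feed(html)
    return [t.replace("|Dimension|", dimension) for t in parser.templates[:limit]]
```

Calling `carousel_image_urls(html, "products1/72by72")` should rebuild the thumbnail paths seen in the question's `src` attribute. For the high-resolution versions, the dimension value would have to be read from the URLs the site itself uses for its large images.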

Vasquez Sanchez
