Python, Scrapy, Selenium: how to attach webdriver to "response" passed into a function to use it for further action

Question

I am trying to use Selenium to obtain the value of the selected option from a drop-down list in a Scrapy spider, but I am unsure how to go about it. It's my first interaction with Selenium.

As you can see in the code below, I create a request in the parse function, which calls the parse_page function as a callback. In parse_page I want to extract the value of the selected option. I can't figure out how to attach the webdriver to the response passed into parse_page so that I can use it in Select. I have written some obviously wrong code below :(

from scrapy.spider import Spider
from scrapy.selector import Selector
from scrapy.http import Request
from scrapy.exceptions import CloseSpider
import logging
import scrapy
from scrapy.utils.response import open_in_browser
from scrapy.http import FormRequest
from scrapy.http import Request
from selenium import webdriver
from selenium.webdriver.support.ui import Select
from activityadvisor.items import TruYog

logging.basicConfig()
logger = logging.getLogger()

class TrueYoga(Spider):
    name = "trueyoga"
    allowed_domains = ["trueyoga.com.sg","trueclassbooking.com.sg"]
    start_urls = [
        "http://trueclassbooking.com.sg/frames/class-schedules.aspx",
    ]

    def parse(self, response):

        clubs=[]
        clubs = Selector(response).xpath('//div[@class="club-selections"]/div/div/div/a/@rel').extract()
        clubs.sort()
        print 'length of clubs = ' , len(clubs), '1st content of clubs = ', clubs
        req=[]
        for club in clubs:
            payload = {'ctl00$cphContents$ddlClub':club}
            req.append(FormRequest.from_response(response,formdata = payload, dont_click=True, callback = self.parse_page))
        for request in req:
            yield request

    def parse_page(self, response):
        driver = webdriver.Firefox()
        driver.get(response)
        clubSelect = Select(driver.find_element_by_id("ctl00_cphContents_ddlClub"))
        option = clubSelect.first_selected_option
        print option.text

Is there any way to obtain this option value in Scrapy without using Selenium? My searches on Google and Stack Overflow haven't yielded any useful answers so far.

Thanks for the help!

score 2 · Answer 1 · edited May 23 '17 at 11:44

I would recommend using Downloader Middleware to pass the Selenium response over to your spider's parse method. Take a look at the example I wrote as an answer to another question.

answered Jul 08 '15 at 12:42

JoeLinux

score 1 · Answer 2 · answered Jul 08 '15 at 07:43

The response you receive already contains the select boxes with their options. One of those options has the attribute selected="selected". I think you can query for this attribute and avoid Selenium altogether:

def parse_page(self, response):
    # The <option> marked selected="selected" is the club currently chosen
    selected = response.xpath("//select[@id='ctl00_cphContents_ddlClub']//option[@selected = 'selected']").extract()

GHajba

I also realised I could send the club data along as request metadata, which is what I used yesterday to move forward with the code. But this is very helpful. Thanks very much :) – Tuhina Singh Jul 09 '15 at 00:21
