QUESTION
- I'm trying to run a Selenium Scrapy scraper in headless mode (code below)
- The scraper works properly in 'headful' mode, i.e. with a visible Chrome window
- When I add the instructions from here and run again, nothing changes: the scraper runs as before and still opens Chrome
- I'm working on a Windows machine with Chrome version 111
What should I change to make it run in headless mode? All suggestions are much appreciated. Thank you!
CODE
import gspread
import scrapy
from scrapy_selenium import SeleniumRequest
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

# Headless options added following the linked instructions.
# Note: this object is not passed to any driver in the code below.
options = Options()
options.add_argument("--headless")

# Open the first worksheet of the target Google Sheet
gc = gspread.service_account(filename='credentials2.json')
sh = gc.open_by_key('api_key').sheet1


class QuoteItem(scrapy.Item):
    # define the fields for your item here like:
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()


class QuotesSpider(scrapy.Spider):
    name = 'techleapsesc'

    def start_requests(self):
        url = 'https://finder.techleap.nl/investors.accelerators'
        yield SeleniumRequest(url=url, callback=self.parse, wait_time=3)

    def parse(self, response):
        print("Line 24 - inside parse function")
        print("Line 27 - before for loop")
        print(response.css('div'))
        for quote in response.css('div.quote'):
            # Build a fresh item per quote so yielded items are not shared
            quote_item = QuoteItem()
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            # sh is the module-level worksheet, not a spider attribute
            sh.append_row(list(quote_item.values()))
            print(quote)
            yield quote_item

I have tried different ways of invoking headless mode.
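
For context: scrapy-selenium creates the browser itself from the project settings, so an Options object built in the spider module is never handed to the driver. Below is a minimal sketch of the relevant settings.py fragment, based on the setting names in the scrapy-selenium README. It assumes chromedriver is discoverable on PATH and that no other downloader middlewares need to be listed; neither is confirmed by the question.

```python
# settings.py (sketch): scrapy-selenium builds the browser from these settings
from shutil import which

SELENIUM_DRIVER_NAME = 'chrome'
# Assumption: chromedriver is on PATH; otherwise give the full path to chromedriver.exe
SELENIUM_DRIVER_EXECUTABLE_PATH = which('chromedriver')
# Each entry is passed to the browser as a command-line argument
SELENIUM_DRIVER_ARGUMENTS = ['--headless']

# Enable the middleware that handles SeleniumRequest objects
DOWNLOADER_MIDDLEWARES = {
    'scrapy_selenium.SeleniumMiddleware': 800,
}
```

With settings along these lines, the spider itself should not need any Options code; whether Chrome 111 behaves better with '--headless=new' instead is something to verify on the machine in question.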