QUESTION

  1. I'm trying to run a Scrapy scraper that uses Selenium in headless mode (code below).
  2. The scraper works properly in 'headful' mode, i.e., with the Chrome browser window opening.
  3. When I add the instructions from here and run it again, nothing changes: the scraper runs as before and still opens Chrome.
  4. I'm working on a Windows machine with Chrome version 111.

What should I change to make it run in headless mode? All suggestions are much appreciated. Thank you!

CODE

import scrapy
from scrapy_selenium import SeleniumRequest
import gspread
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

ChromeOptions options = new ChromeOptions()
options.addArguments("--headless")

gc = gspread.service_account(filename = 'credentials2.json')
sh = gc.open_by_key('api_key').sheet1

class QuoteItem(scrapy.Item):
    # define the fields for your item here like:
    text = scrapy.Field()
    author = scrapy.Field()
    tags = scrapy.Field()

class QuotesSpider(scrapy.Spider):
    name = 'techleapsesc'

    def start_requests(self):
        url = 'https://finder.techleap.nl/investors.accelerators'
        yield SeleniumRequest(url=url, callback=self.parse, wait_time= 3)

    def parse(self, response):
        print("Line 24 - inside parse function")
        quote_item = QuoteItem()
        
        print("Line 27 - before for loop")
        print(response.css('div'))
        for quote in response.css('div.quote'):
            quote_item['text'] = quote.css('span.text::text').get()
            quote_item['author'] = quote.css('small.author::text').get()
            quote_item['tags'] = quote.css('div.tags a.tag::text').getall()
            self.sh.append_row(list(quote.values()))
            print(quote)
            yield quote_item

I have tried several different ways of invoking headless mode.

Graves

1 Answer

Change the lines below, which are Java code:

ChromeOptions options = new ChromeOptions()
options.addArguments("--headless")

to their Python equivalent:

options = webdriver.ChromeOptions()
options.add_argument("--headless")
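
One caveat worth checking: with scrapy-selenium, the middleware creates the browser itself from the project's settings, so an options object built in the spider module is probably never passed to the driver. Below is a minimal sketch of settings that would request headless Chrome. It assumes the scrapy-selenium middleware is what handles SeleniumRequest in this project and that chromedriver is on the PATH. The file name settings.py is the usual Scrapy convention and is not shown in the question.

```python
# settings.py (sketch; assumes the scrapy-selenium middleware is in use)
from shutil import which

# Browser that scrapy-selenium should drive
SELENIUM_DRIVER_NAME = "chrome"

# Assumption: chromedriver is installed and discoverable on PATH.
# which() returns None if it is not; in that case set the path by hand.
SELENIUM_DRIVER_EXECUTABLE_PATH = which("chromedriver")

# Command-line arguments passed to Chrome; headless mode is requested here
SELENIUM_DRIVER_ARGUMENTS = ["--headless"]

# Enable the middleware that handles SeleniumRequest objects
DOWNLOADER_MIDDLEWARES = {
    "scrapy_selenium.SeleniumMiddleware": 800,
}
```

With settings along these lines, the Java-style lines at the top of the spider can simply be deleted, since the spider would not need to build its own options object.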
Shawn