Scrapy shell with playwright

Question

Is it possible to invoke Playwright in a Scrapy shell?

I would like to use the shell to test my XPaths, which I intend to use in a spider that incorporates scrapy-playwright.

My scrapy settings file has the usual Playwright setup:

# Scrapy Playwright Setup
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

score 2 · Answer 1 · answered Jul 21 '22 at 03:08

Yes, it is possible. In fact, all you have to do is run scrapy shell inside a folder that contains a Scrapy project. It will automatically load the project's settings from settings.py. You can see this in the logs when running scrapy shell.
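One detail worth checking once the settings are loaded: scrapy-playwright only sends a request through the browser when the request's meta contains "playwright": True; other requests go through the regular download path. Below is a minimal sketch of the arguments such a request needs. The helper name playwright_request_kwargs and the example.org URL are hypothetical, used only for illustration.

```python
def playwright_request_kwargs(url):
    """Build keyword arguments for a scrapy.Request that opts in to Playwright.

    Hypothetical helper: scrapy-playwright looks for the "playwright" key
    in Request.meta to decide whether to render the page in a browser.
    """
    return {"url": url, "meta": {"playwright": True}}


kwargs = playwright_request_kwargs("https://example.org")
print(kwargs)

# Inside `scrapy shell` (where `scrapy` and `fetch` are already defined),
# the request could then be issued and inspected like this:
#   fetch(scrapy.Request(**kwargs))
#   response.xpath("//title/text()").get()
```

If the fetched response still looks like the unrendered HTML, the shell log is a good place to confirm that the Playwright download handler was actually picked up.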

Also, you can override settings using the -s parameter.

scrapy shell -s DOWNLOAD_HANDLERS='<<your custom handlers>>'
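As a rough sketch of what the override could look like outside a project folder: as far as I know, Scrapy accepts dict-valued settings passed as strings when they are valid JSON, so the handler mapping from the question can be serialized and quoted for the command line. This is an assumption worth verifying against your Scrapy version; the function name shell_command and the example.org URL are made up for illustration.

```python
import json
import shlex

# Handler mapping copied from the settings file in the question.
HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"


def shell_command(url, handlers=HANDLERS, reactor=REACTOR):
    """Return a shell-quoted `scrapy shell` command with -s overrides.

    Assumption: Scrapy decodes the DOWNLOAD_HANDLERS value as JSON.
    """
    args = [
        "scrapy", "shell",
        "-s", "DOWNLOAD_HANDLERS=" + json.dumps(handlers),
        "-s", "TWISTED_REACTOR=" + reactor,
        url,
    ]
    # shlex.join quotes each argument so the JSON survives the shell.
    return shlex.join(args)


print(shell_command("https://example.org"))
```

Generating the command this way avoids hand-quoting the nested double quotes in the JSON value.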

Happy Scraping :)

score 0 · Answer 2 · answered Jul 20 '22 at 10:00

I believe the shell command might not work with scrapy-playwright. As an alternative, here I am using python3 for demonstration:

This documentation link should help you further: https://playwright.dev/python/docs/intro#interactive-mode-repl

I believe that instead of the Scrapy shell you just need python3 in interactive mode. This way you also get autocomplete, which the Scrapy shell never had.

Here is the synchronous example in a file called spider_interactive.py:

from playwright.sync_api import sync_playwright
playwright = sync_playwright().start()
browser = playwright.firefox.launch()
page = browser.new_page()
page.goto("http://whatsmyuseragent.org/")

# Remember to run these manually when you're done, to avoid leaving browser processes behind on the machine.
#browser.close()
#playwright.stop()

Run with:

python3 -i spider_interactive.py

Then you can enter for example the following command:

page.locator("p.intro-text").all_inner_texts()

returns

['Mozilla/5.0 (X11; Linux x86_64; rv:100.0) Gecko/20100101 Firefox/100.0', 'My IP Address: your_ip_address_here']
