I'm trying to extract data from a dynamically loaded JavaScript website using scrapy-playwright,
but I'm stuck at the very beginning.
The part of my settings.py that is causing trouble is as follows:
#playwright
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
#TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
#ASYNCIO_EVENT_LOOP = 'uvloop.Loop'
When I enable only the scrapy-playwright download handler:
DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}
Then I got:
scrapy.exceptions.NotSupported: Unsupported URL scheme 'https': The installed reactor (twisted.internet.selectreactor.SelectReactor) does not match the requested one (twisted.internet.asyncioreactor.AsyncioSelectorReactor)
When I also set TWISTED_REACTOR:
TWISTED_REACTOR = 'twisted.internet.asyncioreactor.AsyncioSelectorReactor'
Then I got:
raise TypeError(
TypeError: SelectorEventLoop required, instead got: <ProactorEventLoop running=False closed=False debug=False>
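The `ProactorEventLoop` in this message suggests the code is running on Windows, where asyncio has used `ProactorEventLoop` as its default loop since Python 3.8, while Twisted's `AsyncioSelectorReactor` only accepts a selector-based loop. A quick stdlib-only check of which loop class asyncio creates by default, next to an explicitly created `SelectorEventLoop`, might look like this (the helper names `default_loop_name` and `selector_loop_name` are illustrative, not part of Scrapy or Twisted):

```python
import asyncio
import sys


def default_loop_name() -> str:
    """Return the class name of the loop asyncio creates by default on this platform."""
    loop = asyncio.new_event_loop()
    try:
        return type(loop).__name__
    finally:
        loop.close()


def selector_loop_name() -> str:
    """Create a selector-based loop explicitly, the kind AsyncioSelectorReactor expects."""
    loop = asyncio.SelectorEventLoop()
    try:
        return type(loop).__name__
    finally:
        loop.close()


if __name__ == "__main__":
    # On Windows with Python 3.8+ the first name is typically ProactorEventLoop.
    print(sys.platform, default_loop_name(), selector_loop_name())
```

Forcing a selector loop on Windows is not necessarily a fix on its own: according to the Python asyncio docs, only `ProactorEventLoop` supports subprocesses on Windows, and as far as I know Playwright starts its driver as a subprocess.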
Finally, when I also set ASYNCIO_EVENT_LOOP = 'uvloop.Loop':
Then I got:
ModuleNotFoundError: No module named 'uvloop'
And installing uvloop fails as well:
pip install uvloop
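uvloop only supports Unix-like systems, so `pip install uvloop` failing on Windows is expected. `ASYNCIO_EVENT_LOOP` is optional (its default, `None`, means the default asyncio loop is used), so one hedged way to keep a single settings.py is to request uvloop only when the package is importable. This is a sketch assuming the standard scrapy-playwright handler setup from the settings above; the conditional `find_spec` check is an illustrative addition, not something Scrapy or scrapy-playwright requires:

```python
# settings.py (sketch): scrapy-playwright handlers plus the asyncio reactor
import importlib.util

DOWNLOAD_HANDLERS = {
    "http": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
    "https": "scrapy_playwright.handler.ScrapyPlaywrightDownloadHandler",
}

# scrapy-playwright needs Twisted's asyncio-based reactor
TWISTED_REACTOR = "twisted.internet.asyncioreactor.AsyncioSelectorReactor"

# Illustrative: only request uvloop when it is installed (it is not available on
# Windows); otherwise leave ASYNCIO_EVENT_LOOP unset so the default loop is used.
if importlib.util.find_spec("uvloop") is not None:
    ASYNCIO_EVENT_LOOP = "uvloop.Loop"
```

This only avoids requesting a loop implementation that cannot be installed; it does not by itself resolve the `SelectorEventLoop required` error on Windows.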
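My spider script:
import scrapy
from scrapy_playwright.page import PageCoroutine


class ProductSpider(scrapy.Spider):
    name = 'product'

    def start_requests(self):
        yield scrapy.Request(
            'https://shoppable-campaign-demo.netlify.app/#/',
            meta={
                'playwright': True,
                'playwright_include_page': True,
                'playwright_page_coroutines': [
                    # wait until the JS-rendered product list is in the DOM
                    PageCoroutine("wait_for_selector", "div#productListing"),
                ],
            },
        )

    async def parse(self, response):
        # with playwright_include_page=True the Playwright page is passed
        # in meta, and the spider is responsible for closing it
        page = response.meta["playwright_page"]
        await page.close()
        # parses content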