Can't fetch url in scrapy shell with splash

Question

Please help me!

When I try to fetch a URL in the Scrapy shell through Splash, I use the following statement to get a response:

>>> fetch('http://localhost:8050/render.html?url=https://www.barbiermotorsport.nl/motoren')

So far I get no response at all, and the request even freezes Splash: http://localhost:8050/ is no longer reachable in Chrome.

When I try a different URL, it works:

>>> fetch('http://localhost:8050/render.html?url=https://amtmotors.nl/motoren')
2023-03-14 06:15:21 [scrapy.core.engine] INFO: Spider opened
>>> response
<200 http://localhost:8050/render.html?url=https://amtmotors.nl/motoren>

In settings.py I have set ROBOTSTXT_OBEY = False.

I also tried several different headers and user agents. I suspect the request hangs because of Google reCAPTCHA. Are there any solutions to this?

I was expecting a response or at least a timeout.
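Since the expectation here is a response or a timeout, one thing worth trying is passing explicit timeouts to Splash. The sketch below only builds the render.html URL; `timeout` and `resource_timeout` are documented Splash rendering arguments, but the values 30 and 10 and the helper name `splash_render_url` are assumptions for illustration, and nothing in this thread confirms that they would prevent the freeze described above. It also percent-encodes the target URL so that characters in it cannot be mistaken for Splash's own query parameters.

```python
from urllib.parse import urlencode

# Splash endpoint taken from the question.
SPLASH_RENDER = "http://localhost:8050/render.html"


def splash_render_url(target_url, timeout=30, resource_timeout=10):
    """Return a render.html URL with the target URL percent-encoded.

    The timeout values are illustrative assumptions, not recommendations.
    """
    params = {
        "url": target_url,
        "timeout": timeout,                    # overall render timeout, in seconds
        "resource_timeout": resource_timeout,  # per-request timeout, in seconds
    }
    return f"{SPLASH_RENDER}?{urlencode(params)}"


url = splash_render_url("https://www.barbiermotorsport.nl/motoren")
print(url)
```

The resulting string can be passed to `fetch(...)` in the Scrapy shell in place of the hand-written URL.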

score 0 · Answer 1 · answered Mar 15 '23 at 15:11

Fixed it. The hang is caused by recaptcha__en.js; I filtered it out with the following Lua script:

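function main(splash, args)
    -- Inspect every outgoing request before Splash sends it.
    splash:on_request(function(request)
        -- Drop the reCAPTCHA script that makes the render hang.
        if request.url:find('recaptcha__en') ~= nil then
            request:abort()
        end
    end)
    -- Load the page, then give it a moment to finish rendering.
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    return {
        html = splash:html(),
        png = splash:png(),
        har = splash:har(),
    }
end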

With this script in place, rendering the page through http://localhost:8050/ works.
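The render.html endpoint used in the question does not run Lua scripts, so to use the filter outside the Splash web UI the script has to go to Splash's `execute` endpoint. Below is a minimal sketch of that, assuming Splash is still listening on localhost:8050. It only builds the POST request and does not send it. The helper name `build_execute_request` is hypothetical, and the script is trimmed to return HTML only to keep the example short.

```python
import json
from urllib.request import Request

# Splash's script endpoint; host and port are taken from the question.
SPLASH_EXECUTE = "http://localhost:8050/execute"

# Same filter as in the answer, shortened to return only the HTML.
LUA_SCRIPT = """
function main(splash, args)
    splash:on_request(function(request)
        if request.url:find('recaptcha__en') ~= nil then
            request:abort()
        end
    end)
    assert(splash:go(args.url))
    assert(splash:wait(0.5))
    return {html = splash:html()}
end
"""


def build_execute_request(target_url):
    """Build (but do not send) a JSON POST request for Splash's execute endpoint."""
    payload = {"lua_source": LUA_SCRIPT, "url": target_url}
    return Request(
        SPLASH_EXECUTE,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_execute_request("https://www.barbiermotorsport.nl/motoren")
```

Passing `request` to `urllib.request.urlopen` would send it to a running Splash instance. In a spider that uses scrapy-splash, the same script can presumably be supplied through `SplashRequest(url, callback, endpoint='execute', args={'lua_source': LUA_SCRIPT})`, though that path was not tested in this thread.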

