Questions tagged [crawlera]
26 questions
0
votes
0 answers
Website redirects endlessly until the max-redirection is reached in scrapy
The site behaves normally when accessed through a browser, but the redirection issue occurs when accessing it through Scrapy bots.
I use the Scrapy-Crawlera proxy service, yet the site still redirects endlessly.
If I use handle_httpstatus_list = [302] or…
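Redirect loops through Crawlera are often cookie- or header-related: the target keeps bouncing the request because it doesn't look like a browser session. A minimal settings.py sketch, assuming scrapy-crawlera is enabled (the setting and header names are from its docs; the profile header requires a plan that supports it, and the API key is a placeholder):

```python
# settings.py -- a sketch, assuming scrapy-crawlera is installed and enabled.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<your API key>'      # placeholder
CRAWLERA_DEFAULT_HEADERS = {
    'X-Crawlera-Profile': 'desktop',    # send browser-like headers upstream
}
REDIRECT_MAX_TIMES = 5                  # fail fast instead of looping 20 times
```

Capping `REDIRECT_MAX_TIMES` does not fix the loop, but it surfaces the failure quickly so the offending response can be inspected.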
0
votes
1 answer
Scrapy crawlera bug
Scrapy 2.0.1, scrapy_crawlera 1.7.0.
I think scrapy_crawlera should access meta differently (https://github.com/scrapy/scrapy/issues/3516)
2020-04-02 06:02:36 [scrapy.core.engine] INFO: Spider opened
2020-04-02 06:02:36 [scrapy.extensions.logstats]…

aikipooh
- 137
- 1
- 19
0
votes
1 answer
Crawlera, cookies, sessions, rate limiting
I'm trying to use Scrapinghub to crawl a website that heavily limits the request rate.
If I run the spider as-is, I get a 429 pretty soon.
If I enable Crawlera per the standard instructions, the spider doesn't work anymore.
If I set headers =…

kenshin
- 197
- 11
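One gotcha with rate-limited sites: when Crawlera is enabled, the scrapy-crawlera middleware zeroes Scrapy's download delay by default, so the target can still answer 429 even through the proxy. A settings.py sketch that keeps a local throttle (setting names from scrapy-crawlera; the delay values are assumptions):

```python
# settings.py -- a sketch, assuming scrapy-crawlera is installed.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = '<your API key>'  # placeholder
# The middleware sets the delay to 0 unless told otherwise:
CRAWLERA_PRESERVE_DELAY = True
DOWNLOAD_DELAY = 2.0                # assumed polite delay
CONCURRENT_REQUESTS = 1             # serialize requests to the slow site
```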
0
votes
1 answer
How to make the website believe that the request is coming from a browser using Scrapy?
I am trying to scrape this URL:
https://www.bloomberg.com/news/articles/2019-06-03/a-tesla-collapse-would-boost-european-carmakers-bernstein-says
I just want to scrape the title and posted date, but Bloomberg always bans me and thinks that I am…

Christian Read
- 135
- 11
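A common first step toward "look like a browser" is sending a full set of browser headers from settings.py. A sketch (the header values are ordinary desktop-browser defaults, not anything Bloomberg-specific; heavily protected sites may still block it):

```python
# settings.py -- a minimal sketch; header values copied from a desktop browser.
USER_AGENT = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) '
              'AppleWebKit/537.36 (KHTML, like Gecko) '
              'Chrome/74.0.3729.169 Safari/537.36')
DEFAULT_REQUEST_HEADERS = {
    'Accept': ('text/html,application/xhtml+xml,application/xml;'
               'q=0.9,*/*;q=0.8'),
    'Accept-Language': 'en-US,en;q=0.5',
    'Referer': 'https://www.google.com/',
}
```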
0
votes
0 answers
scrapy-splash response.body contains no html
I'm trying to use Crawlera alongside a local Splash instance; this is my Lua script:
function main(splash)
    function use_crawlera(splash)
        local user = splash.args.crawlera_user
        local host = 'proxy.crawlera.com'
        local port = 8010
        local…

Farhan Muhammad
- 13
- 5
0
votes
1 answer
Stop Scrapy request pipeline for a few minutes and retry
I am scraping a single domain using Scrapy and a Crawlera proxy. Sometimes, due to Crawlera issues (a technical break), I get a 407 status code and can't scrape anything. Is it possible to stop the request pipeline for 10 minutes and then restart…

Bociek
- 1,195
- 2
- 13
- 28
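Scrapy has no built-in "pause for N minutes" switch, but the pause-and-retry idea itself is small. A pure-Python sketch of the logic a custom retry middleware could apply on 407 (`fetch`, the pause length, and the retry cap are placeholders taken from the question, not part of any Scrapy API):

```python
import time

def fetch_with_pause(fetch, url, pause_seconds=600, max_retries=3,
                     sleep=time.sleep):
    """Call fetch(url); on a 407 status, sleep and try again.

    `fetch` is a hypothetical callable returning (status, body);
    `sleep` is injectable so the pause can be tested without waiting.
    """
    for attempt in range(max_retries + 1):
        status, body = fetch(url)
        if status != 407:
            return status, body           # success or a non-proxy error
        if attempt < max_retries:
            sleep(pause_seconds)          # wait out the proxy's outage
    return status, body                   # retries exhausted, give up
```

In a real spider this would live in a downloader middleware's `process_response`, but blocking `time.sleep` there stalls the whole reactor, so Scrapy deployments usually prefer a deferred-based delay.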
0
votes
2 answers
Scrapy spider not working with crawlera middleware
I wrote a spider to crawl a large site. I'm hosting it on Scrapinghub and am using the Crawlera add-on. Without Crawlera, my spider runs on Scrapinghub just fine. As soon as I switch to the Crawlera middleware, the spider just exits without doing a single…

joe
- 73
- 2
- 8
0
votes
2 answers
Does scrapy-crawlera handle a 429 status code?
Does anyone know whether the scrapy-crawlera middleware handles the 429 status code when using Scrapy, or do I need to implement my own retry logic?
I can't seem to find it documented anywhere.

Kevin Glasson
- 408
- 2
- 13
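Whether or not the middleware retries 429 itself, Scrapy's stock RetryMiddleware can be told to. A settings.py sketch (`RETRY_HTTP_CODES` and `RETRY_TIMES` are standard Scrapy settings; the exact code list and retry budget are assumptions):

```python
# settings.py -- a sketch using Scrapy's built-in RetryMiddleware.
RETRY_ENABLED = True
RETRY_TIMES = 5                      # assumed retry budget per request
# Scrapy's default list plus 429:
RETRY_HTTP_CODES = [429, 500, 502, 503, 504, 522, 524, 408]
```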
0
votes
2 answers
How to get session_id when using Crawlera lua script in Scrapy Splash?
As you know, we use this Lua script when we try to use Scrapy Splash with Crawlera:
function use_crawlera(splash)
    -- Make sure you pass your Crawlera API key in the 'crawlera_user' arg.
    -- Have a look at the file spiders/quotes-js.py to see…

Aminah Nuraini
- 18,120
- 8
- 90
- 108
0
votes
1 answer
Scrapy Splash + Crawlera in Linux always get 503 service unavailable error
When I use Scrapy Splash + Crawlera on my Linux server, I always get 503 errors. It works just fine on Windows. Why is that?

Aminah Nuraini
- 18,120
- 8
- 90
- 108
-1
votes
1 answer
How to authenticate using scrapy spider with Zyte Smart Proxy Manager (former Crawlera) enabled?
I followed the scrapy-zyte-smartproxy documentation to integrate proxy usage into my spider. Now my spider can't log in.

Danil
- 4,781
- 1
- 35
- 50
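Login flows often break under a rotating proxy because each request can leave from a different IP, so the session cookie never sticks. A settings-plus-meta sketch, assuming scrapy-zyte-smartproxy (the `dont_proxy` meta key comes from its documentation; the login request line is illustrative only):

```python
# settings.py -- a sketch, assuming scrapy-zyte-smartproxy is installed.
ZYTE_SMARTPROXY_ENABLED = True
ZYTE_SMARTPROXY_APIKEY = '<your API key>'   # placeholder

# In the spider, bypass the proxy for just the login request so the
# session cookie is issued against one stable IP:
#   yield scrapy.FormRequest(login_url, formdata=credentials,
#                            meta={'dont_proxy': True},
#                            callback=self.after_login)
```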