Questions tagged [crawlera]

26 questions
0
votes
0 answers

Website redirects endlessly until the maximum number of redirects is reached in Scrapy

The site behaves normally when accessed through a browser, but the redirection issue occurs when accessing it through Scrapy bots. I use the Scrapy-Crawlera proxy service, yet the site still redirects endlessly. If I use handle_httpstatus_list = [302] or…
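For anyone hitting this, a minimal sketch of capturing the 302 instead of following it (spider name and URL are placeholders): Scrapy's RedirectMiddleware skips statuses listed in handle_httpstatus_list, so the callback can inspect the Location header directly.

```python
import scrapy

class RedirectProbeSpider(scrapy.Spider):
    # Hypothetical spider for illustration; name and URL are placeholders.
    name = "redirect_probe"
    start_urls = ["https://example.com/"]

    # RedirectMiddleware skips statuses listed here, so the 302 reaches
    # parse() instead of being followed until REDIRECT_MAX_TIMES is hit.
    handle_httpstatus_list = [302]

    def parse(self, response):
        if response.status == 302:
            # The Location header often reveals a cookie or JS challenge page.
            self.logger.info("Redirected to: %s",
                             response.headers.get("Location"))
        else:
            yield {"title": response.css("title::text").get()}
```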
0
votes
1 answer

Scrapy crawlera bug

Scrapy 2.0.1, scrapy_crawlera 1.7.0. I think scrapy_crawlera should access meta differently (https://github.com/scrapy/scrapy/issues/3516). 2020-04-02 06:02:36 [scrapy.core.engine] INFO: Spider opened 2020-04-02 06:02:36 [scrapy.extensions.logstats]…
aikipooh
  • 137
  • 1
  • 19
0
votes
1 answer

Crawlera, cookies, sessions, rate limiting

I'm trying to use Scrapinghub to crawl a website that heavily limits the request rate. If I run the spider as-is, I get a 429 pretty soon. If I enable Crawlera as per the standard instructions, the spider doesn't work anymore. If I set headers =…
kenshin
  • 197
  • 11
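For context, the standard scrapy-crawlera setup the question refers to looks roughly like this (a sketch; the API key is a placeholder):

```python
# settings.py -- the standard scrapy-crawlera setup; key is a placeholder.
CRAWLERA_ENABLED = True
CRAWLERA_APIKEY = "<your-api-key>"

DOWNLOADER_MIDDLEWARES = {
    "scrapy_crawlera.CrawleraMiddleware": 610,
}

# Crawlera throttles on its own side, so its docs suggest relaxing Scrapy's
# limits and disabling AutoThrottle, which otherwise fights the proxy.
CONCURRENT_REQUESTS = 32
AUTOTHROTTLE_ENABLED = False
DOWNLOAD_TIMEOUT = 600

# For sticky cookies, a Crawlera session pins requests to one outgoing IP:
# yield scrapy.Request(url, headers={"X-Crawlera-Session": "create"})
```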
0
votes
1 answer

How to make the website believe that the request is coming from a browser using Scrapy?

I am trying to scrape this URL: https://www.bloomberg.com/news/articles/2019-06-03/a-tesla-collapse-would-boost-european-carmakers-bernstein-says I just want to scrape the title and posted date, but Bloomberg always bans me and thinks that I am…
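For reference, browser-like headers in Scrapy are set via settings; a sketch — the values are examples only and will not defeat fingerprinting-based bot detection such as Bloomberg's on their own:

```python
# settings.py -- browser-like request headers; example values only.
USER_AGENT = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"
)
DEFAULT_REQUEST_HEADERS = {
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
    "Accept-Language": "en-US,en;q=0.9",
}
```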
0
votes
0 answers

scrapy-splash response.body contains no HTML

I'm trying to use Crawlera alongside a local Splash instance. This is my Lua script: function main(splash) function use_crawlera(splash) local user = splash.args.crawlera_user local host = 'proxy.crawlera.com' local port = 8010 local…
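For context, the Python side of such a setup typically looks like this (a sketch assuming scrapy-splash; the spider, URL, and key are placeholders, and LUA_SOURCE stands in for the question's truncated script):

```python
import scrapy
from scrapy_splash import SplashRequest

# Paste the full use_crawlera script from the question here; it is
# truncated above, so only a stub is shown.
LUA_SOURCE = """
function main(splash)
  -- use_crawlera(splash) setup and splash:go() go here
end
"""

class CrawleraSplashSpider(scrapy.Spider):
    name = "crawlera_splash"  # placeholder

    def start_requests(self):
        yield SplashRequest(
            "https://example.com/",  # placeholder URL
            self.parse,
            endpoint="execute",
            args={
                "lua_source": LUA_SOURCE,
                # Available in Lua as splash.args.crawlera_user:
                "crawlera_user": "<your-api-key>",
                # Crawlera adds latency; a short Splash timeout can yield an
                # empty body. 60 is Splash's default --max-timeout ceiling.
                "timeout": 60,
            },
        )

    def parse(self, response):
        # If the script returns {html = splash:html()}, the rendered page
        # is what arrives here as response.body.
        self.logger.info("Got %d bytes", len(response.body))
```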
0
votes
1 answer

Stop Scrapy request pipeline for a few minutes and retry

I am scraping a single domain using Scrapy and the Crawlera proxy. Sometimes, due to Crawlera issues (a technical break), I get a 407 status code and can't scrape any site. Is it possible to stop the request pipeline for 10 minutes and then restart…
Bociek
  • 1,195
  • 2
  • 13
  • 28
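One common (if blunt) pattern for this is a downloader middleware that pauses the engine on a 407 and retries. A rough sketch, not production-ready: the blocking sleep stalls the Twisted reactor, which is tolerable only because nothing useful can run while the proxy is down anyway.

```python
import time

class PauseOn407Middleware:
    # Add to DOWNLOADER_MIDDLEWARES, e.g.
    # {"myproject.middlewares.PauseOn407Middleware": 610}  (path is a placeholder)
    PAUSE_SECONDS = 600  # 10 minutes

    @classmethod
    def from_crawler(cls, crawler):
        mw = cls()
        mw.crawler = crawler
        return mw

    def process_response(self, request, response, spider):
        if response.status == 407:
            spider.logger.warning("407 from proxy; pausing %ss",
                                  self.PAUSE_SECONDS)
            self.crawler.engine.pause()
            time.sleep(self.PAUSE_SECONDS)  # blocks the reactor on purpose
            self.crawler.engine.unpause()
            # Re-issue the same request, bypassing the dupefilter.
            return request.replace(dont_filter=True)
        return response
```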
0
votes
2 answers

Scrapy spider not working with Crawlera middleware

I wrote a spider to crawl a large site. I'm hosting it on Scrapinghub and am using the Crawlera add-on. Without Crawlera my spider runs on Scrapinghub just fine. As soon as I switch to the Crawlera middleware, the spider just exits without doing a single…
joe
  • 73
  • 2
  • 8
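Worth noting: scrapy-crawlera can also be toggled per spider via attributes, and the X-Crawlera-Version response header confirms whether requests really went through the proxy, which helps diagnose a spider that silently does nothing. A sketch (spider name and URL are placeholders):

```python
import scrapy

class LargeSiteSpider(scrapy.Spider):
    # Hypothetical spider; scrapy_crawlera.CrawleraMiddleware must still be
    # listed in DOWNLOADER_MIDDLEWARES for these attributes to take effect.
    name = "large_site"
    crawlera_enabled = True
    crawlera_apikey = "<your-api-key>"  # placeholder

    start_urls = ["https://example.com/"]  # placeholder

    def parse(self, response):
        # X-Crawlera-Version is only present on responses that actually
        # passed through the proxy.
        self.logger.info("status=%s via=%s", response.status,
                         response.headers.get("X-Crawlera-Version"))
```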
0
votes
2 answers

Does scrapy-crawlera handle a 429 status code?

Does anyone know whether the scrapy-crawlera middleware handles the 429 status code when using Scrapy, or do I need to implement my own retry logic? I can't seem to find it documented anywhere.
Kevin Glasson
  • 408
  • 2
  • 13
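For reference, retrying 429 is controlled by Scrapy's own RetryMiddleware settings rather than by scrapy-crawlera; recent Scrapy releases include 429 in the defaults, older ones do not. A sketch:

```python
# settings.py -- make sure 429 is retried; recent Scrapy releases include
# it in the default RETRY_HTTP_CODES, older ones do not.
RETRY_ENABLED = True
RETRY_TIMES = 5
RETRY_HTTP_CODES = [500, 502, 503, 504, 522, 524, 408, 429]
```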
0
votes
2 answers

How to get session_id when using Crawlera lua script in Scrapy Splash?

As you know, we use this Lua script when trying to use Scrapy Splash with Crawlera: function use_crawlera(splash) -- Make sure you pass your Crawlera API key in the 'crawlera_user' arg. -- Have a look at the file spiders/quotes-js.py to see…
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
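One approach (a sketch, assuming a Splash version with splash:on_response) is to capture the X-Crawlera-Session response header inside the Lua script and return it alongside the HTML; the field, spider name, URL, and key below are illustrative:

```python
import scrapy
from scrapy_splash import SplashRequest

# Sketch: record the X-Crawlera-Session response header inside Lua and
# hand it back to the spider next to the rendered HTML.
LUA_SOURCE = """
function main(splash)
  local session_id = nil
  splash:on_response(function(response)
    local sid = response.headers['X-Crawlera-Session']
    if sid then session_id = sid end
  end)
  -- ... the use_crawlera(splash) setup from the original script ...
  assert(splash:go(splash.args.url))
  return {html = splash:html(), session_id = session_id}
end
"""

class SessionSpider(scrapy.Spider):
    name = "crawlera_session"  # placeholder

    def start_requests(self):
        yield SplashRequest("https://example.com/", self.parse,  # placeholder
                            endpoint="execute",
                            args={"lua_source": LUA_SOURCE,
                                  "crawlera_user": "<your-api-key>"})

    def parse(self, response):
        # With the 'execute' endpoint, the Lua return table is exposed as
        # response.data on the SplashJsonResponse.
        self.logger.info("Crawlera session: %s",
                         response.data.get("session_id"))
```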
0
votes
1 answer

Scrapy Splash + Crawlera in Linux always get 503 service unavailable error

When I use Scrapy Splash + Crawlera on my Linux server, I always get 503 errors. It works just fine on Windows. Why is that?
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
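A quick way to narrow this down is to test Crawlera from the Linux host independently of Splash, using the documented APIKEY-as-username proxy form (a sketch; the key is a placeholder):

```python
import requests

# Plain HTTP request through Crawlera from the affected host; if this also
# returns 503, the problem is between the server and Crawlera rather than
# in the Splash setup.
resp = requests.get(
    "http://httpbin.org/ip",
    proxies={"http": "http://<your-api-key>:@proxy.crawlera.com:8010"},
    timeout=60,
)
print(resp.status_code, resp.text)
```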
-1
votes
1 answer

How to authenticate using a Scrapy spider with Zyte Smart Proxy Manager (formerly Crawlera) enabled?

I followed the scrapy-zyte-smartproxy documentation to integrate proxy usage into my spider. Now my spider can't log in.
Danil
  • 4,781
  • 1
  • 35
  • 50
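If the proxy's rotating IPs break the login flow, one option is to bypass the proxy for just the login request via the dont_proxy meta key documented for scrapy-crawlera, which scrapy-zyte-smartproxy also honors. A sketch — the spider, URLs, and credentials are placeholders, and sessions bound to the login IP may still be invalidated by later proxied requests:

```python
import scrapy

class LoginSpider(scrapy.Spider):
    name = "login_then_proxy"  # placeholder

    def start_requests(self):
        yield scrapy.FormRequest(
            "https://example.com/login",  # placeholder URL
            formdata={"user": "me", "pass": "secret"},  # placeholders
            # Skip Smart Proxy Manager for this one request so the session
            # is established from a single stable IP.
            meta={"dont_proxy": True},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Later requests go through the proxy again as usual.
        yield scrapy.Request("https://example.com/account", self.parse)

    def parse(self, response):
        yield {"title": response.css("title::text").get()}
```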