Questions tagged [scrapy-middleware]

Scrapy middleware is a framework of hooks into Scrapy’s spider processing mechanism. It lets you plug in custom functionality to process the responses that are sent to spiders, as well as the requests and items that spiders generate.

Scrapy also ships with several built-in middlewares that you can use with your spiders.
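
As a rough illustration of the hooks described above, here is a minimal sketch of a spider middleware. The class name `DropShortItemsMiddleware` and its `min_fields` parameter are hypothetical, made up for this example. The method names `process_spider_input` and `process_spider_output` and their `(response, spider)` / `(response, result, spider)` signatures follow Scrapy's long-standing spider middleware interface; newer Scrapy releases may differ, so check the documentation for your version. Middlewares are plain classes, so the sketch needs no imports.

```python
class DropShortItemsMiddleware:
    """Hypothetical spider middleware; the name and min_fields are illustrative."""

    def __init__(self, min_fields=1):
        # Dict items with fewer keys than this are dropped (assumed policy).
        self.min_fields = min_fields

    def process_spider_input(self, response, spider):
        # Called for each response before it reaches the spider callback.
        # Returning None tells Scrapy to continue processing the response.
        return None

    def process_spider_output(self, response, result, spider):
        # Called with the iterable of requests and items the spider produced.
        for element in result:
            if isinstance(element, dict) and len(element) < self.min_fields:
                continue  # skip items that look incomplete
            yield element  # pass everything else through unchanged
```

To try it in a project, the class would be registered in the `SPIDER_MIDDLEWARES` setting, for example `{"myproject.middlewares.DropShortItemsMiddleware": 543}`, where the module path and the order value are placeholders.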

23 questions
0
votes
1 answer

How to get new token headers during runtime of scrapy spider

I am running a Scrapy spider that starts by getting an authorization token from the website I am scraping, using the basic requests library. The function for this is called get_security_token(). This token is passed as a header to the scrapy…
Justin
  • 58
  • 1
  • 8
0
votes
1 answer

raise IgnoreRequest not working correctly in CustomDownloaderMiddleWare

I have written my own Scrapy downloader middleware that simply checks the db for an existing request.url and, if found, raises IgnoreRequest. def process_request(self, request, spider): # Called for each request that goes through the downloader #…
user15208009
0
votes
1 answer

How to handle multiple requests for a MIDDLEWARE in SCRAPY (captchas and multiple retries)

I'm trying to build a spider that breaks a dynamic captcha with just Scrapy. I have done it, BUT of course the captcha solution is not always correct, so I HAVE TO make it retry multiple times (max. 10) to really enter the 'login' page for…
AngelLB
  • 153
  • 2
  • 9
0
votes
2 answers

How to retry IndexError in Scrapy

Sometimes I get an IndexError because I successfully scrape only half of the page, which causes the parsing logic to raise IndexError. How can I retry when I get an IndexError? Ideally it would be a middleware so it can handle multiple spiders at once.
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
0
votes
1 answer

Override Scrapy logging esp. from middleware

I have used Scrapy in a project where I have my own JSON logging format. I want to avoid any multi-line stacktraces from Scrapy especially from middlewares like the one for robots.txt. I would prefer it to be a proper one line error or the entire…
comiventor
  • 3,922
  • 5
  • 50
  • 77
-1
votes
1 answer

Access Spider self object on custom middleware

I am trying to notice when there is a problem with the page I am scraping. In case the response does not have a valid status code, I want to write a custom value in the crawler stats so that I can return a non-zero exit code from my process. This is what…
Luiscri
  • 913
  • 1
  • 13
  • 40
-1
votes
1 answer

How to get response status code on process_exception in Scrapy?

I want to retry a Scrapy request if it gets an exception and the response status code is 429. The problem is I don't know how to get the response status in process_exception. How can I do it since it seems there is no way to access response…
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108
-2
votes
2 answers

Scrapy - NameError: global name 'logger' is not defined

I am trying to modify Scrapy retry a little bit by modifying the middleware. I use this middleware: class Retry500Middleware(RetryMiddleware): def _retry(self, request, reason, spider): retries = request.meta.get('retry_times', 0) + 1 …
Aminah Nuraini
  • 18,120
  • 8
  • 90
  • 108