
When I turn on DUPEFILTER_DEBUG, I get:

2016-09-21 01:48:29 [scrapy] DEBUG: Filtered duplicate request: <GET http://www.example.org/example.html>

The problem is, I need to know the duplicate request's referrer to debug the code. How can I debug the referrer?

Aminah Nuraini
  • Try implementing your own visited-URL log, in memory or in a file, for example via a pipeline over the yielded links. – Evhz Sep 21 '16 at 08:52
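A minimal sketch of the commenter's suggestion: an item pipeline that keeps its own set of visited URLs and logs repeats. The class name, the `url` item field, and the log message are hypothetical, not from the original post; adapt them to your items.

```python
class VisitedLogPipeline:
    """Hypothetical pipeline that tracks URLs of yielded items in memory."""

    def open_spider(self, spider):
        # Fresh set per crawl; swap for a file-backed store if memory is a concern
        self.seen = set()

    def process_item(self, item, spider):
        url = item.get("url")
        if url is not None:
            if url in self.seen:
                spider.logger.debug("Already seen: %s", url)
            self.seen.add(url)
        return item
```

Enable it via the ITEM_PIPELINES setting as with any Scrapy pipeline.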

1 Answer


One option would be a custom filter based on the built-in RFPDupeFilter filter:

from scrapy.dupefilters import RFPDupeFilter

class MyDupeFilter(RFPDupeFilter):
    def log(self, request, spider):
        # Log the Referer header so you can see where the duplicate came from
        referer = request.headers.get("Referer")
        self.logger.debug("Duplicate request %s (referer: %s)" % (request, referer),
                          extra={'spider': spider})
        super(MyDupeFilter, self).log(request, spider)

Don't forget to set the DUPEFILTER_CLASS setting to point to your custom class.
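In settings.py that would look something like the following (the module path myproject.dupefilters is an assumption; point it at wherever you put the class):

```python
# settings.py
# Path to the custom filter class (hypothetical project layout)
DUPEFILTER_CLASS = "myproject.dupefilters.MyDupeFilter"
# Keep debug logging on so filtered duplicates are actually reported
DUPEFILTER_DEBUG = True
```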

(not tested)

alecxe