Assuming the spider otherwise works correctly and the parse function is fine, I found that a small fraction of responses come back with an empty body even though the status code is 200, roughly 2 out of 10. When I open the same request URLs in Chrome, the pages load correctly every time. I'm also fairly sure my IP isn't banned; everything else looks normal.
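To make the symptom measurable, here is a minimal sketch of how the empty responses could be counted. It is plain Python and not taken from the actual spider: the helper names `classify_response` and `summarize` and the sample data are hypothetical, and the pairs stand in for Scrapy's `response.status` and `response.body`.

```python
from collections import Counter


def classify_response(status, body):
    """Label a response as 'ok', 'empty', or 'error' from its status and raw body bytes."""
    if status != 200:
        return "error"
    # Treat a missing body or a whitespace-only body as empty.
    if not body or not body.strip():
        return "empty"
    return "ok"


def summarize(responses):
    """Count the labels over an iterable of (status, body) pairs."""
    return Counter(classify_response(status, body) for status, body in responses)


# Hypothetical sample mirroring the "2 out of 10" symptom described above.
sample = [(200, b"<html>ok</html>")] * 8 + [(200, b"")] * 2
counts = summarize(sample)
print(dict(counts))
```

Inside a Scrapy callback, the same check could be applied as `classify_response(response.status, response.body)`. One possible way to handle an "empty" result, assuming a retry is acceptable, is to yield `response.request.replace(dont_filter=True)` so that the duplicate filter does not drop the repeated request.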

Here are the settings:

BOT_NAME = 'CategorySpider'
SPIDER_MODULES = ['CategorySpider.spiders']
NEWSPIDER_MODULE = 'CategorySpider.spiders'
ROBOTSTXT_OBEY = False
SPIDER_MIDDLEWARES = {
    'CategorySpider.middlewares.NodeMiddlewares': 100,
    'CategorySpider.middlewares.CategoryspiderSpiderMiddleware': 543,

}

DEFAULT_REQUEST_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "accept-encoding": "gzip, deflate, sdch, br",
    "accept-language": "zh-CN,zh;q=0.8",
    "upgrade-insecure-requests": 1,
}

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_MAX_DELAY = 60

Can anyone help me out with this? Thanks a lot.

Slimane amiar
changezyc
  • I think a minimal reproducible example would increase your chances of getting a useful answer. Show us some reproducible code. – KenHBS Aug 31 '18 at 06:39
  • That's pretty much all the settings above, because I just made a demo spider without any other complicated code. – changezyc Aug 31 '18 at 10:16

0 Answers