Assuming the spider itself works correctly and the parse function is fine, I've found that a small fraction of responses randomly come back with an empty body even though the status code is 200, roughly 2 out of 10. When I open the same request URLs in Chrome, the pages always load fine. I'm also fairly sure my IP isn't banned; everything else looks normal.
Here are my settings:
BOT_NAME = 'CategorySpider'
SPIDER_MODULES = ['CategorySpider.spiders']
NEWSPIDER_MODULE = 'CategorySpider.spiders'
ROBOTSTXT_OBEY = False
SPIDER_MIDDLEWARES = {
    'CategorySpider.middlewares.NodeMiddlewares': 100,
    'CategorySpider.middlewares.CategoryspiderSpiderMiddleware': 543,
}
DEFAULT_REQUEST_HEADERS = {
    "accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8",
    "accept-encoding": "gzip, deflate, sdch, br",
    "accept-language": "zh-CN,zh;q=0.8",
    "upgrade-insecure-requests": 1,
}
AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_MAX_DELAY = 60
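To help narrow this down, here is a minimal sketch of a downloader middleware that logs any 200 response with an empty body and re-schedules the request a limited number of times. It assumes the empty bodies are transient. The class name `EmptyBodyRetryMiddleware`, the `empty_body_retries` meta key, and the retry limit are hypothetical and not part of the project above.

```python
import logging

logger = logging.getLogger(__name__)


class EmptyBodyRetryMiddleware:
    """Hypothetical downloader middleware: retry 200 responses with an empty body."""

    def __init__(self, max_retries=3):
        # Assumed limit; tune it to how often the empty bodies show up.
        self.max_retries = max_retries

    def process_response(self, request, response, spider):
        # Pass through anything that is not an empty-bodied 200 response.
        if response.status != 200 or response.body.strip():
            return response

        retries = request.meta.get("empty_body_retries", 0)
        if retries >= self.max_retries:
            logger.warning("Giving up on %s after %d empty bodies",
                           request.url, retries)
            return response

        logger.warning("Empty body with status 200 for %s (retry %d)",
                       request.url, retries + 1)
        # dont_filter=True keeps the duplicate filter from dropping the retry.
        retry_request = request.replace(dont_filter=True)
        retry_request.meta["empty_body_retries"] = retries + 1
        return retry_request
```

If this lived in the project's `middlewares.py` (an assumption), it could be enabled with `DOWNLOADER_MIDDLEWARES = {'CategorySpider.middlewares.EmptyBodyRetryMiddleware': 560}`. A value below 590 should make it run after Scrapy's built-in HttpCompressionMiddleware on the response path, so it sees the decompressed body.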
Can anyone help me out with this? Thanks a lot.