Problem Occurred When I Was Crawled The Whole Website By Using splash To Render The Entire Target Page.Some Page Was Not Random Successfully So I Was False To Get The Information That Supports To Be There When Render Job Had Done.That Means I Just Get Part Of The Information From The Render Result Although I Can Get The Entire Information From Other Render Result.
Here Is My Code:
yield SplashRequest(url,self.splash_parse,args = {"wait": 3,},endpoint="render.html")
settings:
SPLASH_URL = 'XXX'
DOWNLOADER_MIDDLEWARES = {
'scrapy_splash.SplashCookiesMiddleware': 723,
'scrapy_splash.SplashMiddleware': 725,
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware': 810,
}
# Enable SplashDeduplicateArgsMiddleware:
SPIDER_MIDDLEWARES = {
'scrapy_splash.SplashDeduplicateArgsMiddleware': 100,
}
# Set a custom DUPEFILTER_CLASS:
DUPEFILTER_CLASS = 'scrapy_splash.SplashAwareDupeFilter
# a custom cache storage backend:
HTTPCACHE_STORAGE = 'scrapy_splash.SplashAwareFSCacheStorage'