When the JavaScript loads, it makes another AJAX request, and the response to that request should set cookies. However, Splash does not keep any cookies across multiple requests. Is there a way to keep the cookies across all requests, or even to assign them manually between requests?
1 Answer
Yes, there is an example in the scrapy-splash README; see the Session Handling section. In short: first, make sure that all the scrapy-splash settings are correct. Then send Scrapy requests with SplashRequest(url, endpoint='execute', args={'lua_source': script}). The rendering script should look like this:
function main(splash)
    splash:init_cookies(splash.args.cookies)
    -- ... your script
    return {
        cookies = splash:get_cookies(),
        -- ... other results, e.g. html
    }
end
There is also a complete example with cookie handling, header handling, etc. in the scrapy-splash README; see the last example there.
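To make the two steps above concrete, here is a minimal sketch of what the settings and the request could look like. The middleware paths and priorities are the ones listed in the scrapy-splash README; `SPLASH_URL` assumes a Splash instance running locally, the `splash:go`/`splash:wait` lines stand in for "your script", and `splash_request_kwargs` is a hypothetical helper name used only for illustration.

```python
# Sketch of a settings.py fragment plus a helper that builds the keyword
# arguments for scrapy_splash.SplashRequest. Nothing here imports Scrapy,
# so the fragment can be read (and checked) on its own.

SPLASH_URL = "http://localhost:8050"  # assumption: local Splash instance

# SplashCookiesMiddleware must sit before SplashMiddleware so that cookies
# are passed to Splash and merged back from the script's result.
DOWNLOADER_MIDDLEWARES = {
    "scrapy_splash.SplashCookiesMiddleware": 723,
    "scrapy_splash.SplashMiddleware": 725,
    "scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware": 810,
}
SPIDER_MIDDLEWARES = {
    "scrapy_splash.SplashDeduplicateArgsMiddleware": 100,
}
DUPEFILTER_CLASS = "scrapy_splash.SplashAwareDupeFilter"

# Rendering script from the answer, with a placeholder body filled in.
LUA_SCRIPT = """
function main(splash)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go(splash.args.url))
    assert(splash:wait(0.5))
    return {
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""


def splash_request_kwargs(url, callback=None):
    """Hypothetical helper: keyword arguments for SplashRequest."""
    return {
        "url": url,
        "callback": callback,
        "endpoint": "execute",  # needed to run a custom Lua script
        "args": {"lua_source": LUA_SCRIPT},
    }
```

In a spider this would be used roughly as `yield SplashRequest(**splash_request_kwargs(url, self.parse))`, after `from scrapy_splash import SplashRequest`. The important part is that the script returns a `cookies` key, since that is what lets cookies carry over to the next request.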

Mikhail Korobov
- Thanks for the help, Mikhail. What happens when I need to set cookies for calls made in the JavaScript? Four different requests happen when I do `splash:go(url)`, and I would like to set cookies after the second request. – James Samovar Nov 11 '16 at 20:04
- Sorry, I don't quite understand the question. Cookies received in AJAX responses should be merged into the Splash cookiejar and returned by splash:get_cookies(). splash:init_cookies() sets the contents of the browser cookiejar, and the browser should use these cookies for all requests, including AJAX requests. So the script above should work regardless of how many requests you're making in your Lua script. – Mikhail Korobov Nov 11 '16 at 20:52
- Oh, I understand now, so I guess the problem is not with the cookies. I'm basically trying to access Crunchbase.com through Splash, and they have some weird bot protection. Accessing it from a browser always works. Do you have any idea how to make Splash behave exactly like a browser? – James Samovar Nov 11 '16 at 21:08
- Splash works like a browser, but a rather old one: it uses almost the same rendering engine as PhantomJS 2.0, which is WebKit from 2013. It is possible to detect this engine using e.g. engine-specific bugs and gotchas, or its missing features. It also sets a user agent that can be identified (you can set your own, though). – Mikhail Korobov Nov 11 '16 at 21:16
- I see, appreciate your help! – James Samovar Nov 11 '16 at 22:06
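On the user agent point from the comments, a rough sketch of setting your own could look like the following. It relies on `splash:set_user_agent` from the Splash scripting API; the `user_agent` argument name and the `splash_args_with_user_agent` helper are hypothetical, chosen only for this illustration. Extra keys in `args` are exposed to the script through `splash.args`.

```python
# Sketch: pass a custom user agent to the Lua script via Splash args.
# "user_agent" is an assumed argument name, not a built-in Splash option.

UA_SCRIPT = """
function main(splash)
    splash:set_user_agent(splash.args.user_agent)
    splash:init_cookies(splash.args.cookies)
    assert(splash:go(splash.args.url))
    return {
        cookies = splash:get_cookies(),
        html = splash:html(),
    }
end
"""


def splash_args_with_user_agent(user_agent):
    """Hypothetical helper: the `args` dict for SplashRequest."""
    return {"lua_source": UA_SCRIPT, "user_agent": user_agent}
```

This would be passed as `SplashRequest(url, endpoint='execute', args=splash_args_with_user_agent(...))`. As noted above, the rendering engine itself can still be detected, so changing the user agent alone may not be enough to get past bot protection.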