(I have asked this question on the Scrapy google-group without luck.)
I am trying to log into Facebook using Scrapy. I tried the following in the interactive shell:
I set the headers and created a request as follows:
header_vals={'Accept-Language': ['en'], 'Content-Type': ['application/
x-www-form-urlencoded'], 'Accept-Encoding': ['gzip,deflate'],
'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,*/
*;q=0.8'], 'User-Agent': ['Mozilla/5.0 Gecko/20070219 Firefox/
2.0.0.2']}
login_request=Request('https://www.facebook.com/login.php',headers=header_vals)
fetch(login_request)
I get redirected:
2011-08-11 13:54:54+0530 [default] DEBUG: Redirecting (meta refresh)
to <GET https://www.facebook.com/login.php?_fb_noscript=1> from <GET
https://www.facebook.com/login.php>
. . .
[s] request <GET https://www.facebook.com/login.php>
[s] response <200 https://www.facebook.com/login.php?_fb_noscript=1>
I guess it shouldn't be redirected there if I am supplying the right headers?
I still attempt to go ahead and supply login details using the FormRequest as follows:
new_request=FormRequest.from_response(response,formname='login_form',formdata={'email':'...@email.com','pass':'password'},headers=header_vals)
new_request.meta['download_timeout']=180
new_request.meta['redirect_ttl']=30
fetch(new_request) results in:
2011-08-11 14:05:45+0530 [default] DEBUG: Redirecting (meta refresh)
to <GET https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1>
from <POST https://www.facebook.com/login.php?login_attempt=1>
.
.
[s] response <200 https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1>
.
What am I missing here? Thanks for any suggestions and help.
I'll add that I've also tried this with a BaseSpider to see if this was a result of the cookies not being passed along in the shell, but it doesn't work there either.
I was able to use Mechanize to log on successfully. Can I take advantage of this to somehow pass cookies on to Scrapy?