Problem logging into Facebook with Scrapy

Question

(I have asked this question on the Scrapy google-group without luck.)

I am trying to log into Facebook using Scrapy. In the interactive shell, I set the headers and created a request as follows:

header_vals = {
    'Accept-Language': ['en'],
    'Content-Type': ['application/x-www-form-urlencoded'],
    'Accept-Encoding': ['gzip,deflate'],
    'Accept': ['text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8'],
    'User-Agent': ['Mozilla/5.0 Gecko/20070219 Firefox/2.0.0.2'],
}

from scrapy.http import Request, FormRequest

login_request = Request('https://www.facebook.com/login.php', headers=header_vals)

fetch(login_request)

I get redirected:

2011-08-11 13:54:54+0530 [default] DEBUG: Redirecting (meta refresh) to <GET https://www.facebook.com/login.php?_fb_noscript=1> from <GET https://www.facebook.com/login.php>

. . .

[s]   request    <GET https://www.facebook.com/login.php> 

[s]   response   <200 https://www.facebook.com/login.php?_fb_noscript=1>

I guess it shouldn't be redirected there if I am supplying the right headers?

I went ahead anyway and supplied the login details using FormRequest as follows:

new_request=FormRequest.from_response(response,formname='login_form',formdata={'email':'...@email.com','pass':'password'},headers=header_vals)

new_request.meta['download_timeout']=180 

new_request.meta['redirect_ttl']=30

fetch(new_request)

This results in:

2011-08-11 14:05:45+0530 [default] DEBUG: Redirecting (meta refresh) to <GET https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1> from <POST https://www.facebook.com/login.php?login_attempt=1>

. . .

[s]   response   <200 https://www.facebook.com/login.php?login_attempt=1&_fb_noscript=1>

. . .

What am I missing here? Thanks for any suggestions and help.

I'll add that I've also tried this with a BaseSpider to see if this was a result of the cookies not being passed along in the shell, but it doesn't work there either.

I was able to use Mechanize to log on successfully. Can I take advantage of this to somehow pass cookies on to Scrapy?

score 1 · Accepted Answer · answered Aug 20 '11 at 22:13

Notice the "(meta refresh)" text in the redirect log line. Facebook serves a noscript tag that automatically redirects clients without JavaScript to "/login.php?_fb_noscript=1". The problem is that you are posting to "/login.php" instead, so you always get redirected by the meta refresh tag.
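
To make the redirect concrete, here is a minimal sketch of spotting a meta refresh tag in a page, using only the Python standard library. It is not Scrapy's own implementation. The sample markup and the names MetaRefreshFinder and find_meta_refresh are assumptions made for illustration, and Facebook's actual markup may differ.

```python
from html.parser import HTMLParser


class MetaRefreshFinder(HTMLParser):
    """Collect the target URL of every <meta http-equiv="refresh"> tag."""

    def __init__(self):
        super().__init__()
        self.targets = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("http-equiv") or "").lower() != "refresh":
            return
        # The content attribute looks like "0; URL=/some/path".
        content = attr.get("content") or ""
        _, _, rest = content.partition(";")
        key, _, url = rest.strip().partition("=")
        if key.strip().lower() == "url" and url:
            self.targets.append(url.strip().strip("'\""))


def find_meta_refresh(html):
    """Return the list of meta refresh target URLs found in html."""
    finder = MetaRefreshFinder()
    finder.feed(html)
    return finder.targets


# Hypothetical markup, modelled on the redirect target seen in the log.
SAMPLE = (
    '<html><head><noscript>'
    '<meta http-equiv="refresh" content="0; URL=/login.php?_fb_noscript=1" />'
    '</noscript></head><body></body></html>'
)

print(find_meta_refresh(SAMPLE))
```

A client that does not run JavaScript reads the noscript content, so it follows the refresh target; that is the behaviour the "Redirecting (meta refresh)" log line reports.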

Even if you get past this problem, it's against Facebook's robots.txt, so you shouldn't really be doing this.
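
As a rough sketch of how such a check can be done before crawling, the standard library's urllib.robotparser can evaluate a URL against a set of robots.txt rules. The rules, the "mybot" user agent, the example.com URLs, and the is_allowed helper below are all hypothetical; they are not Facebook's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration only; not Facebook's actual robots.txt.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /login.php
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(SAMPLE_ROBOTS_TXT, "mybot", "https://www.example.com/login.php"))
print(is_allowed(SAMPLE_ROBOTS_TXT, "mybot", "https://www.example.com/about"))
```

In Scrapy itself, the ROBOTSTXT_OBEY setting enables a middleware that performs this kind of check for you.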

Why don't you just use the Facebook Graph API?

Seb


Thanks. That makes a lot of sense, and I'll look at the Graph API. – Cygorger Aug 21 '11 at 14:37
