I'm trying to get data from Factiva using Python 3.5.2. I have to log in through my school's proxy to see the data.

I followed this post to try to create a login spider.

However, I got an error when I ran it.

This is my code:

# Test Login Spider
import scrapy
from scrapy.selector import HtmlXPathSelector
from scrapy.http import Request


login_url = "https://login.proxy.lib.sfu.ca/login?qurl=https%3a%2f%2fglobal.factiva.com%2fen%2fsess%2flogin.asp%3fXSID%3dS002sbj1svr2sVo5DEs5DEpOTAvNDAoODZyMHn0YqYvMq382rbRQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQUFBQQAA"
user_name = b"[my_user_name]"
pswd = b"[my_password]"
response_page = "https://global-factiva-com.proxy.lib.sfu.ca/hp/printsavews.aspx?pp=Save&hc=All"


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        return [scrapy.FormRequest(login_url,
                               formdata={'user': user_name, 'pass': pswd},
                               callback=self.logged_in)]

    def logged_in(self, response):
        # login failed
        if "authentication failed" in response.body:
            print ("Login failed")
        # login succeeded
        else:
            print ('login succeeded')
            # return Request(url=response_page,
            #        callback=self.parse_responsepage)

    def parse_responsepage(self, response):
        hxs = HtmlXPathSelector(response)
        yum = hxs.select('//span/@enHeadline')


def main():
    test_spider = MySpider(scrapy.Spider)
    test_spider.start_requests()

if __name__ == "__main__":
    main()

To run this code, I used the following command in a terminal, from the top directory of the project:

scrapy runspider [my_file_path]/auth_spider.py

Do you know how to deal with the errors here?

Cherry Wu
1 Answer

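As you're using Python 3.x, "authentication failed" is a str while response.body is of type bytes. Python 3 does not mix the two implicitly, so testing a str with `in` against bytes raises a TypeError.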

To resolve the issue, either perform the test in str:

if "authentication failed" in response.body_as_unicode():

or in bytes:

if b"authentication failed" in response.body:
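
To see the difference without running a spider, here is a minimal sketch in plain Python. The `body` variable is a made-up stand-in for response.body, and decoding as UTF-8 is an assumption made for this example only:

```python
# Hypothetical stand-in for response.body, which Scrapy exposes as bytes.
body = b"<html><p>authentication failed</p></html>"

# Python 3 refuses to search bytes for a str needle.
try:
    "authentication failed" in body
    mixed_types_ok = True
except TypeError:
    mixed_types_ok = False

# Option 1: compare bytes with bytes.
found_in_bytes = b"authentication failed" in body

# Option 2: decode first, then compare str with str.
# UTF-8 is assumed here; the real page may use another encoding.
found_in_text = "authentication failed" in body.decode("utf-8")

print(mixed_types_ok, found_in_bytes, found_in_text)
```

Either option works as long as both sides of `in` have the same type. In more recent Scrapy releases, response.text gives the body as a str as well, so it may be worth checking which accessor your installed version offers.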
starrify
  • Oh my god, it shows login success. I thought I could never solve this problem.... Thank you very much!! – Cherry Wu Nov 08 '16 at 03:56