I am running a Scrapy spider that starts by getting an authorization token from the website I am scraping, using the plain requests library. The function for this is called get_security_token(). The token is passed as a header to each Scrapy request. The issue is that the token expires after 300 seconds, and then I get a 401 error. Is there any way for a spider to see the 401 error, run get_security_token() again, and then pass the new token on in the headers of all future requests?

import scrapy

class PlayerSpider(scrapy.Spider):
    name = 'player'

    def start_requests(self):
        urls = ['URL GOES HERE']
        # The token is hard-coded here, so it cannot change once it expires.
        header_data = {'Authorization': 'Bearer 72bb65d7-2ff1-3686-837c-61613454928d'}
        for url in urls:
            yield scrapy.Request(url=url, callback=self.parse, headers=header_data)

    def parse(self, response):
        yield response.json()
Justin

1 Answer

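If you are using plain Scrapy, you can add handle_httpstatus_list = [401] as a class attribute of your spider (for example, right after name) so that 401 responses reach your callback instead of being filtered out. Then, in your parse method, you need to do something like this:

if response.status == 401:
    get_security_token()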
Roman
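
A minimal sketch of the idea in this answer, under a few assumptions. `get_security_token()` is assumed to return the bare token string (the question does not show its return value), and `TokenRefreshMixin`, `fetch_token`, `auth_headers` and `retry_if_unauthorized` are hypothetical names introduced only for illustration. The mixin itself does not import Scrapy; it is meant to be combined with `scrapy.Spider` and relies on the standard `handle_httpstatus_list` attribute, `Request.replace()` and `dont_filter`.

```python
def get_security_token():
    """Stand-in for the asker's function; assumed to return a token string."""
    raise NotImplementedError("replace with the real token request")


class TokenRefreshMixin:
    """Hypothetical mixin, listed before scrapy.Spider in the base classes."""

    # Let 401 responses reach the callback; by default Scrapy's
    # HttpErrorMiddleware filters out non-2xx responses.
    handle_httpstatus_list = [401]

    token = None  # most recent token, shared by all requests of the spider

    def fetch_token(self):
        """Obtain a new token; override this if the token comes from elsewhere."""
        return get_security_token()

    def auth_headers(self, refresh=False):
        """Build the Authorization header, fetching a token when needed."""
        if refresh or self.token is None:
            self.token = self.fetch_token()
        return {'Authorization': f'Bearer {self.token}'}

    def retry_if_unauthorized(self, response):
        """Return a re-issued request for a 401 response, otherwise None."""
        if response.status != 401:
            return None
        # dont_filter=True keeps the duplicate filter from dropping the
        # repeated URL; replace() keeps the original callback and meta.
        return response.request.replace(
            headers=self.auth_headers(refresh=True),
            dont_filter=True,
        )
```

In a spider declared as `class PlayerSpider(TokenRefreshMixin, scrapy.Spider)`, `start_requests` would pass `headers=self.auth_headers()` to each `scrapy.Request`, and `parse` would begin by calling `self.retry_if_unauthorized(response)` and yield the returned request whenever it is not `None`. Because the token lives on the spider instance, every request built after a refresh picks up the new value. Requests already queued with the old token come back as 401 and are retried the same way, so several concurrent 401 responses may each trigger a refresh.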
  • How would I get that new security token passed into all future requests, as well as the one that threw the error? – Justin Aug 04 '20 at 14:57
  • Once you get the token from `get_security_token()`, you can pass the new headers into the request: `yield scrapy.Request(url=your_url, headers=headers_with_new_token, callback=self.your_callback)`. – Roman Aug 04 '20 at 18:18
  • Thanks, that makes sense. My only question is this: `headers_with_new_token` will be sent in the `scrapy.Request` call within the `parse` method, but how can I make sure that all requests after that also use `headers_with_new_token`? – Justin Aug 04 '20 at 18:22
  • You can check `response.request.headers` to see whether a request was sent with the updated token. – Roman Aug 04 '20 at 18:46
  • I've added some code to illustrate where I'm at. The issue is that the 'Authorization':'Bearer xxx' value expires. Maybe with my code you could better describe where I should put that variable so I can change it at runtime? – Justin Aug 05 '20 at 02:45
  • Something like this: – Roman Aug 05 '20 at 05:48

        def parse(self, response):
            # Show which token the request was sent with.
            print(response.request.headers['Authorization'])
            if response.status == 401:
                header_data = get_security_token()
                yield scrapy.Request(url=response.url, callback=self.parse, headers=header_data, dont_filter=True)
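
One way to make every later request use the refreshed token, as asked in the comments above, is to keep the token in a single place and attach it to each outgoing request. The following is a sketch of a Scrapy downloader middleware doing that, not a definitive implementation. `BearerTokenMiddleware`, the `token_retries` meta key and the `max_retries` parameter are hypothetical names, and `get_security_token()` is again assumed to return the bare token string. It uses the standard `process_request` / `process_response` hooks; returning a request from `process_response` makes Scrapy reschedule it, so it passes through `process_request` again and picks up the new token.

```python
def get_security_token():
    """Stand-in for the asker's function; assumed to return a token string."""
    raise NotImplementedError("replace with the real token request")


class BearerTokenMiddleware:
    """Hypothetical downloader middleware that owns the bearer token."""

    def __init__(self, fetch_token=get_security_token, max_retries=1):
        self.fetch_token = fetch_token
        self.max_retries = max_retries  # cap on re-issues per request
        self.token = None

    def process_request(self, request, spider):
        # Runs for every outgoing request, including rescheduled ones,
        # so all of them carry whatever token is current.
        if self.token is None:
            self.token = self.fetch_token()
        request.headers['Authorization'] = f'Bearer {self.token}'
        return None  # continue normal downloading

    def process_response(self, request, response, spider):
        if response.status != 401:
            return response
        retries = request.meta.get('token_retries', 0)
        if retries >= self.max_retries:
            # Give up and pass the 401 on rather than loop forever.
            return response
        # Token rejected: fetch a new one and reschedule the same request.
        self.token = self.fetch_token()
        retry = request.replace(dont_filter=True)
        retry.meta['token_retries'] = retries + 1
        return retry
```

With this design the spider no longer sets the `Authorization` header itself. The middleware would be enabled through the `DOWNLOADER_MIDDLEWARES` setting, for example `{'myproject.middlewares.BearerTokenMiddleware': 543}`, where the module path and the priority are placeholders.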