I am trying to access a website using Scrapy-Splash, but I get error 405: Ignoring response <405 https://www.controller.com/>: HTTP status code is not handled or not allowed

The code I use:

import scrapy
from scrapy_splash import SplashRequest

class ProxySpider(scrapy.Spider):
    name = "proxyss"

    def start_requests(self):
        urls = [
            'https://controller.com/',
        ]
        for url in urls:
            yield SplashRequest(
                "https://www.controller.com/listings/aircraft/for-sale/list",
                self.parse,
                args={"http_method": "GET", "wait": 5, "proxy": "http://xxxxxxxxxx"},
            )

    def parse(self, response):
        page = response.url.split("/")[-2]
        filename = 'proxy.html'
        with open(filename, 'wb') as f:
            f.write(response.body)
        self.log('Saved file %s' % filename)

Logs:

2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 1 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 1 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 2 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 2 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com> (failed 3 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 3 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.controller.com> (failed 4 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.controller.com> (referer: https://www.controller.com/listings/aircraft/for-sale/list)
2020-08-17 21:30:55 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET https://www.controller.com/listings/aircraft/for-sale/list> (failed 4 times): 405 Method Not Allowed
2020-08-17 21:30:55 [scrapy.core.engine] DEBUG: Crawled (405) <GET https://www.controller.com/listings/aircraft/for-sale/list> (referer: https://www.controller.com/listings/aircraft/for-sale/list)
2020-08-17 21:30:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.controller.com>: HTTP status code is not handled or not allowed
2020-08-17 21:30:56 [scrapy.spidermiddlewares.httperror] INFO: Ignoring response <405 https://www.controller.com/listings/aircraft/for-sale/list>: HTTP status code is not handled or not allowed
2020-08-17 21:30:56 [scrapy.core.engine] INFO: Closing spider (finished)
2020-08-17 21:30:56 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
Muhammad Zeeshan
  • And you didn't expend the energy to look up that [405 is Method Not Allowed](https://developer.mozilla.org/en-US/docs/Web/HTTP/Status/405)? – mdaniel Aug 15 '20 at 01:24
  • @mdaniel It's working on the home page but not on internal pages. – Muhammad Zeeshan Aug 15 '20 at 11:06
  • It's sometimes related to HTTP requests sent to endpoints that allow only HTTPS. – AnGG Aug 17 '20 at 21:35
  • @AnGG I have made the requests over HTTPS. – Muhammad Zeeshan Aug 17 '20 at 21:42
  • Is your question why you are getting an HTTP 405 from the server, or is your question how to tell Scrapy to handle the HTTP 405? You are getting 2 levels of error here, one from the HTTP server and one from scrapy. Technically scrapy is just issuing an INFO log event, but that log event is what you referenced in your post. – kerasbaz Aug 20 '20 at 08:33
  • @kerasbaz The simple question is that I am unable to scrape data using Scrapy-Splash and keep getting a 405. I just want to be able to get the HTML from the page. – Muhammad Zeeshan Aug 23 '20 at 11:46

1 Answer

This could just be a retry issue. Add the following to your settings.py file and see if it helps:

RETRY_ENABLED = True
RETRY_TIMES = 3
RETRY_HTTP_CODES = [405]
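
On the other half of kerasbaz's question (telling Scrapy to handle the 405 instead of dropping it), here is a minimal settings sketch. It assumes the only goal is to let the 405 response reach the spider's parse callback so its body can be inspected. HTTPERROR_ALLOWED_CODES is a standard Scrapy setting read by the HttpError spider middleware; it does not change what the server returns, so the saved file may still be an error or block page rather than the listings.

```python
# settings.py (sketch): pass 405 responses through to spider callbacks
# instead of having HttpErrorMiddleware log "Ignoring response <405 ...>".
# Assumption: you want to inspect the 405 body, not treat it as success.
HTTPERROR_ALLOWED_CODES = [405]
```

With this in place, response.status can be checked inside parse to tell a real page from an error page before writing the file.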
ta_duke