
I am trying to detect a 404 error in Scrapy. Following is my code, but I do not understand how to check for a response status code of 404 in Python.

import scrapy
from scrapy import Request, Selector


class HalfScrapSpider(scrapy.Spider):
    name = "HalfScrap"
    allowed_domains = ["www.sample.co.uk"]
    start_urls = ["https://www.sample.co.uk/Products"]

    def parse(self, response):
        cursor = self.con.cursor()
        # DB-API parameters must be a sequence; note the trailing comma.
        cursor.execute("Select top 1 Url from Category where Site=?", ('Sample',))
        rows = cursor.fetchall()
        for row in rows:
            url = ("https://www.sample.co.uk/P/Components/system.com/1234"
                   "?x=12&p_style=list&p_productsPerPage=2000")
            yield Request(url, callback=self.HalfProduct)

    def HalfProduct(self, response):
        # response.status is an integer, so compare with 404, not '404'
        if response.status == 404:
            print("statusCode=", response.status)
        try:
            sel = Selector(response)
            rows = sel.xpath('//table[@class="listTable"]/tr[starts-with(@class,"listTableTr")]')
            len(rows)
            Items = []
asked by syyed
  • Have you seen [this](http://stackoverflow.com/questions/15865611/checking-a-url-for-a-404-error-scrapy) answer? It uses `response.getcode()` to check whether the status equals 404. – nbryans Jun 20 '16 at 16:22
  • Thanks for your reply. Yes, I tried it, but I'm getting an error: AttributeError: Failure instance has no attribute 'getcode' – syyed Jun 20 '16 at 16:35
  • Please check the documentation; the `status` attribute is an integer: http://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Response – eLRuLL Jun 20 '16 at 16:58
  • @syyed, you may want to use errbacks for this. Check http://doc.scrapy.org/en/latest/topics/request-response.html#using-errbacks-to-catch-exceptions-in-request-processing – paul trmbrth Jun 20 '16 at 17:12
  • Thanks all for your kind help. I solved the issue using `handle_httpstatus_list = [404]` – syyed Jun 21 '16 at 16:59
  • In the latest version of Scrapy you have to explicitly allow any non-2xx code to reach your callback function. https://doc.scrapy.org/en/latest/topics/spider-middleware.html#std:setting-HTTPERROR_ALLOWED_CODES – Raheel Jul 17 '17 at 11:02
