
I'm trying to crawl an .aspx page, but it redirects me to a page that doesn't exist. To avoid the redirect, I set `'dont_merge_cookies': True` and `'dont_redirect': True` in the request meta and overrode `start_requests`. Now, however, I get the error `'Response' object has no attribute 'body_as_unicode'`, and the class of my response is `scrapy.http.response.Response`.

Here's my code:

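from scrapy.spider import BaseSpider
from scrapy.http import Request, TextResponse
from scrapy.selector import HtmlXPathSelector

from dealspider.items import DealspiderItem  # assumed location of the project's item class

class Inon_Spider(BaseSpider):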
    name = 'Inon'
    allowed_domains = ['www.shop.inonit.in']

    start_urls = ['http://www.shop.inonit.in/Products/Inonit-Men-Jackets/QUIRK-BOX/Toy-Factory-Jacket---Soldiers/pid-1177471.aspx?Rfs=&pgctl=713619&cid=CU00049295']

    # this URL redirects to http://www.shop.inonit.in/Products/Inonit-Men-Jackets/QUIRK-BOX/Toy-Factory-Jacket---Soldiers/1177471

    def start_requests(self):
        # Reuse the class-level start_urls instead of repeating the URL here
        for url in self.start_urls:
            yield Request(url, callback=self.parse, meta={
                'dont_merge_cookies': True,
                'dont_redirect': True,
                'handle_httpstatus_list': [302],
            })

    def parse(self, response):
        print "Response %s" %response.__class__

        resp = TextResponse
        item = DealspiderItem()
        hxs = HtmlXPathSelector(resp)

        title = hxs.select('//div[@class="aboutproduct"]/div[@class="container9"]/div[@class="ctl_aboutbrand"]/h1/text()').extract()
        price = hxs.select('//span[@id="ctl00_ContentPlaceHolder1_Price_ctl00_spnWebPrice"]/span[@class="offer"]/span[@id="ctl00_ContentPlaceHolder1_Price_ctl00_lblOfferPrice"]/text()').extract()
        prc = price[0].replace("Rs.  ","")

        description = []
        item['price'] = prc
        item['title'] = title
        item['description'] = description
        item['url'] = response.url
        return item
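With `'dont_redirect': True` and `'handle_httpstatus_list': [302]`, Scrapy hands the 302 response itself to `parse` instead of the page behind it. That response usually carries little or no HTML; the destination lives in its `Location` header. Below is a minimal sketch, assuming you want to inspect that destination before deciding whether to request it. The helper `redirect_target` and the `REDIRECT_STATUSES` set are hypothetical names, and the code is Python 3, unlike the question's Python 2 spider.

```python
from urllib.parse import urljoin

# Hypothetical set of status codes treated as redirects by this sketch
REDIRECT_STATUSES = {301, 302, 303, 307, 308}


def redirect_target(request_url, status, headers):
    """Return the absolute URL a redirect response points to, or None.

    `headers` is anything with a dict-like .get(); header values may be
    bytes (as Scrapy returns them) or str.
    """
    if status not in REDIRECT_STATUSES:
        return None  # not a redirect, so there is no target to report
    location = headers.get("Location")
    if not location:
        return None  # redirect status but no Location header
    if isinstance(location, bytes):
        location = location.decode("latin-1")
    # Resolve relative Location values against the URL that was requested
    return urljoin(request_url, location)
```

Inside `parse`, one could pass `response.url`, `response.status` and `response.headers` (whose `get('Location')` typically returns bytes) and, if a URL comes back, rewrite it or yield a new `Request` for it, rather than running XPath queries against the near-empty 302 body.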
  • Have you read the Scrapy FAQ entry on the cryptic VIEWSTATE parameter used in some forms (https://scrapy.readthedocs.org/en/latest/faq.html#what-s-this-huge-cryptic-viewstate-parameter-used-in-some-forms)? – Steven Almeroth Mar 18 '13 at 16:44
  • First of all, you should try to replace `resp = TextResponse` with `resp = TextResponse(response.url)`. See http://doc.scrapy.org/en/latest/topics/request-response.html#scrapy.http.TextResponse. – alecxe Mar 18 '13 at 16:46
  • It should be `HtmlXPathSelector(response)`: pass the response object that Scrapy hands to the callback, which holds the result of crawling the page. – Shane Evans Mar 19 '13 at 10:10
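Both comments target the `resp = TextResponse` line: it binds the class itself rather than a response carrying the downloaded page, and `TextResponse(response.url)` would build an instance whose body is empty. The plain-Python sketch below illustrates the three cases with a hypothetical stand-in class, `FakeTextResponse` (not Scrapy's `TextResponse`), and a hypothetical `page_text` consumer standing in for the selector; it is an illustration, not Scrapy's selector code.

```python
class FakeTextResponse:
    """Hypothetical stand-in for a text response; not Scrapy's class."""

    def __init__(self, url, body=""):
        self.url = url
        self.body = body  # defaults to empty, like a response built from a URL alone

    def body_as_unicode(self):
        return self.body


def page_text(resp):
    # Selector-like consumer: it needs an *instance* that holds page data.
    return resp.body_as_unicode()


# 1. The response Scrapy passes to the callback: has the downloaded HTML.
downloaded = FakeTextResponse("http://example.com/", "<h1>Jacket</h1>")
print(page_text(downloaded))  # <h1>Jacket</h1>

# 2. An instance built from the URL only: valid object, but nothing to parse.
print(repr(page_text(FakeTextResponse("http://example.com/"))))  # ''

# 3. The class itself, as in `resp = TextResponse`: the method has no instance.
try:
    page_text(FakeTextResponse)
except TypeError as exc:
    print("TypeError:", exc)
```

Only the first case gives the XPath queries anything to match, which is why passing the callback's own `response` is the fix the comments converge on.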
