This is my spider:

    rules = (
        Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = MovieNotifyItem()
        item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
        item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
        item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
        yield item

Now I want to save the link of each page crawled by this rule into an item field, say item['page_link']:

rules = (
    Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
)

How can I do that? Thanks in advance.

marc_s
Mohib

1 Answer

If I understand correctly, you are looking for the response.url:

def parse_item(self, response):
    item = MovieNotifyItem()
    item['url'] = response.url  # "url" field should be defined for "MovieNotifyItem" Item class
    # ...
    yield item
alecxe
  • I have another question: what if I had another rule that crawls to the next page and I want to save that link too? How can I do that? @alecxe – Mohib Oct 09 '16 at 04:55
  • @Mohib I think you can get the `referer` in this case, please see http://stackoverflow.com/questions/12054958/scrapyhow-to-print-request-referrer. – alecxe Oct 09 '16 at 04:59
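
Putting the answer and the referer comment together, here is a minimal sketch of a helper that could be called from parse_item. The helper name page_links and the 'referer' key are made up for illustration; 'page_link' is the field name from the question. It assumes the Referer header is present on the request (Scrapy's referer middleware normally sets it for followed links, but not for start URLs) and that header values arrive as bytes.

```python
def page_links(response):
    """Return the crawled page's URL and the URL of the page that linked to it.

    `response` is expected to behave like a Scrapy response: it exposes
    `.url` and `.request.headers`. Header values are assumed to be bytes,
    so the Referer is decoded before it is returned. If no Referer header
    was sent (e.g. for a start URL), the value is None.
    """
    referer = response.request.headers.get('Referer')
    if isinstance(referer, bytes):
        referer = referer.decode('utf-8')
    return {'page_link': response.url, 'referer': referer}


# Possible use inside the spider (sketch only; both fields would have to be
# declared on MovieNotifyItem first):
#
#     def parse_item(self, response):
#         item = MovieNotifyItem()
#         links = page_links(response)
#         item['page_link'] = links['page_link']
#         item['referer'] = links['referer']
#         yield item
```

Keeping the lookup in a small function makes it easy to check without running a crawl, since any object with the same two attributes can stand in for the response.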