How to save the crawled page's link into an item using Scrapy?

Question

This is my spider code:

    rules = (
        Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        item = MovieNotifyItem()
        item['title'] = response.xpath('//h5[@class="col s12 light center teal darken-3 white-text"]/text()').extract_first()
        item['size'] = response.xpath('//*[@class="torrent-info"]//tr[1]/td[2]/text()').extract_first()
        item['catagory'] = response.xpath('//*[@class="torrent-info"]//tr[2]/td[2]/text()').extract_first()
        yield item

Now I want to save the link of each page crawled by this rule into an item field, say item['page_link']:

    rules = (
        Rule(LinkExtractor(allow=r'torrents-details\.php\?id=\d*'), callback='parse_item', follow=True),
    )

How can I do that? Thanks in advance.

score 0 · Accepted Answer · answered Oct 09 '16 at 04:46


If I understand correctly, you are looking for response.url, which holds the URL of the page currently being parsed:

def parse_item(self, response):
    item = MovieNotifyItem()
    item['url'] = response.url  # "url" field should be defined for "MovieNotifyItem" Item class
    # ...
    yield item


alecxe


I have another question: what if I had another rule that crawls to the next page, and I want to save that link too? How can I do that? @alecxe – Mohib Oct 09 '16 at 04:55
@Mohib I think you can get the `referer` header in this case; please see http://stackoverflow.com/questions/12054958/scrapyhow-to-print-request-referrer. – alecxe Oct 09 '16 at 04:59
