
I'm using a loop to generate my requests inside start_requests(), and I'd like to pass the index to parse() so it can store it in the item. However, when I store it in self.i, every item ends up with the maximum value of i (the value from the last loop iteration). I could extract the index from the URL with a regex on response.url, but I wonder if there is a clean way to pass a variable from start_requests() to parse().
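The symptom comes from timing. Scrapy typically pulls many requests out of start_requests() before the first response arrives, so by the time parse() runs, the loop has moved on and self.i holds a later (often the last) value. The sketch below simulates the extreme case in plain Python, without Scrapy: FakeSpider and the list-based "scheduler" are hypothetical stand-ins, not Scrapy APIs.

```python
class FakeSpider:
    """Hypothetical stand-in for a spider that stores the index on self."""

    def start_requests(self):
        urls = ["https://example.com/a", "https://example.com/b", "https://example.com/c"]
        for i, url in enumerate(urls):
            self.i = i  # one shared attribute, overwritten on every iteration
            yield url

    def parse(self, url):
        # Reads self.i when the callback runs, not when the request was created.
        return {"url": url, "index": self.i}


spider = FakeSpider()
# Simulated scheduler: every request is generated before any callback runs.
pending = list(spider.start_requests())
items = [spider.parse(url) for url in pending]
for item in items:
    print(item)
```

Every item reports index 2, because all three callbacks read the same attribute after the loop finished. Attaching the value to each request instead gives every callback its own copy.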

ChiseledAbs

2 Answers


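You can use the meta attribute of scrapy.Request. Anything you put in meta travels with that request and is available in the callback as response.meta: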

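import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # Attach the index to this particular request.
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        # response.meta is the meta dict of the request that produced this response.
        yield {'url': response.url, 'index': response.meta['index']}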
Granitosaurus

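You can pass the cb_kwargs argument to scrapy.Request() (available since Scrapy 1.7). Each entry is passed to the callback as a keyword argument. The Scrapy documentation recommends cb_kwargs over meta for passing your own data to callbacks, leaving meta for communication with components such as middlewares and extensions.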

https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs

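import scrapy

class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            # Each cb_kwargs entry becomes a keyword argument of the callback.
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

    def parse(self, response, index):
        # index is the value passed through cb_kwargs for this request.
        yield {'url': response.url, 'index': index}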
jay padaliya