I'm using a loop to generate my requests inside start_requests()
and I'd like to pass the loop index to parse()
so it can store it in the item. However, when I use self.i,
the output has the max value of i (from the last loop iteration) for every item.
I could extract the index from response.url with a regex,
but I wonder if there is a clean way to pass a variable from start_requests() to parse().
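The behaviour described in the question can be reproduced in plain Python, without Scrapy: because requests are processed asynchronously, every callback runs only after the loop has finished, so a shared attribute like self.i holds its final value by then. The Spider class below is a hypothetical stand-in, for illustration only:

```python
class Spider:
    """Minimal stand-in for a Scrapy spider (hypothetical, for illustration)."""

    def start_requests(self):
        callbacks = []
        for i in range(3):
            self.i = i                        # shared attribute, as in the question
            callbacks.append(lambda: self.i)  # callback reads self.i later
        return callbacks


spider = Spider()
# The callbacks only run after the loop has finished, so every one of
# them sees the final value of self.i, not the value at creation time.
results = [cb() for cb in spider.start_requests()]
print(results)  # [2, 2, 2] -- the max value for every "item"
```

This is why the index has to travel with the request itself (e.g. in meta or cb_kwargs, as the answers below show) rather than live on the spider instance.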

ChiseledAbs
2 Answers
You can use the scrapy.Request meta attribute:
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, meta={'index': index})

    def parse(self, response):
        print(response.url)
        print(response.meta['index'])

Granitosaurus
- Thanks, it works for me; now I can create dynamic CSV files to store data. – Hemant Kumar May 13 '19 at 07:27
You can pass the cb_kwargs argument to scrapy.Request():
https://docs.scrapy.org/en/latest/topics/request-response.html#scrapy.http.Request.cb_kwargs
import scrapy


class MySpider(scrapy.Spider):
    name = 'myspider'

    def start_requests(self):
        urls = [...]
        for index, url in enumerate(urls):
            yield scrapy.Request(url, callback=self.parse, cb_kwargs={'index': index})

    def parse(self, response, index):
        pass  # index arrives as a regular keyword argument
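Note that cb_kwargs requires a reasonably recent Scrapy version (it was introduced in the 1.7 series, per the docs linked above); on older versions a common workaround is to bind the extra argument into the callback with functools.partial. The binding behaviour can be sketched in plain Python, where parse and the URL strings are stand-ins for the real callback and responses:

```python
from functools import partial

def parse(response, index):
    # Receives the bound index alongside the response.
    return (response, index)

urls = ['http://example.com/a', 'http://example.com/b']

# Bind the loop index at request-creation time, roughly like
# scrapy.Request(url, callback=partial(self.parse, index=index)).
callbacks = [partial(parse, index=index) for index, url in enumerate(urls)]

# Each callback keeps its own bound index, even though all of them
# run only after the loop has finished.
results = [cb(response=url) for cb, url in zip(callbacks, urls)]
print(results)  # [('http://example.com/a', 0), ('http://example.com/b', 1)]
```

Unlike the self.i approach from the question, the value is captured per request at creation time, so late execution cannot overwrite it.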

jay padaliya