I wrote a Scrapy spider that is initialized with two variables, and I want to run it from a CrawlerRunner. I tried:

yield runner.crawl(MySpider1, variable1, variable2)

or

yield runner.crawl(MySpider1, [variable1, variable2])

or

yield runner.crawl(MySpider1, (variable1, variable2))

or

yield runner.crawl(MySpider1(variable1, variable2))

but got

missing 1 required positional argument

Here is my code:

import scrapy
from twisted.internet import reactor, defer
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

class MySpider(scrapy.Spider):

    def _init__(self, variable1, variable2, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.variable1 = variable1
        self.variable2 = variable2

    # the rest is any normal spider's parse method


class Run_Spider_From_SubClass(SpiderEmail):

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)

        configure_logging()
        self.runner = CrawlerRunner(get_project_settings())

    @defer.inlineCallbacks
    def crawl(self):
        for variable1, variable2 in mydict.items():
            yield self.runner.crawl(MySpider, variable1, variable2)  # this call raises the "missing 1 required positional argument" error
        reactor.stop()

    def run_spider_in_loop(self):
        self.crawl()
        reactor.run()

runner = Run_Spider_From_SubClass()
runner.run_spider_in_loop()

What is the correct way to pass a spider's variables through the runner? Thanks.

Kai

1 Answer

You don't need to override __init__ in MySpider (and note that your method is named _init__, with one leading underscore missing, so Python never calls it as the constructor anyway).
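To illustrate the underscore point without needing Scrapy installed, here is a minimal sketch in plain Python. `FrameworkBase` and `Misspelled` are hypothetical names made up for this example; `FrameworkBase` only stands in for a base class that accepts keyword arguments and is not Scrapy's actual code.

```python
class FrameworkBase:
    """Hypothetical stand-in for a framework base class (not Scrapy's real Spider)."""

    def __init__(self, **kwargs):
        # Remember whatever keyword arguments the caller supplied.
        self.options = dict(kwargs)


class Misspelled(FrameworkBase):
    # One leading underscore: this is an ordinary method, not the constructor,
    # so Python never calls it when the class is instantiated.
    def _init__(self, variable1, variable2, **kwargs):
        super().__init__(**kwargs)
        self.variable1 = variable1
        self.variable2 = variable2


# Construction goes straight to FrameworkBase.__init__ instead.
spider = Misspelled(variable1="a", variable2="b")
print(hasattr(spider, "variable1"))  # False: _init__ never ran
print(spider.options)                # {'variable1': 'a', 'variable2': 'b'}
```

Because the misspelled method is skipped, the arguments are handled entirely by the inherited constructor, whatever signature that constructor happens to have.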

To pass parameters, you need to use named/keyword arguments, not positional arguments:

yield self.runner.crawl(MySpider, variable1=variable1, variable2=variable2)

This automatically makes each value available in your spider as an instance attribute, for example self.variable1 and self.variable2.
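The mechanism can be sketched roughly as follows, without Scrapy itself. `SketchSpider` is a simplified stand-in I am assuming for the default spider constructor (it copies keyword arguments onto the instance), and `start_crawl` is a hypothetical helper that forwards its arguments the way a runner would; neither is Scrapy's real implementation, and the contents of `mydict` are invented for the example.

```python
class SketchSpider:
    """Simplified stand-in for a spider base class (an assumption, not the real class)."""

    name = "sketch"

    def __init__(self, name=None, **kwargs):
        if name is not None:
            self.name = name
        # Copy every keyword argument onto the instance as an attribute.
        self.__dict__.update(kwargs)


def start_crawl(spider_cls, *args, **kwargs):
    """Hypothetical helper that forwards its arguments to the spider constructor."""
    return spider_cls(*args, **kwargs)


# Hypothetical input data standing in for `mydict` from the question.
mydict = {"first": "https://example.com/a", "second": "https://example.com/b"}

spiders = [
    start_crawl(SketchSpider, variable1=key, variable2=value)
    for key, value in mydict.items()
]
for spider in spiders:
    print(spider.variable1, spider.variable2)

# Two positional values do not fit the constructor's parameters, so it rejects them.
try:
    start_crawl(SketchSpider, "first", "https://example.com/a")
    positional_ok = True
except TypeError:
    positional_ok = False
print(positional_ok)
```

In this sketch the keyword form works with no custom constructor at all, while the positional form fails with a TypeError, which is the same kind of error as the one in the question.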

malberts