
I need to access the custom settings passed from the CLI using:

-s SETTING_NAME="SETTING_VAL" from the __init__() method of the spider class.

get_project_settings() allows me to access only the static settings.

The docs explain how you can access those custom settings from a pipeline, by setting up the pipeline through:

@classmethod
def from_crawler(cls, crawler):
    settings = crawler.settings

But is there any way to access them from the __init__() spider method?

Nikolay Shindarov

3 Answers

Just use settings.get e.g.

print(self.settings.get('SETTING_NAME'))

will print

SETTING_VAL

If you want to access a setting in your spider `__init__` you have a couple of options. If your command-line option is just a spider argument, use -a instead of -s. If for some reason you need to access an actual setting in your spider `__init__`, then you have to override the from_crawler classmethod as described in the docs.

Here is an example of the from_crawler approach:

import scrapy

class ArgsSpider(scrapy.Spider):
    name = "my_spider"

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        print('kwargs =', kwargs)

    @classmethod
    def from_crawler(cls, crawler, *args, **kwargs):
        spider = cls(
            *args,
            my_setting=crawler.settings.get("MY_SETTING"),
            **kwargs
        )
        spider._set_crawler(crawler)
        return spider

Run with e.g. scrapy runspider args_spider.py -s MY_SETTING=hello,world! and you will see your setting in the kwargs dict. You can of course get other settings this way too.

tomjn
  • An instance of `BaseSettings` gives me empty values. An instance of `Settings` still contains only the project static settings and not the ones passed through the CLI. – Nikolay Shindarov Oct 07 '19 at 11:20
  • @NikolayShindarov where are you trying to access the settings? Please post what you are trying in your question. – tomjn Oct 07 '19 at 11:46
  • @tomjn Inside the `Spider` inherited class. Sorry, thought it was clear enough, changed the title now. – Nikolay Shindarov Oct 07 '19 at 12:01
  • @NikolayShindarov that is clear. Can you add your spider code? Are you accessing them in the `__init__` function? The answer here works fine for getting settings elsewhere in the spider. – tomjn Oct 07 '19 at 12:24
  • It does not work, I tried it also inside `parse()`. Please, note the details of my question: I need the custom settings passed via the CLI using the `-s` parameter. The code from your answer loads only the static settings, regardless of where it is called - inside `__init__()` or in another method. – Nikolay Shindarov Oct 07 '19 at 12:54
  • @NikolayShindarov it does work I’ve tested it. Please post the exact line you are using. – tomjn Oct 07 '19 at 13:45
  • CLI: `scrapy crawl -a somearg="someval" -s my_arg="X"` Code: `def parse(self, response): from scrapy.settings import Settings, BaseSettings sett1 = Settings() sett2 = BaseSettings() __import__('pprint').pprint(dict(sett1)) print(sett1.get('my_arg')) __import__('pprint').pprint(dict(sett2)) print(sett2.get('my_arg'))` Both of them give me `None` – Nikolay Shindarov Oct 07 '19 at 14:11
  • @NikolayShindarov you don’t need to do any of that. Just do `def parse(self, response): print(self.settings.get('my_arg'))`. Try that and let me know. – tomjn Oct 07 '19 at 14:14
  • That indeed works, thanks. However, my issue is still not resolved, as it seems it does not work inside `__init__()` or inside a method called by `__init__()`. I upvoted your answer and modified the specifications of my question; the issue remains unresolved. – Nikolay Shindarov Oct 07 '19 at 14:24
  • @NikolayShindarov I've updated my answer to also answer your new question – tomjn Oct 08 '19 at 11:00
  • Thanks, I managed to do it meanwhile exactly using the from_crawler factory method. I accepted your answer and added mine as I needed some more tweaks in my situation to get it working. – Nikolay Shindarov Oct 08 '19 at 12:16

None of the responses work when using `CrawlSpider`, because some methods such as `_follow_links` will be missing.

The following code works:

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    spider = super().from_crawler(
        crawler,
        *args,
        some_arg=crawler.settings.get("SOME_ARG"),
        **kwargs
    )
    return spider

Tangui

The answer of @tomjn works; I just want to point out that if you are using extensions or middlewares which need the crawler object, you need to alter the factory method this way:

@classmethod
def from_crawler(cls, crawler, *args, **kwargs):
    obj = cls(
        *args,
        my_setting=crawler.settings.get("MY_SETTING"),
        **kwargs
    )
    obj.crawler = crawler
    return obj

I am not sure whether this is needed when you instantiate the spider itself, but in my case, when inheriting from it, my script would crash complaining about a missing crawler attribute until I did it this way.

Nikolay Shindarov