3

I have multiple spiders in one project. The problem is that right now I am defining LOG_FILE in settings.py like this:

LOG_FILE = "scrapy_%s.log" % datetime.now()

What I want is scrapy_SPIDERNAME_DATETIME,

but I am unable to include the spider name in the log file name.

I found

scrapy.log.start(logfile=None, loglevel=None, logstdout=None)

and called it in each spider's __init__() method, but it's not working.

Any help would be appreciated.

akhter wahab
  • Why isn't it working? Provide some error messages and what you are expecting. – Qiau Aug 21 '12 at 08:50
  • @Qiau Thanks for pointing that out; I just accepted the correct answer. There is no error so far, but scrapy.log.start(logfile='output.log', loglevel=log.DEBUG, logstdout=None) is not creating any log file... – akhter wahab Aug 21 '12 at 09:08

3 Answers

7

The spider's __init__() is not early enough to call log.start() by itself since the log observer is already started at this point; therefore, you need to reinitialize the logging state to trick Scrapy into (re)starting it.

In your spider class file:

from datetime import datetime
from scrapy import log
from scrapy.spider import BaseSpider

class ExampleSpider(BaseSpider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://www.example.com/"]

    def __init__(self, name=None, **kwargs):
        LOG_FILE = "scrapy_%s_%s.log" % (self.name, datetime.now())
        # remove the current log
        # log.log.removeObserver(log.log.theLogPublisher.observers[0])
        # re-create the default Twisted observer which Scrapy checks
        log.log.defaultObserver = log.log.DefaultObserver()
        # start the default observer so it can be stopped
        log.log.defaultObserver.start()
        # trick Scrapy into thinking logging has not started
        log.started = False
        # start the new log file observer
        log.start(LOG_FILE)
        # continue with the normal spider init
        super(ExampleSpider, self).__init__(name, **kwargs)

    def parse(self, response):
        ...

And the output file might look like:

scrapy_example_2012-08-25 12:34:48.823896.log
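
Note that a raw datetime.now() puts spaces and colons in the file name, which some filesystems reject. An optional tweak, using the standard library's strftime(), keeps the name portable:

# optional: strftime() avoids spaces and colons in the file name,
# e.g. scrapy_example_2012-08-25_12-34-48.log
LOG_FILE = "scrapy_%s_%s.log" % (self.name,
                                 datetime.now().strftime("%Y-%m-%d_%H-%M-%S"))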

Steven Almeroth
  • For some reason, this didn't work for me. Kept getting "ERROR" in scrapy shell, that's it. Couldn't debug properly. Anyway, this answer - http://stackoverflow.com/a/16092502/2689986 did the job. Thanks! – shad0w_wa1k3r Feb 13 '14 at 13:49
1

There should be a BOT_NAME in your settings.py. This is the project/spider name. So in your case, it would be:

from datetime import datetime

LOG_FILE = "scrapy_%s_%s.log" % (BOT_NAME, datetime.now())

This is pretty much the same thing that Scrapy does internally.

But why not use log.msg()? The docs clearly state that it is meant for spider-specific messages. It might be easier to use it and just extract/grep/... the different spiders' log messages from one big log file.
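
As a minimal sketch using the old scrapy.log API (the spider keyword argument is what ties a message to a spider; the spider class and URL here are hypothetical):

from scrapy import log
from scrapy.spider import BaseSpider

class MallSpider(BaseSpider):
    name = "a_mall"  # hypothetical spider name
    start_urls = ["http://www.example.com/"]

    def parse(self, response):
        # spider=self stamps the entry with the spider name, so you can
        # grep the "a_mall" lines out of one shared log file later
        log.msg("parsed %s" % response.url, level=log.INFO, spider=self)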

A more complicated approach would be to get the spider locations from the SPIDER_MODULES list and load all the spiders inside those packages; see the sketch below.
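
Depending on your Scrapy version, a rough sketch might look like this (walk_modules and iter_spider_classes are helpers that ship with Scrapy, though their availability varies across releases):

from scrapy.utils.misc import walk_modules
from scrapy.utils.spider import iter_spider_classes
from scrapy.utils.project import get_project_settings

settings = get_project_settings()
for module_path in settings.getlist("SPIDER_MODULES"):
    for module in walk_modules(module_path):
        for spider_cls in iter_spider_classes(module):
            # e.g. derive one log file name per spider
            print("scrapy_%s.log" % spider_cls.name)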

DrColossos
  • BOT_NAME in settings.py holds the entire project name, not the spider name. Say I have one project called "mall" that has a_mall, b_mall and c_mall spiders; I need those names. – akhter wahab Aug 23 '12 at 08:35
1

You can use Scrapy's storage URI parameters in your settings.py for the FEED_URI setting (a short settings sketch follows the list):

  1. %(name)s
  2. %(time)s

    For example: /tmp/crawled/%(name)s/%(time)s.log
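
A minimal settings.py sketch (note: this names the exported item feed, not the log file; the JSON format is just an assumption here):

# settings.py -- %(name)s and %(time)s are substituted per spider run
FEED_URI = "/tmp/crawled/%(name)s/%(time)s.json"
FEED_FORMAT = "json"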

Umair A.