
I need help with the following two questions: how do I set handlers for the different log levels, like in Python? Currently, I have

STATS_ENABLED = True
STATS_DUMP = True 

LOG_FILE = 'crawl.log'

But the debug messages generated by Scrapy are also added to the log file. Those are very long and, ideally, I would like the DEBUG level messages to be left on standard error and the INFO messages to be dumped to my LOG_FILE.

Secondly, in the docs, it says "The logging service must be explicitly started through the scrapy.log.start() function." My question is, where do I run this scrapy.log.start()? Is it inside my spider?

goh

4 Answers


Secondly, in the docs, it says "The logging service must be explicitly started through the scrapy.log.start() function." My question is, where do I run this scrapy.log.start()? Is it inside my spider?

If you run a spider using scrapy crawl my_spider, the log is started automatically if STATS_ENABLED = True.

If you start the crawler process manually, you can do scrapy.log.start() before starting the crawler process.

from scrapy import log
from scrapy.crawler import CrawlerProcess
from scrapy.conf import settings

settings.overrides.update({})  # your settings

crawlerProcess = CrawlerProcess(settings)
crawlerProcess.install()
crawlerProcess.configure()

crawlerProcess.crawl(spider)  # your spider here

log.start()  # depends on LOG_ENABLED

print "Starting crawler."
crawlerProcess.start()
print "Crawler stopped."

What little I know about your first question:

Because you have to start the Scrapy log manually, you can use your own logger.

I think you can copy the module scrapy/scrapy/log.py from the Scrapy sources, modify it, import it instead of scrapy.log, and run start(); Scrapy will then use your log. Inside its start() function there is a line that reads log.startLoggingWithObserver(sflo.emit, setStdout=logstdout).

Make your own observer (http://docs.python.org/howto/logging-cookbook.html#logging-to-multiple-destinations) and use it there.
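A minimal sketch of that multiple-destinations idea, assuming you forward Twisted's log events into the standard Python logging module and split them with two handlers (the file name, taken from the question's LOG_FILE, and the write mode are illustrative):

import logging
import sys

from twisted.python import log

# The root logger accepts everything; the handlers filter by level.
root = logging.getLogger()
root.setLevel(logging.DEBUG)

# DEBUG and above stays on standard error.
stderr_handler = logging.StreamHandler(sys.stderr)
stderr_handler.setLevel(logging.DEBUG)
root.addHandler(stderr_handler)

# Only INFO and above is written to the log file.
file_handler = logging.FileHandler('crawl.log', mode='w')
file_handler.setLevel(logging.INFO)
root.addHandler(file_handler)

# Forward Twisted (and thus Scrapy) log events into Python logging.
observer = log.PythonLoggingObserver()
observer.start()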

warvariuc
  • Thanks for the answer to my second question. But do you know the answer to the first? – goh Dec 01 '11 at 00:41
  • @iws, well, that is what happens when you ask several questions in one. I cannot give you an extensive answer to the first question, but I will try; see the update. – warvariuc Dec 01 '11 at 06:59

I would like the DEBUG level messages to be left on standard error and INFO messages to be dumped to my LOG_FILE.

You can set LOG_LEVEL = 'INFO' in settings.py, but it will completely disable DEBUG messages.
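For example, in settings.py (reusing the LOG_FILE from the question):

LOG_LEVEL = 'INFO'
LOG_FILE = 'crawl.log'

With this, DEBUG messages are dropped entirely rather than redirected to standard error.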

reclosedev

Hmm,

Just wanted to update that I was able to get logging to a file working by using

from twisted.python import log
import logging

# Send INFO and above to a file, overwriting it on each run.
logging.basicConfig(level=logging.INFO, filemode='w', filename='log.txt')

# Route Twisted/Scrapy log events through Python logging.
observer = log.PythonLoggingObserver()
observer.start()

However, I am unable to get the log to display the spider's name the way Twisted does on standard error. I posted this question.

goh
scrapy <command> <args> -L 'INFO' -s LOG_FILE=log1.log

Output will be redirected to the named log file.
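For instance, assuming a spider named myspider (the name is illustrative), the -L/--loglevel and -s/--set options combine like this:

scrapy crawl myspider -L INFO -s LOG_FILE=log1.log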

Saurabh