20

I've decided to use the Python logging module because the output Twisted generates on standard error is too long, and I want INFO-level, meaningful messages, such as those generated by the StatsCollector, to be written to a separate log file while keeping the on-screen messages.

from twisted.python import log
import logging

logging.basicConfig(level=logging.INFO, filemode='w', filename='buyerlog.txt')
observer = log.PythonLoggingObserver()
observer.start()

Well, this is fine and I've got my messages, but the downside is that I cannot tell which spider generated each message. This is my log file, with "twisted" being displayed by %(name)s:

INFO:twisted:Log opened.
INFO:twisted:Scrapy 0.12.0.2543 started (bot: property)
INFO:twisted:scrapy.telnet.TelnetConsole starting on 6023
INFO:twisted:scrapy.webservice.WebService starting on 6080
INFO:twisted:Spider opened
INFO:twisted:Spider opened
INFO:twisted:Received SIGINT, shutting down gracefully. Send again to force unclean shutdown
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Closing spider (shutdown)
INFO:twisted:Dumping spider stats:
{'downloader/exception_count': 3,
 'downloader/exception_type_count/scrapy.exceptions.IgnoreRequest': 3,
 'downloader/request_bytes': 9973,

Compare this to the messages generated by Twisted on standard error:

2011-12-16 17:34:56+0800 [expats] DEBUG: number of rules: 4
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Telnet console listening on 0.0.0.0:6023
2011-12-16 17:34:56+0800 [scrapy] DEBUG: Web service listening on 0.0.0.0:6080
2011-12-16 17:34:56+0800 [iproperty] INFO: Spider opened
2011-12-16 17:34:56+0800 [iproperty] DEBUG: Redirecting (301) to <GET http://www.iproperty.com.sg/> from <GET http://iproperty.com.sg>
2011-12-16 17:34:57+0800 [iproperty] DEBUG: Crawled (200) <

I've tried %(name)s, %(module)s, and others, but I don't seem to be able to show the spider name. Does anyone know the answer?

EDIT: the problem with using LOG_FILE and LOG_LEVEL in settings is that the lower-level messages will not be shown on standard error.
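
For reference, the settings-based approach I mean is just a couple of lines in settings.py (the file name and level below are only examples):

# settings.py -- example values only
LOG_FILE = 'scrapy_output.log'
LOG_LEVEL = 'INFO'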

goh

8 Answers

25

You want to use the ScrapyFileLogObserver.

import logging
from scrapy.log import ScrapyFileLogObserver

logfile = open('testlog.log', 'w')
log_observer = ScrapyFileLogObserver(logfile, level=logging.DEBUG)
log_observer.start()

I'm glad you asked this question, I've been wanting to do this myself.

Acorn
  • after adding these lines in my settings.py, scrapy is unable to find my spiders. (command line) – goh Dec 16 '11 at 10:26
  • Hmm, I put it in my spider module and it worked fine.. let me experiment. **Edit:** how about putting it in the `__init__` file of your spiders module? That seems to do the job. – Acorn Dec 16 '11 at 10:28
  • Hmm, putting it in the spiders works. Funny why it doesn't work in settings.py. Also, I couldn't find this ScrapyFileLogObserver anywhere in the docs. Perhaps you could direct me to the link (other than github)? – goh Dec 16 '11 at 10:45
  • 2
    It doesn't seem to be a documented feature. Had to take a peek at the source for `scrapy.log` to find it. – Acorn Dec 16 '11 at 10:49
  • `settings.py` is probably called before the Twisted reactor has been started. – Acorn Dec 16 '11 at 11:15
  • 7
    as of 2017, this module has been removed and it is now deprecated: "Module `scrapy.log` has been deprecated, Scrapy now relies on the builtin Python library for logging. Read the updated logging entry in the documentation to learn more." – appoll Mar 29 '17 at 08:14
18

It is very easy to redirect output using: scrapy some-scrapy's-args 2>&1 | tee -a logname

This way, everything scrapy outputs to stdout and stderr will be redirected to the logname file and also printed to the screen.
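
For example, with a hypothetical spider named myspider this becomes `scrapy crawl myspider 2>&1 | tee -a scrapy.log`, which appends everything to scrapy.log while still printing it to the terminal.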

Alexander Artemenko
  • Worked perfectly! It is ideal for development, when we are simply experimenting with scrapers and the log is too long to keep in the terminal, yet we don't want to code full Python logging in the spiders just yet. – Kulbi Oct 24 '16 at 23:27
  • Finally a solution for me. Unfortunately, it saved the log with some encode errors, I guess. ^[[0;0;34m2019-05-07 17:09:34^[[0;0m ^[[0;0;36m[scrapy.extensions.telnet]^[[0;0m ^[[0;0;31mINFO^[[0;0m: – sergiomafra May 07 '19 at 20:12
9

For all those folks who came here before reading the current documentation version:

import logging
from scrapy.utils.log import configure_logging

configure_logging(install_root_handler=False)
logging.basicConfig(
    filename='log.txt',
    filemode='a',
    format='%(levelname)s: %(message)s',
    level=logging.DEBUG
)
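
Passing install_root_handler=False keeps Scrapy from installing its own handler on the root logger, so the handler configured by basicConfig above is the one that ends up receiving the log records.
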
Alex K.
5

I know this is old, but it was a really helpful post since the class still isn't properly documented in the Scrapy docs. Also, we can skip importing logging and use scrapy's log module directly. Thanks all!

from scrapy import log

logfile = open('testlog.log', 'a')
log_observer = log.ScrapyFileLogObserver(logfile, level=log.DEBUG)
log_observer.start()
IamnotBatman
5

As the official Scrapy documentation says:

Scrapy uses Python’s builtin logging system for event logging.

So you can configure your logger just as you would in a normal Python script.

First, you have to import the logging module:

import logging

You can add this line to your spider:

logging.getLogger().addHandler(logging.StreamHandler())

This adds a stream handler that logs to the console.

After that, you have to configure the logging file path.

Add a dict named custom_settings containing your spider-specific settings:

custom_settings = {
    'LOG_FILE': 'my_log.log',
    'LOG_LEVEL': 'INFO',
    ... # you can add more settings
}

The whole class looks like:

import logging
import scrapy

class AbcSpider(scrapy.Spider):
    name: str = 'abc_spider'
    start_urls = ['your_url']
    custom_settings = {
        'LOG_FILE': 'my_log.log',
        'LOG_LEVEL': 'INFO',
        ... # you can add more settings
    }
    logging.getLogger().addHandler(logging.StreamHandler())

    def parse(self, response):
        pass
Shi XiuFeng
  • It seems you haven't added the custom settings to the logger. Is that correct? – sergiomafra May 07 '19 at 20:19
  • This helped me, but the first two bits are all that's needed: import logging, and add the handler. You don't need `custom_settings` here necessarily. I put the import and the handler in `settings.py` rather than the spider. – scharfmn Mar 08 '21 at 16:51
3

ScrapyFileLogObserver is no longer supported. You may use the standard Python logging module instead.

import logging
logging.getLogger().addHandler(logging.StreamHandler())
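
A minimal sketch of also sending those messages to a file using only the standard library (the file name and level here are just examples):

import logging

root = logging.getLogger()
root.addHandler(logging.StreamHandler())           # keep messages on the console

file_handler = logging.FileHandler('scrapy.log')   # example file name
file_handler.setLevel(logging.INFO)                # example level
root.addHandler(file_handler)
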
1

As of Scrapy 2.3, none of the answers mentioned above worked for me. In addition, the solution found in the documentation caused the log file to be overwritten with every message, which is of course not what you want in a log. I couldn't find a built-in setting that changes the mode to "a" (append). I achieved logging to both file and stdout with the following configuration code:

import logging
from scrapy.utils.log import configure_logging

configure_logging(settings={
    "LOG_STDOUT": True
})
file_handler = logging.FileHandler(filename, mode="a")  # 'filename' is the path to your log file
formatter = logging.Formatter(
    fmt="%(asctime)s,%(msecs)d %(name)s %(levelname)s %(message)s",
    datefmt="%H:%M:%S"
)
file_handler.setFormatter(formatter)
file_handler.setLevel("DEBUG")
logging.root.addHandler(file_handler)
Royar
  • That was an amazing solution. I had a hard time creating a log file that works together with print(). Thank you so much. – DevScheffer Jan 28 '22 at 13:14
0

Another way is to disable Scrapy's own logging and configure logging yourself from a custom configuration file.

settings.py

import logging.config  # dictConfig lives in logging.config and must be imported explicitly
import yaml

LOG_ENABLED = False
logging.config.dictConfig(yaml.load(open("logging.yml").read(), Loader=yaml.SafeLoader))

logging.yml

version: 1
formatters:
  simple:
    format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
handlers:
  console:
    class: logging.StreamHandler
    level: INFO
    formatter: simple
    stream: ext://sys.stdout
  file:
    class : logging.FileHandler
    level: INFO
    formatter: simple
    filename: scrapy.log
root:
  level: INFO
  handlers: [console, file]
disable_existing_loggers: False

example_spider.py

import scrapy

class ExampleSpider(scrapy.Spider):
    name = "example"
    allowed_domains = ["example.com"]
    start_urls = ["http://example.com/"]

    def parse(self, response):
        self.logger.info("test")
        pass
stonewell