Why does Scrapy log differently to the console and to an external log file?

Question

I'm new to Scrapy. My script ran fine on Scrapy 0.24, but after switching to the newly released 1.0 I ran into a logging problem. I want both the log file and the console to use the INFO level. However, no matter how I set LOG_LEVEL or call configure_logging() (using Python's built-in logging package instead of scrapy.log), Scrapy always logs DEBUG-level messages to the console, including every scraped item printed as a dict. In fact, the LOG_LEVEL option only takes effect for the external file. I suspect this has something to do with Python's logging module, but I have no idea how to configure it. Could anyone help me out?

This is how I configure logging in run_my_spider.py:

from crawler.settings import LOG_FILE, LOG_FORMAT
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from scrapy.utils.log import configure_logging
from crawler.spiders.MySpiders import MySpider
import logging


def run_spider(spider):
    settings = get_project_settings()

    # configure file logging
    # It ONLY works for the file
    configure_logging({'LOG_FORMAT': LOG_FORMAT,
                       'LOG_ENABLED': True,
                       'LOG_FILE': LOG_FILE,
                       'LOG_LEVEL': 'INFO',
                       'LOG_STDOUT': True})

    # instantiate spider
    process = CrawlerProcess(settings)
    process.crawl(MySpider)
    logging.info('Running Crawler: ' + spider.name)
    process.start() # the script will block here until the spider_closed signal was sent
    logging.info('Crawler ' + spider.name + ' stopped.\n')

......

This is the console output:

DEBUG:scrapy.core.engine:Crawled (200) <GET http://mil.news.sina.com.cn/2014-10-09/0450804543.html>(referer: http://rss.sina.com.cn/rollnews/jczs/20141009.js)
 {'item_name': 'item_sina_news_reply',
 'news_id': u'jc:27-1-804530',
 'reply_id': u'jc:27-1-804530:1',
 'reply_lastcrawl': '1438605374.41',
 'reply_table': 'news_reply_20141009'}

Many thanks!
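The symptom described above (INFO in the file, DEBUG on the console) matches how Python's logging module behaves when two handlers with different levels receive records from the same logger: levels are checked per handler, so giving the file handler INFO does nothing to a second console handler whose level is unset. The standard-library-only sketch below reproduces that behaviour. It assumes, without confirming it from Scrapy's source, that some other handler without a level is writing to the console; the logger names, messages, and the run_demo helper are hypothetical.

```python
import io
import logging


def run_demo(console_level=logging.NOTSET):
    """Send one DEBUG and one INFO record through a logger with two handlers.

    Returns (file_lines, console_lines): what each handler wrote.
    """
    # In-memory streams stand in for the log file and the console.
    file_stream, console_stream = io.StringIO(), io.StringIO()

    # A private logger plays the role of the root logger so repeated
    # calls do not pile handlers onto the real root logger.
    top = logging.getLogger("demo_root")
    top.handlers.clear()
    top.setLevel(logging.DEBUG)   # let every record reach the handlers
    top.propagate = False         # keep records away from the real root

    file_handler = logging.StreamHandler(file_stream)
    file_handler.setLevel(logging.INFO)      # the level LOG_LEVEL='INFO' sets
    top.addHandler(file_handler)

    console_handler = logging.StreamHandler(console_stream)
    console_handler.setLevel(console_level)  # NOTSET lets DEBUG through
    top.addHandler(console_handler)

    engine_log = logging.getLogger("demo_root.scrapy.core.engine")
    engine_log.debug("Crawled (200) <GET http://example.com/>")
    engine_log.info("Spider closed (finished)")

    return (file_stream.getvalue().splitlines(),
            console_stream.getvalue().splitlines())


print(run_demo())              # console also gets the DEBUG line
print(run_demo(logging.INFO))  # both outputs now stop at INFO
```

The takeaway is that a level has to be set on every handler that writes to the console (or on the logger that feeds them), not only on the one that writes the file.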

score 1 · Accepted Answer · edited May 23 '17 at 11:51


What you see in the console may be Twisted's logs, which print DEBUG-level messages to the console. You can redirect them into Python logging, and from there into your log files, using:

from twisted.python import log

# Forward Twisted's log events to the Python logger named 'logname'
observer = log.PythonLoggingObserver(loggerName='logname')
observer.start()

(As described in "How to make Twisted use Python logging?")
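Once the observer is running, Twisted's events arrive as ordinary records on the Python logger named by loggerName ('logname' above), so that logger still needs a level and a file handler before anything lands in a log file. The sketch below shows one possible setup using only the standard library. It does not import Twisted: the two logging calls at the end merely simulate records the observer might emit, and the in-memory stream is a stand-in for a real logging.FileHandler.

```python
import io
import logging

# Stand-in for the log file; in practice use logging.FileHandler(...).
log_file = io.StringIO()

# 'logname' must match the loggerName passed to PythonLoggingObserver.
twisted_logger = logging.getLogger("logname")
twisted_logger.setLevel(logging.INFO)  # drop DEBUG before any handler sees it
twisted_logger.propagate = False       # keep these records off root's console handler

file_handler = logging.StreamHandler(log_file)
file_handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
twisted_logger.addHandler(file_handler)

# Simulated records; with the observer started, Twisted would emit these.
twisted_logger.debug("a noisy debug event")
twisted_logger.info("an informational event")

print(log_file.getvalue())
```

Turning off propagation is a design choice: it keeps the redirected Twisted output out of any console handler attached to the root logger, at the cost of also bypassing root's file handler, which is why this logger gets its own.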


answered Aug 26 '15 at 11:16

javaEd
