I'm new to Scrapy. My script ran fine on Scrapy 0.24, but after switching to the newly released 1.0 I ran into a logging problem. I want both the log file and the console to log at INFO level. However, no matter how I set LOG_LEVEL or call configure_logging() (I now use Python's built-in logging package instead of scrapy.log), Scrapy still prints DEBUG-level messages to the console, including every scraped item dumped as a dict. The LOG_LEVEL setting only seems to affect the log file. I suspect this has something to do with how Python logging is configured, but I don't know how to fix it. Can anyone help?
This is how I configure logging in run_my_spider.py:
import logging

from scrapy.crawler import CrawlerProcess
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings

from crawler.settings import LOG_FILE, LOG_FORMAT
from crawler.spiders.MySpiders import MySpider


def run_spider(spider):
    settings = get_project_settings()

    # Configure logging. These settings ONLY take effect for the log file.
    configure_logging({'LOG_FORMAT': LOG_FORMAT,
                       'LOG_ENABLED': True,
                       'LOG_FILE': LOG_FILE,
                       'LOG_LEVEL': 'INFO',
                       'LOG_STDOUT': True})

    # Instantiate the crawler process and schedule the spider
    process = CrawlerProcess(settings)
    process.crawl(MySpider)

    logging.info('Running Crawler: ' + spider.name)
    process.start()  # the script blocks here until the spider_closed signal is sent
    logging.info('Crawler ' + spider.name + ' stopped.\n')
......
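To make the intent concrete, here is a minimal sketch of the setup I am after, written with only the standard logging module and no Scrapy calls. The function name `setup_info_logging` and the default format string are placeholders I made up for illustration. Whether Scrapy 1.0 leaves handlers like these alone, or adds its own on top, is exactly what I am unsure about.

```python
import logging
import sys


def setup_info_logging(log_file,
                       fmt="%(asctime)s [%(name)s] %(levelname)s: %(message)s"):
    """Attach a file handler and a console handler, both at INFO, to the root logger.

    Hypothetical helper for illustration only; it is not part of Scrapy.
    """
    root = logging.getLogger()
    # Keep the logger itself permissive and let each handler do the filtering.
    root.setLevel(logging.DEBUG)

    formatter = logging.Formatter(fmt)
    handlers = [logging.FileHandler(log_file), logging.StreamHandler(sys.stderr)]
    for handler in handlers:
        handler.setLevel(logging.INFO)   # drop DEBUG records on this handler
        handler.setFormatter(formatter)
        root.addHandler(handler)
    return handlers
```

With this in place, `logging.getLogger("demo").debug(...)` should reach neither the file nor the console, while `.info(...)` should reach both.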
This is the console output:
DEBUG:scrapy.core.engine:Crawled (200) <GET http://mil.news.sina.com.cn/2014-10-09/0450804543.html>(referer: http://rss.sina.com.cn/rollnews/jczs/20141009.js)
{'item_name': 'item_sina_news_reply',
 'news_id': u'jc:27-1-804530',
 'reply_id': u'jc:27-1-804530:1',
 'reply_lastcrawl': '1438605374.41',
 'reply_table': 'news_reply_20141009'}
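The `DEBUG:scrapy.core.engine:` prefix matches the default `levelname:name:message` format of `logging.basicConfig()`, not my LOG_FORMAT, so my guess is that a second handler is attached to the root logger. A small sketch for checking this, using only the standard library (`describe_handlers` is a name I made up, not a Scrapy or logging function):

```python
import logging


def describe_handlers(logger=None):
    """Return (handler class name, level name) pairs for a logger's handlers.

    Hypothetical diagnostic helper; defaults to the root logger.
    """
    if logger is None:
        logger = logging.getLogger()
    return [(type(h).__name__, logging.getLevelName(h.level))
            for h in logger.handlers]
```

Printing `describe_handlers()` just before `process.start()` should show whether a console handler with level NOTSET or DEBUG is sitting next to the file handler.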
Many Thanks!