Yes, another question about Unicode and Python. I thought I had read it all and adopted good programming practices after you guys opened my mind about Unicode, but this error came back at me:
'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
I'm using Scrapy with Python 2.7, so what comes from the "outside world" is web pages that are properly decoded and processed with XPath, as shown by the empty error log from these checks, which run just before the exception:
if not isinstance(key, unicode):
    logging.error(u"Key is not unicode: %s" % key)
if not isinstance(value, unicode):
    logging.error(u"value is not unicode: %s" % value)
if not isinstance(item['listingId'], int):
    logging.error(u"item['listingId'] is not an int:%s" % item['listingId'])
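To make those checks easier to read in isolation, here is a minimal self-contained sketch of them. The helper name `find_type_problems` and the sample values are made up for illustration, and `text_type` stands in for `unicode` so the snippet also runs on Python 3.

```python
import logging

# `unicode` on Python 2.7, `str` on Python 3 (assumption: lets the sketch run on both).
text_type = type(u"")


def find_type_problems(listing_id, key, value):
    """Return a list of messages for arguments that have an unexpected type."""
    problems = []
    if not isinstance(key, text_type):
        problems.append(u"Key is not unicode: %r" % (key,))
    if not isinstance(value, text_type):
        problems.append(u"value is not unicode: %r" % (value,))
    if not isinstance(listing_id, int):
        problems.append(u"item['listingId'] is not an int: %r" % (listing_id,))
    # Log every problem found, mirroring the logging.error calls above.
    for message in problems:
        logging.error(message)
    return problems


# Hypothetical sample values: a unicode key and a value containing À.
print(find_type_problems(12345, u"title", u"\xc0 vendre"))
```

For well-typed input the returned list is empty, which matches the empty error log I see.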
But then, when the MySQL transaction happens on the next line:
d = txn.execute("INSERT IGNORE INTO `listingsDetails` VALUE (%s, %s, %s);", (item['listingId'], key, value))
I still get this exception from time to time (on about 1% of pages). Notice "pipelines.py" line 403 in the traceback, which is the MySQL INSERT.
2016-10-16 22:22:10 [Listings] ERROR: [Failure instance: Traceback: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
/usr/lib/python2.7/threading.py:801:__bootstrap_inner
/usr/lib/python2.7/threading.py:754:run
/usr/lib/python2.7/dist-packages/twisted/_threads/_threadworker.py:46:work
/usr/lib/python2.7/dist-packages/twisted/_threads/_team.py:190:doWork
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:246:inContext
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:262:<lambda>
/usr/lib/python2.7/dist-packages/twisted/python/context.py:118:callWithContext
/usr/lib/python2.7/dist-packages/twisted/python/context.py:81:callWithContext
/usr/lib/python2.7/dist-packages/twisted/enterprise/adbapi.py:445:_runInteraction
/home/mrme/git/rep/scrapy_prj/firstproject/firstproject/pipelines.py:403:_do_upsert
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:228:execute
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:127:_warning_check
/usr/lib/python2.7/logging/__init__.py:1724:_showwarning
/usr/lib/python2.7/warnings.py:50:formatwarning
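The last frames of that traceback are in `MySQLdb/cursors.py` (`_warning_check`) and `warnings.py` (`formatwarning`), so the exception appears to be raised while a warning is being formatted. Below is a minimal sketch of how a warning's text could be captured and inspected with `repr()` instead of being formatted. `collect_warnings` and `noisy_insert` are hypothetical names and the warning text is invented. Note that `warnings.catch_warnings` is not thread-safe, so this is meant for a single-threaded test, not for the Twisted thread pool.

```python
import warnings


def collect_warnings(func, *args, **kwargs):
    """Call func and return (result, [repr of each warning message])."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not drop repeated warnings
        result = func(*args, **kwargs)
    # With record=True the warnings are stored rather than formatted and printed.
    # On Python 2.7, repr() shows non-ASCII characters as escapes.
    return result, [repr(w.message) for w in caught]


def noisy_insert():
    """Hypothetical stand-in for txn.execute(); emits a warning containing À."""
    warnings.warn(u"Made-up warning text with \xc0 in it")
    return 1


result, messages = collect_warnings(noisy_insert)
print(result, messages)
```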
I've opened the MySQL connection with:
dbargs = dict(
    host=settings['MYSQL_HOST'],
    db=settings['MYSQL_DBNAME'],
    user=settings['MYSQL_USER'],
    passwd=settings['MYSQL_PASSWD'],
    charset='utf8',
    use_unicode=True
)
dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
I also tried adding this after connecting, as per the MySQL documentation:
self.dbpool.runOperation("SET NAMES 'utf8'")
self.dbpool.runOperation("SET CHARSET 'utf8'")
And confirmed my database is correctly set up with:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name;
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8mb4
character_set_server: latin1
character_set_system: utf8
Who the hell is trying to encode in ASCII here?
Any two cents are welcome ;-)
If it's of any help, all other accented characters are fine in the database. Only this u'\xc0' = À is problematic.
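To show what I mean by that character, here is a small sketch, independent of Scrapy and MySQL, that encodes a string with À at index 24. The sample string is made up; only the position and the character match my error.

```python
# Made-up sample: 24 ASCII characters followed by À (u'\xc0'), so À sits at index 24.
sample = u"x" * 24 + u"\xc0"

# UTF-8 and Latin-1 can both represent À.
utf8_bytes = sample.encode("utf-8")      # ends with the two bytes C3 80
latin1_bytes = sample.encode("latin-1")  # ends with the single byte C0

# ASCII cannot, and the failure reports the same position as my error message.
try:
    sample.encode("ascii")
    error = None
except UnicodeEncodeError as exc:
    error = exc

print(error)
```

So the character itself is fine in the encodings I configured; something is falling back to the 'ascii' codec.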