Yes, another question about Unicode and Python. I thought I had read it all and adopted good programming practice after you guys opened my mind about Unicode, but this error came back at me:

'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)

I'm using Scrapy with Python 2.7, so whatever comes from the "outside world" is a web page that has been properly decoded and processed with XPath. The checks below run just before the exception and log nothing, which confirms it:

if not isinstance(key, unicode):
    logging.error(u"Key is not unicode: %s" % key)

if not isinstance(value, unicode):
    logging.error(u"value is not unicode: %s" % value)

if not isinstance(item['listingId'], int):
    logging.error(u"item['listingId'] is not an int:%s" % item['listingId'])

But then, when the MySQL transaction happens on the next line:

d = txn.execute("INSERT IGNORE INTO `listingsDetails` VALUE (%s, %s, %s);", (item['listingId'], key, value))

I still get this exception from time to time (about 1% of pages). Notice "pipelines.py" line 403 in the traceback, which is the MySQL INSERT.

2016-10-16 22:22:10 [Listings] ERROR: [Failure instance: Traceback:     <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
/usr/lib/python2.7/threading.py:801:__bootstrap_inner
/usr/lib/python2.7/threading.py:754:run
/usr/lib/python2.7/dist-packages/twisted/_threads/_threadworker.py:46:work
/usr/lib/python2.7/dist-packages/twisted/_threads/_team.py:190:doWork
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:246:inContext
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:262:<lambda>
/usr/lib/python2.7/dist-packages/twisted/python/context.py:118:callWithContext
/usr/lib/python2.7/dist-packages/twisted/python/context.py:81:callWithContext
/usr/lib/python2.7/dist-packages/twisted/enterprise/adbapi.py:445:_runInteraction
/home/mrme/git/rep/scrapy_prj/firstproject/firstproject/pipelines.py:403:_do_upsert
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:228:execute
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:127:_warning_check
/usr/lib/python2.7/logging/__init__.py:1724:_showwarning
/usr/lib/python2.7/warnings.py:50:formatwarning

I've opened the MySQL connection with:

dbargs = dict(
    host=settings['MYSQL_HOST'],
    db=settings['MYSQL_DBNAME'],
    user=settings['MYSQL_USER'],
    passwd=settings['MYSQL_PASSWD'],
    charset='utf8',
    use_unicode=True
)
dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)

I also tried adding this after connecting, as per the MySQL documentation:

self.dbpool.runOperation("SET NAMES 'utf8'", )
self.dbpool.runOperation("SET CHARSET 'utf8'",)

And confirmed my database is correctly set up with:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%';

character_set_client:     utf8mb4
character_set_connection: utf8mb4
character_set_database:   utf8
character_set_filesystem: binary
character_set_results:    utf8mb4
character_set_server:     latin1
character_set_system:     utf8

Who the hell is trying to encode in ASCII here?

Every 2 cents is welcome ;-)

If it's of any help, all other accented characters are fine in the database. Only this u'\xc0' = À is problematic.

Malego
  • Are you using `from __future__ import unicode_literals` ? Maybe it's because the query string isn't unicode. – RemcoGerlich Oct 17 '16 at 10:06
  • Is the exception on the execute or on the warning logging? Can you paste the whole stacktrace with the error line? – paul trmbrth Oct 17 '16 at 10:15
  • @paul trmbrth You are right, it was the logging of the warning about a `DUPLICATE ENTRY` from the `INSERT IGNORE`. Sorry, I saw your comment after I posted my findings as an answer. If you post that as your answer, I'll accept it. Do you know why it tries to convert back to ASCII for logging the warning? – Malego Oct 19 '16 at 13:56
  • Hm, sorry no, I don't know why. – paul trmbrth Oct 19 '16 at 14:19

3 Answers

You can use the following code if the exception occurs:

varName = ''.join([i if ord(i) < 128 else ' ' for i in strName])

Here, strName is the string that contains non-ASCII values.

Piyush
  • Thx for the workaround, but I need those characters, and it looks like this would only replace the problematic character with a space. I need to know who's dropping the ball between Python and MySQLdb. – Malego Oct 17 '16 at 06:21
  • I think it's Python, because you can store the INSERT query in a variable and pass that variable to the execute method. – Piyush Oct 17 '16 at 06:39
Assuming dbpool is your connection variable, try the following (overkill, but see if it works):

dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
dbpool.set_character_set('utf8mb4')
dbpool.runOperation('SET NAMES utf8mb4;')
dbpool.runOperation('SET CHARACTER SET utf8mb4;')
dbpool.runOperation('SET character_set_connection=utf8mb4;')

If that doesn't help, see below.


Sidenote

If you run:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

You should get the following rows in addition to those in the OP:

| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |

If not, have a quick read of my answer to the question Manipulating utf8mb4 data from MySQL with PHP, and edit your MySQL configuration.

Eugene
  • Tried it and got a `no function named set_character_set` from the dbpool object. Removed it and got the same result. I finally found my answer and posted it. – Malego Oct 17 '16 at 19:51
Thx everyone for your great input, but I finally figured out where the exception comes from.

Python tries to convert the duplicate-entry warning returned by MySQL to ASCII for logging purposes, and the exception is raised while trying to output this warning (which contains Unicode) to stdout.
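For anyone who wants to see the failure in isolation, here is a minimal sketch. It assumes (based on the traceback ending in warnings.py:50:formatwarning) that the unicode warning text gets forced into a byte string with the default ASCII codec. The warning text and the helper name encode_as_ascii are made up for illustration; they are not taken from MySQLdb.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def encode_as_ascii(text):
    """Do explicitly what Python 2 does implicitly when a unicode
    message is forced into a byte string: encode it as ASCII."""
    return text.encode('ascii')


# Hypothetical warning text; the real one comes from the MySQL server.
message = u"Duplicate entry '42-title-\xc0 vendre' for key 'PRIMARY'"

try:
    encode_as_ascii(message)
    error_text = None
except UnicodeEncodeError as exc:
    # Same family of error as in the traceback above.
    error_text = str(exc)

print(error_text)
```

So the database write itself is probably fine; it is the formatting of the warning text that trips over the non-ASCII character.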

After replacing INSERT IGNORE with INSERT ... ON DUPLICATE KEY UPDATE, the exception is gone for good.
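Roughly, the replacement query looks like the sketch below. The column names (`listingId`, `key`, `value`) and the function name do_upsert are assumptions for illustration, since the table schema is not shown here, and it assumes a unique or primary key covering the first two columns. Since a duplicate row now triggers an update instead of a warning, there is no warning text left to format.

```python
# Hypothetical column names; adapt them to the real table definition.
UPSERT_SQL = (
    "INSERT INTO `listingsDetails` (`listingId`, `key`, `value`) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE `value` = VALUES(`value`);"
)


def do_upsert(txn, item, key, value):
    """Run the upsert on a DB-API cursor (txn), passing the values as
    parameters so the driver handles quoting and encoding."""
    txn.execute(UPSERT_SQL, (item['listingId'], key, value))
```

With adbapi this would be called through dbpool.runInteraction(do_upsert, item, key, value), the same way as the original INSERT.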

Can anyone help me find out why Python can't print this warning in UTF-8 to stdout like everything else on my Ubuntu machine?
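In the meantime, if someone needs to keep INSERT IGNORE, one possible workaround (a sketch, not something I have run against the real pipeline) is to filter the driver's warning category so the warning is dropped before anything tries to format it. DriverWarning below is a stand-in class so the snippet runs without MySQLdb installed; in the real code the category would be MySQLdb.Warning, and the warning text is made up.

```python
import warnings


class DriverWarning(Warning):
    """Stand-in for MySQLdb.Warning so this sketch has no dependencies."""


def emit_duplicate_warning():
    # Hypothetical text, similar to what INSERT IGNORE reports on a duplicate.
    warnings.warn(u"Duplicate entry '42-title-\xc0' for key 'PRIMARY'",
                  DriverWarning)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('always')  # baseline: record every warning
    # The newest filter takes precedence: drop the driver's warnings.
    warnings.filterwarnings('ignore', category=DriverWarning)
    emit_duplicate_warning()

suppressed = len(caught) == 0
```

In a real project the filterwarnings call would be made once at startup, outside any catch_warnings block, which is only used here to keep the example from changing global state.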

Malego