2

I'm working on a small pet-project that involves some accounting in multiple currencies. During its development I decided to move from straight-forward DB setting to DB-migrations using alembic. And on some migrations I need to populate DB with initial currencies, that are displayed in Ukrainian.

My problem is that data populated from alembic migration scripts is saving in some unknown encoding, so I cannot use it within the application (that expects to be human readable). My settings as well as script are as follows:

alembic.ini

...
sqlalchemy.url = mysql+pymysql://defaultuser:defaultpwd@localhost/petdb
...

alembic/versions/f433ab2a814_adding_currency.py

from alembic import op

# -*- coding: utf-8 -*-
"""Adding currency

Revision ID: f433ab2a814
Revises: 49538bba2220
Create Date: 2016-03-08 13:50:35.369021

"""

# revision identifiers, used by Alembic.
revision = 'f433ab2a814'
down_revision = '1c0b47263c82'
branch_labels = None
depends_on = None

def upgrade():
  op.create_table(
              'currency',
              Column('id', Integer, primary_key=True),
              Column('name', Unicode(120), nullable=False),
              Column('abbr', String(3), nullable=False)
          )
  op.execute(u'INSERT INTO currency SET name="{}", abbr="{}";'.format(u"Гривня", "UAH"))

After checking table currency from mysql client or mysql-workbench, it is displayed as:

mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name                       | abbr |
+----+----------------------------+------+
|  1 | Ð“Ñ€Ð¸Ð²Ð½Ñ                | UAH  |
+----+----------------------------+------+

Expected result is:

mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name                       | abbr |
+----+----------------------------+------+
|  1 | Гривня                     | UAH  |
+----+----------------------------+------+

From my application I've been setting this value as follows:

from petproject import app

app.config.from_object(config.DevelopmentConfig)
engine = create_engine(app.config["DATABASE"]+"?charset=utf8",
                       convert_unicode=True, encoding="utf8", echo=False)
db_session = scoped_session(sessionmaker(autocommit=False,
                                         autoflush=False,
                                         bind=engine))

if len(db_session.query(Currency).all()) == 0:
  default_currency = Currency()
  default_currency.name = u"Гривня"
  default_currency.abbr = u"UAH"
  db_session.add(default_currency)
  db_session.commit()

So I'm wondering how to insert initial Unicode values on migration that will be stored in correct encoding. Did I miss anything?

wanderlust
  • 1,826
  • 1
  • 21
  • 25
  • Did you set the [collation](https://dev.mysql.com/doc/refman/5.7/en/charset-syntax.html) (on the server, database, table, or column) to a UTF-8 collation? – univerio Mar 14 '16 at 17:58
  • Yes. mysql> SHOW CREATE TABLE currency; | Table | Create Table +----------+---------------------------------------------------------+ | currency | CREATE TABLE `currency` ( `id` int(11) NOT NULL AUTO_INCREMENT, `name` varchar(120) COLLATE utf8_unicode_ci NOT NULL, `abbr` varchar(3) COLLATE utf8_unicode_ci NOT NULL, PRIMARY KEY (`id`) ) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci | – wanderlust Mar 14 '16 at 18:37

1 Answers1

1

After a more extended analysis, I discovered, that MySQL keeps all data in 'windows-1252' encoding. MySQL manual (section "West European Character Sets") states about this issue as:

latin1 is the default character set. MySQL's latin1 is the same as the Windows cp1252 character set.

It looked like either MySQL ignored character_set_client that, I assumed to be 'utf-8', or SQLAlchemy / alembic didn't inform server to accept data as 'UTF-8' encoded data. Unfortunatelly, recommended option '?charset=utf8' is not possible to set in alembic.ini.

In order to accept and save data in correct encoding, I set character set manually by calling op.execute('SET NAMES utf8');. Thus complete code looks like:

def upgrade():
  op.create_table(
              'currency',
              Column('id', Integer, primary_key=True),
              Column('name', Unicode(120), nullable=False),
              Column('abbr', String(3), nullable=False)
          )
  op.execute('SET NAMES utf8')
  op.execute(u'INSERT INTO currency SET name="{}", abbr="{}";'.format(u"Гривня", "UAH"))

And result became as expected:

mysql> SELECT * FROM currency;
+----+----------------------------+------+
| id | name                       | abbr |
+----+----------------------------+------+
|  1 | Гривня                     | UAH  |
+----+----------------------------+------+
wanderlust
  • 1,826
  • 1
  • 21
  • 25