I have an Alembic migration that runs some custom code that depends on Unicode characters. I am using SQLAlchemy 1.1.9 and Alembic 1.0.2.
As far as I can see, my database and table have the right character set and collation settings:
mysql> SELECT @@character_set_database, @@collation_database;
+--------------------------+----------------------+
| @@character_set_database | @@collation_database |
+--------------------------+----------------------+
| utf8mb4                  | utf8mb4_general_ci   |
+--------------------------+----------------------+
and
mysql> SHOW TABLE STATUS where name like 'mytable';
+---------+-----+--------------------+----------+----------------+---------+
| Name    | ... | Collation          | Checksum | Create_options | Comment |
+---------+-----+--------------------+----------+----------------+---------+
| mytable | ... | utf8mb4_unicode_ci | NULL     |                |         |
+---------+-----+--------------------+----------+----------------+---------+
I have inserted the string "Nguyễn Johñ" (note that the "ễ" and the "ñ" are both non-ASCII characters). When my Flask application loads the row, it comes back correctly. But when I run the migration, the Alembic debug logs show "Nguy?n Johñ", and my own debug output prints the same thing.
Why are some Unicode characters converted to a question mark? (When I test other characters, some display correctly in the terminal, some are escaped, such as "\xa0", and others appear as "?".)
The following details might also be significant:
- The URL I pass to create_engine() uses the utf8 charset.
- I run the migration with the following code:
from alembic import op
from sqlalchemy import orm
from sqlalchemy.sql import table, column
from sqlalchemy import String, Integer, Boolean, Date, Unicode

# Lightweight table construct used only inside this migration
MyTable = table('mytable',
    column('id', Integer),
    column('test1', Unicode(collation='utf8mb4_unicode_ci')),
    column('test2', Unicode),
)

...

def upgrade():
    ...
    # Reuse the connection Alembic opened for this migration
    bind = op.get_bind()
    session = orm.Session(bind=bind)
    rows = session.query(MyTable).all()
    print(rows)
- The debug logs also show the following, but I am not sure whether this is just Alembic's own feature-detection code:
INFO [sqlalchemy.engine.base.Engine] show collation where `Charset` = 'utf8' and `Collation` = 'utf8_bin'
INFO [sqlalchemy.engine.base.Engine] ()
DEBUG [sqlalchemy.engine.base.Engine] Col ('Collation', 'Charset', 'Id', 'Default', 'Compiled', 'Sortlen')
DEBUG [sqlalchemy.engine.base.Engine] Row ('utf8_bin', 'utf8', 83, '', 'Yes', 1)
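Since the connection URL uses the utf8 charset while the database and table use utf8mb4, one thing worth ruling out is the size limit of MySQL's utf8 (also called utf8mb3), which stores at most 3 bytes per character and so cannot hold characters that need 4 bytes in UTF-8, such as most emoji. A minimal sketch, using a hypothetical helper name chars_outside_utf8mb3, that lists the characters of a string that would not fit:

```python
def chars_outside_utf8mb3(text):
    """Return the characters of `text` that need more than 3 bytes in UTF-8.

    MySQL's utf8 (utf8mb3) character set stores at most 3 bytes per
    character, so these are the characters it cannot represent.
    (Hypothetical helper, for illustration only.)
    """
    return [ch for ch in text if len(ch.encode("utf-8")) > 3]


# "ễ" (U+1EC5) needs 3 bytes and "ñ" (U+00F1) needs 2, so nothing is reported.
print(chars_outside_utf8mb3("Nguy\u1ec5n Joh\u00f1"))  # prints: []

# An emoji such as U+1F600 needs 4 bytes, so it would not fit in utf8mb3.
print(chars_outside_utf8mb3("smile \U0001F600"))
```

If this returns an empty list for the affected strings, as it does for "Nguyễn Johñ", that would suggest the 3-byte limit on its own is not what produces the "?".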