I have a Django 1.4 app that was saving text in a MySQL database with the utf8 charset. Everything worked fine, but I ran into a problem when I tried to read this data from Ruby: strings containing emojis raised an "invalid byte sequence in utf-8" exception.
A quick search told me that I should have used the utf8mb4 charset in MySQL, but since these strings don't appear to be valid UTF-8 at this point, a simple ALTER TABLE that changes the charset does not fix the problem.
How was Django saving these strings in the first place, so that emojis worked with the utf8 (and not utf8mb4) charset?
Edit:
Example: tested string was a single emoji:
before save in Django - str type, sequence: [237, 160, 189, 237, 180, 165]
fetched from db in Django - unicode type, sequence: [55357, 56613]
fetched from db in Rails - sequence: [237, 160, 189, 237, 180, 165]
Both Django and Rails use utf8 encoding when connecting to the database.
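
For reference, the sequences in the example can be inspected with a short Python 3 sketch (not the Python 2 that Django 1.4 runs on). The byte values are copied from the example; the helper name decode_surrogate_bytes is hypothetical. It suggests that the six bytes are two UTF-16 surrogates (55357 and 56613), each encoded as three UTF-8-style bytes, rather than the single four-byte UTF-8 sequence for the emoji. A strict UTF-8 decoder rejects such bytes, and three-byte sequences fit within the limit of MySQL's utf8 charset. Whether this fully explains what Django did on save is an assumption.

```python
# Byte values copied from the example above (Python 3 sketch).
raw = bytes([237, 160, 189, 237, 180, 165])


def decode_surrogate_bytes(data):
    """Hypothetical helper: decode UTF-8-style bytes that contain
    UTF-16 surrogates and join each surrogate pair into one code point."""
    # 'surrogatepass' lets the decoder accept surrogate code points.
    halves = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 merges each high/low surrogate pair.
    return halves.encode("utf-16", "surrogatepass").decode("utf-16")


# A strict UTF-8 decoder rejects these bytes.
try:
    raw.decode("utf-8")
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

halves = raw.decode("utf-8", "surrogatepass")
code_units = [ord(ch) for ch in halves]   # the two values seen in Django
emoji = decode_surrogate_bytes(raw)       # one code point after merging
real_utf8 = list(emoji.encode("utf-8"))   # four bytes, which needs utf8mb4

print(strict_ok, code_units, hex(ord(emoji)), real_utf8)
# False [55357, 56613] 0x1f525 [240, 159, 148, 165]
```

If this reading is right, changing the column charset alone would leave the stored bytes unchanged, so each affected value would also need to be rewritten as real four-byte UTF-8.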