I have a Django 1.4 app that has been saving text to a MySQL database with the utf8 charset. Everything worked fine until I tried to read this data from Ruby, where strings containing emojis raise an "invalid byte sequence in UTF-8" exception.

A quick search told me that I should have used the utf8mb4 charset in MySQL, but since these strings don't appear to be valid UTF-8 at the moment, a simple ALTER TABLE changing the charset does not fix the problem.

How was Django saving these strings in the first place, so that emojis worked with the utf8 (and not utf8mb4) charset?

Edit:

Example: the tested string was a single emoji:

  • before saving in Django: `str` type, sequence [237, 160, 189, 237, 180, 165]

  • fetched from the db in Django: `unicode` type, sequence [55357, 56613]

  • fetched from the db in Rails: sequence [237, 160, 189, 237, 180, 165]

Both Django and Rails use utf8 encoding when connecting to the database.
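
For reference, the six bytes above are the two UTF-16 surrogates 55357 (0xD83D) and 56613 (0xDD25), each encoded separately as a 3-byte UTF-8 sequence, which is why a strict UTF-8 decoder rejects them. Below is a minimal sketch of how such bytes could be repaired, assuming Python 3.4+ (not the Python 2 that Django 1.4 runs on); the function name `repair_surrogate_utf8` is hypothetical and not part of any library.

```python
def repair_surrogate_utf8(raw: bytes) -> str:
    """Decode bytes in which each UTF-16 surrogate was UTF-8-encoded on its own."""
    # 'surrogatepass' lets the UTF-8 codec accept the lone surrogates
    # that a strict decoder rejects.
    with_surrogates = raw.decode("utf-8", "surrogatepass")
    # Round-tripping through UTF-16 joins each high/low surrogate pair
    # into the single code point it represents.
    return with_surrogates.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


stored = bytes([237, 160, 189, 237, 180, 165])  # bytes from the example above
fixed = repair_surrogate_utf8(stored)
print(hex(ord(fixed)))              # 0x1f525
print(list(fixed.encode("utf-8")))  # [240, 159, 148, 165]
```

The resulting 4-byte sequence is the form that needs utf8mb4 on the MySQL side; text that contains no surrogates passes through the function unchanged.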

majkel
  • It would be helpful to tell us what the expected character is, what the actual stored byte sequence is, and whether Django can read that data from the DB and, if so, what it looks like as a Python string or bytes. It would also help to show the relevant Django DB config for the MySQL connection – Tom Dalton Mar 05 '18 at 16:19
  • Thanks @TomDalton, I checked and it looks interesting - ruby receives the sequence of `str` type from the database. Is there a way to convert it to unicode? – majkel Mar 06 '18 at 07:25
  • That seems to be utf16-encoded. Avoid such. – Rick James Mar 07 '18 at 02:54

0 Answers