5

I am trying to write a Pandas dataframe into MySQL database, and have used the following code:

engine = sqlalchemy.create_engine("mysql+pymysql://root:password@localhost/skills?charset=utf8mb4")
connection = engine.connect
dataframe.head().to_sql('indeed_resumes', engine, flavor='mysql', if_exists='replace',index=True)

However, I get the following error:

InternalError: (1366, "Incorrect string value: '\\xE1\\xBB\\x99i\\x0AO...' for column 'work' at row 5")

The datatype of the MySQL table is as follows:

 mysql> desc indeed_resumes;
    +-----------+------------+------+-----+---------+-------+
    | Field     | Type       | Null | Key | Default | Extra |
    +-----------+------------+------+-----+---------+-------+
    | index     | bigint(20) | YES  | MUL | NULL    |       |
    | certs     | text       | YES  |     | NULL    |       |
    | contact   | text       | YES  |     | NULL    |       |
    | education | text       | YES  |     | NULL    |       |
    | headline  | text       | YES  |     | NULL    |       |
    | info      | text       | YES  |     | NULL    |       |
    | skills    | text       | YES  |     | NULL    |       |
    | summary   | text       | YES  |     | NULL    |       |
    | updated   | text       | YES  |     | NULL    |       |
    | work      | text       | YES  |     | NULL    |       |
    +-----------+------------+------+-----+---------+-------+
    10 rows in set (0.00 sec)

My data consists of very long strings (sometimes around 3000 characters), so this could be causing the error. Any suggestions?

runawaykid
  • 1,391
  • 1
  • 15
  • 20

1 Answers1

11

I seem to have resolved this issue. It looks like I also needed to change the database encoding using the following commands.

ALTER DATABASE skills CHARACTER SET = utf8mb4 COLLATE = utf8mb4_unicode_ci;
ALTER TABLE indeed_resumes CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci;

From https://mathiasbynens.be/notes/mysql-utf8mb4:

"Turns out MySQL’s utf8 charset only partially implements proper UTF-8 encoding. It can only store UTF-8-encoded symbols that consist of one to three bytes; encoded symbols that take up four bytes aren’t supported.

Luckily, MySQL 5.5.3 (released in early 2010) introduced a new encoding called utf8mb4 which maps to proper UTF-8 and thus fully supports Unicode, including astral symbols."

runawaykid
  • 1,391
  • 1
  • 15
  • 20