1

I get the following warnings, when trying to save a simple dataframe to mysql.:

C:...\anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\x92\xE9t\xE9)' for column 'VARIABLE_VALUE' at row 518") result = self._query(query)

And

C:...anaconda3\lib\site-packages\pymysql\cursors.py:170: Warning: (3719, "'utf8' is currently an alias for the character set UTF8MB3, but will be an alias for UTF8MB4 in a future release. Please consider using UTF8MB4 in order to be unambiguous.") result = self._query(query)

Environment info : I use Mysql8, python3.6 (pymysql 0.9.2, sqlalchemy 1.2.1)

I visited posts like the one linked bellow, none of which seem to give a solution as to how to avoid this warning.

N.B : The Collation in the table within mysql doesn't seem to be set to the one I specified in the create_db function within the Connection class.

The executable code:

import DataEngine.db.Connection as connection
import random
import pandas as pd

if __name__ == "__main__":
    conn = connection.Connection(host="host_name", port="3306", user="username", password="password")
    conn.create_db("raw_data")
    conn.establish("raw_data")
    l1 = []
    for i in range(10):
        l_nested = []
        for j in range(10):
            l_nested.append(random.randint(0, 100))
        l1.append(l_nested)
    df = pd.DataFrame(l1)

    conn.save(df, "random_df")
    df2 = conn.retrieve("random_df")
    print(df2)

So the dataframe that is persisted in the database is :

   index   0   1   2   3   4   5   6   7   8   9
0      0  11  57  75  45  81  70  91  66  93  96
1      1  51  43   3  64   2   6  93   5  49  40
2      2  35  80  76  11  23  87  19  32  13  98
3      3  82  10  69  40  34  66  42  24  82  59
4      4  49  74  39  61  14  63  94  92  82  85
5      5  50  47  90  75  48  77  17  43   5  29
6      6  70  40  78  60  29  48  52  48  39  36
7      7  21  87  41  53  95   3  31  67  50  30
8      8  72  79  73  82  20  15  51  14  38  42
9      9  68  71  11  17  48  68  17  42  83  95

My Connection class

import sqlalchemy
import pymysql
import pandas as pd


class Connection:
    def __init__(self: object, host: str, port: str, user: str, password: str):
        self.host = host
        self.port = port
        self.user = user
        self.password = password
        self.conn = None

    def create_db(self: object, db_name: str, charset: str = "utf8mb4", collate:str ="utf8mb4_unicode_ci",drop_if_exists: bool = True):
        c = pymysql.connect(host=self.host, user=self.user, password=self.password)
        if drop_if_exists:
            c.cursor().execute("DROP DATABASE IF EXISTS " + db_name)
        c.cursor().execute("CREATE DATABASE " + db_name + " CHARACTER SET=" + charset + " COLLATE=" + collate)
        c.close()
        print("Database %s created with a %s charset" % (db_name, charset))

    def establish(self: object, db_name: str, charset: str = "utf8mb4"):
        self.conn = sqlalchemy.create_engine(
            "mysql+pymysql://" + self.user + ":" + self.password + "@" + self.host + ":" + self.port + "/" + db_name +
            "?charset=" + charset)
        print("Connection with database : %s has been established as %s at %s." % (db_name, self.user, self.host))
        print("Charset : %s" % charset)

    def retrieve(self, table):
        df = pd.read_sql_table(table, self.conn)
        return df

    def save(self: object, df: "Pandas.DataFrame", table: str, if_exists: str = "replace", chunksize: int = 10000):
        df.to_sql(name=table, con=self.conn, if_exists=if_exists, chunksize=chunksize)

Some elements that might help: Database is of charset utf8mb4 Table columns don't seem to have the correct collation even though I specified it when creating the db

Imad
  • 2,358
  • 5
  • 26
  • 55

1 Answers1

1

Well, hex 92 and e9 is not valid utf8mb4 (UTF-8). Perhaps you were expecting ’été, assuming CHARACTER SETs cp1250, cp1256, cp1257, or latin1.

Find out where that text is coming from, and let's decide whether it is valid latin1. Then we can fix the code to declare that the client is really using latin1, not utf8mb4? Or we can fix the client to use UTF-8, which would probably be better in the long run.

Rick James
  • 135,179
  • 13
  • 127
  • 222
  • But there is no `éé`, they're just numbers as shown in the main. – Imad Aug 21 '18 at 05:54
  • @Aetos - Is it in French, and talking about "summer"? – Rick James Aug 21 '18 at 06:28
  • There are no characters at all in the dataset. If the data isn't in the dataset where else would it come from? – Imad Aug 21 '18 at 07:52
  • @Aetos - "column 'VARIABLE_VALUE' at row 518" -- hmmmm... Was this a fresh install of MySQL? Or an Upgrade? Please provide `SHOW VARIABLES;` and `SHOW GLOBAL STATUS;` I'm wondering what is at line 518 of those. – Rick James Aug 21 '18 at 18:29
  • This was a fresh install. I did a `mysql -u"UserName" -p -h "HOST" -A -e"SHOW GLOBAL STATUS;" > MySQLCurrentStatus.txt`, I got an output with 429 lines. As for `SHOW GLOBAL STATUS;` The line 518 contains `sync_relay_log_info 10000`. – Imad Aug 22 '18 at 07:37
  • As I understand from other posts with similar encoding issues (if that's the problem) `VARIABLE_VALUE` is a column name. Except I don't have any columns named that way, so perhaps it's a place holder or default column name related to sqlalchemy? See link : [bitbucket issue 4085](https://bitbucket.org/zzzeek/sqlalchemy/issues/4085/warning-1366-incorrect-string-value) Please keep in mind my actual data contains just numbers, not characters or special characters. – Imad Aug 22 '18 at 07:42
  • @Aetos - The last comment on the bigbucket page refers to "stdin". Do you know how that input was generated? I can't think of a case where exactly `VARIABLE_VALUE` is used (I was mistakenly thinking that `SHOW` used that phrasing.) – Rick James Aug 22 '18 at 15:33