I tried this with both Python 3.7 and Python 3.8, with mysql-connector-python 8.0.13 and 8.1.0.
MySQL 5.7.42
Collation on the database is set to 'utf8mb4_unicode_520_ci'
Connection from Python is:
import mysql.connector

db = mysql.connector.connect(
    host="localhost",
    user=username,
    passwd=password,
    database=eventdb,
    charset="utf8mb4",
    use_unicode=True
)
cur = db.cursor(dictionary=True)
I have a string that comes from a json.dump, and I am attempting to run a parameterized query with it:
data["name"] = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡♡EOMON♡'
sql = "SELECT db_name_id FROM db_name WHERE name = %s"
val = (data["name"],)
cur.execute(sql, val)
mysql-connector-python 8.0.13 on both versions of Python raises UnicodeEncodeError: 'utf-8' codec can't encode characters in position 0-5: surrogates not allowed
mysql-connector-python 8.1.0 on Python 3.8 raises _mysql_connector.MySQLInterfaceError: Failed converting Python 'str'
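For reference, the first error does not seem to need MySQL at all. The sketch below uses only the standard library and a shortened copy of the string above; it suggests the failure happens when the parameter is encoded to UTF-8, before anything reaches the server. The variable names are only for illustration.

```python
# Lone surrogates (U+D800..U+DFFF) have no valid UTF-8 encoding, so a strict
# encode fails on the client side.
name = '\udced\udca0\udcbe\udced\udcb7\udca1'

try:
    name.encode('utf-8')
    encode_error = None
except UnicodeEncodeError as exc:
    encode_error = exc

print(encode_error.reason)

# With the surrogateescape handler, each U+DCxx code point is mapped back
# to the raw byte 0xxx it stands for, so the encode succeeds.
raw = name.encode('utf-8', 'surrogateescape')
print(raw)
```

The recovered bytes are not valid UTF-8 on their own, which may be why the connector refuses to convert the string.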
However, if I execute a simple query:
cur.execute("SELECT db_name_id FROM db_name WHERE name = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡♡EOMON♡'")
Then it executes without error. This is a user-entered field, though, and I really DON'T want to run the query without parameters.
The simplest example that reproduces the exception I'm seeing uses the C Extension directly:
import _mysql_connector
ccnx = _mysql_connector.MySQL()
ccnx.connect(
    host="localhost",
    user="user",
    password="password",
    database="database"
)
bad_str = 'just_an_��_example'
try:
    str_converted = ccnx.convert_to_mysql(*[bad_str])
    print('str converted is', str_converted)
except Exception as e:
    # repr() keeps the print itself from failing on unencodable characters
    print("can't convert bad str", repr(bad_str))
    print(e)
I've only tested this with mysql-connector-python 8.1.0.
If I make the following change, based on information in MySQL Bug 99757, then convert_to_mysql works:
import _mysql_connector
ccnx = _mysql_connector.MySQL()
ccnx.connect(
    host="localhost",
    user="user",
    password="password",
    database="database"
)
ccnx.set_character_set('utf8')
bad_str = 'just_an_��_example'
try:
    str_converted = ccnx.convert_to_mysql(*[bad_str])
    print('str converted is', str_converted)
except Exception as e:
    # repr() keeps the print itself from failing on unencodable characters
    print("can't convert bad str", repr(bad_str))
    print(e)
It seems like the conversion to a MySQL string is broken in some cases, including parameterized strings that contain surrogates. I'm hoping there's just something I missed.
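One direction I'm considering is cleaning the string before binding it. This is only a sketch and rests on an assumption I have not confirmed: that the \udcXX code points are raw bytes escaped by the surrogateescape handler, and that those bytes are UTF-8-encoded surrogate pairs (CESU-8 style). The helper name repair_surrogates is hypothetical, not part of any library.

```python
def repair_surrogates(text: str) -> str:
    """Hypothetical helper: rebuild real characters from escaped CESU-8 bytes."""
    # Step 1: turn each U+DCxx escape back into the raw byte it represents.
    raw = text.encode('utf-8', 'surrogateescape')
    # Step 2: decode again, letting UTF-8-encoded surrogate halves through.
    halves = raw.decode('utf-8', 'surrogatepass')
    # Step 3: a UTF-16 round trip joins each high/low surrogate pair
    # into the single code point it stands for.
    return halves.encode('utf-16', 'surrogatepass').decode('utf-16')


name = '\udced\udca0\udcbe\udced\udcb7\udca1\n\n♡ADANA♡'
clean = repair_surrogates(name)
encoded = clean.encode('utf-8')  # a strict encode no longer raises
```

If the assumption is wrong (for example, an unpaired surrogate is left over after step 2), the last decode raises instead of silently producing bad data, so the helper would need a try/except around it for real user input.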