
I have had a weird issue for several years now. Here's the thing.

I run Rocky Linux (it also happens on CentOS) with Apache 2.4.53 and MariaDB (mysql Ver 8.0.30 for Linux on x86_64 (Source distribution)).

I have a Tcl script that runs "curl" to retrieve data from another site. The data comes back as JSON, which I parse with the JSON package, and then I insert it into a database, for example:

insert into table set name='Mário Flores';

As you can see, the value contains a non-ASCII character (á). The database uses the utf8mb4 charset, everything is set correctly, and the system locale is "en_US.UTF-8".

Now, if I run the script directly on my Linux box, there are no issues. If I use my website instead, I click a button that does a POST to my web server (index.cgi), which runs the same "curl", parses the JSON and inserts into the database. The code is the same and is called the same way, but this time I get an error:

Error: mysqlexec/db server: Incorrect string value: '\xE1rio...' for column 'name' at row 1

What could be the issue here? The only way I can solve the problem is, when the script is run from the web, to do:

set name [encoding convertto utf-8 $name]

and then insert into the DB.
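A minimal sketch of what that workaround does to the value, assuming Tcl 8.6 or later. It only shows the conversion itself and does not explain why the web-launched process needs it: `encoding convertto utf-8` turns a string of characters into a string of UTF-8 bytes, so the single character á becomes two bytes.

```tcl
# What `encoding convertto utf-8` produces for the name from the question.
set name "M\u00e1rio Flores"      ;# \u00e1 is á, written as an escape so the file's own encoding doesn't matter

set bytes [encoding convertto utf-8 $name]

puts [string length $name]        ;# 12 characters
puts [string length $bytes]       ;# 13 bytes: á becomes the two bytes C3 A1

# Converting back with the same encoding recovers the original string.
puts [expr {[encoding convertfrom utf-8 $bytes] eq $name}]   ;# 1
```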

I tried it both directly on Linux and via the web, with different results. I expected everything to already be UTF-8 compatible, with no conversion needed.

  • `mysql Ver 8.0.30 for Linux` looks like a client version, while you mention MariaDB. If it's really MariaDB, include the MariaDB version from `select version()`. The general problem is that the Tcl code needs to connect using the utf8mb4 character set as a connection option in some way, perhaps with `set names utf8mb4` as SQL. – danblack Apr 17 '23 at 00:26
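A sketch of that suggestion in Tcl. `initCharset` is a hypothetical helper, not part of any library. It takes the command that executes SQL as a parameter (with mysqltcl, which the `mysqlexec` in the error message suggests is in use, that would be `mysqlexec`), so it can be tried without a live server:

```tcl
# Hypothetical helper (not from any library): tell the server which character set
# this client sends, immediately after connecting.
# execCmd is the command that runs SQL on a handle, e.g. mysqlexec from mysqltcl.
proc initCharset {execCmd db {charset utf8mb4}} {
    return [{*}$execCmd $db "SET NAMES $charset"]
}

# Assumed usage with mysqltcl (connection options omitted):
#   set db [mysqlconnect ...]
#   initCharset mysqlexec $db
```

Passing the exec command in keeps the sketch independent of the driver; the only thing it really encodes is "run `SET NAMES` once, right after connecting, before any INSERT".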

1 Answer


\xE1 looks like latin1, definitely not utf8. When connecting, set the client's charset to the encoding you actually send. Alternatively, use SET NAMES latin1; after connecting.
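Why the server rejects `'\xE1rio...'`, assuming the connection is declared as UTF-8: in UTF-8 a byte of the form `1110xxxx`, such as E1, starts a three-byte sequence and must be followed by two continuation bytes of the form `10xxxxxx`, but the next byte is `r` (0x72). The sketch below is a simplified structural check only (it does not reject overlong forms or surrogates), and `utf8SeqLen`, `isContinuation` and `isValidUtf8` are illustrative names:

```tcl
# Simplified UTF-8 structure check (illustrative; ignores overlong forms and surrogates).

# Expected length of a sequence starting with this lead byte, or 0 if it can't start one.
proc utf8SeqLen {lead} {
    if {($lead & 0x80) == 0x00} { return 1 }   ;# 0xxxxxxx: ASCII
    if {($lead & 0xE0) == 0xC0} { return 2 }   ;# 110xxxxx
    if {($lead & 0xF0) == 0xE0} { return 3 }   ;# 1110xxxx, e.g. 0xE1
    if {($lead & 0xF8) == 0xF0} { return 4 }   ;# 11110xxx
    return 0                                   ;# stray continuation byte or invalid lead
}

# Continuation bytes look like 10xxxxxx.
proc isContinuation {b} {
    expr {($b & 0xC0) == 0x80}
}

proc isValidUtf8 {bytes} {
    binary scan $bytes cu* vals   ;# list of unsigned byte values
    set n [llength $vals]
    set i 0
    while {$i < $n} {
        set len [utf8SeqLen [lindex $vals $i]]
        if {$len == 0 || $i + $len > $n} { return 0 }
        for {set k 1} {$k < $len} {incr k} {
            if {![isContinuation [lindex $vals [expr {$i + $k}]]]} { return 0 }
        }
        incr i $len
    }
    return 1
}

set latin1Bytes [encoding convertto iso8859-1 "M\u00e1rio"]   ;# 4D E1 72 69 6F
set utf8Bytes   [encoding convertto utf-8 "M\u00e1rio"]       ;# 4D C3 A1 72 69 6F
puts [isValidUtf8 $latin1Bytes]   ;# 0: E1 expects two continuation bytes, gets "r"
puts [isValidUtf8 $utf8Bytes]     ;# 1
```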

E1 is the hex for á in any of these: cp1250, dec8, latin1, latin2, latin5.

C3A1 is the hex for á in utf8 / utf8mb4.
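These byte values can be checked from Tcl with `encoding convertto`. The sketch assumes Tcl's encoding names `iso8859-1`, `iso8859-2` and `iso8859-9` correspond to MySQL's latin1, latin2 and latin5 (MySQL's latin1 is really closer to cp1252, which agrees on á), and leaves out dec8, for which I don't know of a Tcl encoding. `hexBytes` is an illustrative helper; the loop prints E1 for the four single-byte encodings and C3A1 for utf-8:

```tcl
# Hex representation of a byte string, e.g. "C3A1".
proc hexBytes {bytes} {
    binary scan $bytes H* hex
    return [string toupper $hex]
}

set ch "\u00e1"   ;# á, written as an escape so the script file's own encoding doesn't matter
foreach enc {iso8859-1 cp1250 iso8859-2 iso8859-9 utf-8} {
    puts [format "%-10s %s" $enc [hexBytes [encoding convertto $enc $ch]]]
}
```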

As to "whether the data in the DB is..."...

  • Using utf8mb4 in the database allows characters from every language in the world, including Emoji, to be represented.
  • With the correct configuration, MySQL is happy to convert to/from UTF-8 when INSERTing/SELECTing. The target charset (in the client) can be essentially any encoding. Latin1 is common; it has about 120 extra characters (accented letters and common symbols) in addition to ordinary ASCII letters, digits, and simple punctuation.

The column definitions control what is stored in the database.

The connection parameters specify what the client's charset is.
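A small illustration of why the declared client charset matters, done purely in Tcl rather than against a server: the same UTF-8 bytes read back as UTF-8 give the original text, while read as latin1 each byte becomes its own character, producing the familiar `Ã¡` mojibake.

```tcl
set original "M\u00e1rio"
set utf8Bytes [encoding convertto utf-8 $original]   ;# 4D C3 A1 72 69 6F

# Interpreted with the matching charset: the text survives.
set asUtf8 [encoding convertfrom utf-8 $utf8Bytes]

# Interpreted as latin1: C3 and A1 become two characters, U+00C3 (Ã) and U+00A1 (¡).
set asLatin1 [encoding convertfrom iso8859-1 $utf8Bytes]

puts [expr {$asUtf8 eq $original}]                 ;# 1
puts [expr {$asLatin1 eq "M\u00c3\u00a1rio"}]      ;# 1
puts [string length $asLatin1]                     ;# 6
```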

Rick James
  • The real question is whether the data in the DB is UTF-8 or Latin-1. That matters because it says whether the problem is in the insertion or the extraction. (The Tcl side would probably default to UTF-8 if it can't detect that it should do otherwise. Web code is more often Latin-1.) – Donal Fellows Apr 17 '23 at 08:42
  • @DonalFellows - I added more. – Rick James Apr 17 '23 at 15:27