I've got a funny issue trying to insert non-ASCII characters into a SQL Server database, using the Microsoft ODBC driver for Linux. The problem is it seems to be assuming different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accent characters).
Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:
locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"
I can both insert a non-ASCII value into a table and select it back.
However, my own utility, which uses the ODBC API, does not work. A select query returns the data encoded in UTF-8, as desired, but if I insert UTF-8 characters they get corrupted.
SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a |
+------------+
| Béthune |
+------------+
If I instead insert the data encoded in ISO-8859-1, it is stored correctly, but the select query still returns it encoded in UTF-8!
I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!
FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.
EDIT: Just realised one relevant point, which is that this test program isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!
#0 0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1 0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2 0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2
I'll try compiling my own UnixODBC to see better what's going on.
EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is that nl_langinfo(CODESET) reports back ISO-8859-1. That is strange, since the man page for it says it's the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem, but I'm still not sure how to solve it.
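One way to narrow this down is to check what nl_langinfo(CODESET) returns inside a small C program before and after calling setlocale(LC_ALL, ""). A C program starts in the "C" locale and only adopts LANG / LC_* from the environment once setlocale is called, whereas the locale command does that itself, which might explain the mismatch. The sketch below is only a diagnostic. The helper names (current_codeset, adopt_environment_locale, report_codesets) are made up for illustration, and whether a missing setlocale call is really the cause here is an assumption, not something confirmed above.

```c
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Copy the codeset of the currently active locale into buf.
 * The copy matters: the pointer returned by nl_langinfo() may be
 * invalidated by a later setlocale() call.
 * Returns 0 on success, -1 if buf is too small. */
int current_codeset(char *buf, size_t size)
{
    const char *cs = nl_langinfo(CODESET);
    if (strlen(cs) >= size)
        return -1;
    strcpy(buf, cs);
    return 0;
}

/* Hypothetical helper: switch from the startup "C" locale to whatever
 * LANG / LC_* name in the environment.
 * Returns 0 on success, -1 if that locale is not available. */
int adopt_environment_locale(void)
{
    return setlocale(LC_ALL, "") != NULL ? 0 : -1;
}

/* Print the codeset seen before and after adopting the environment
 * locale, so the two values can be compared with `locale charmap`. */
void report_codesets(void)
{
    char before[64];
    char after[64];

    if (current_codeset(before, sizeof before) != 0)
        strcpy(before, "(too long)");

    if (adopt_environment_locale() != 0)
        fprintf(stderr, "setlocale(LC_ALL, \"\") failed\n");

    if (current_codeset(after, sizeof after) != 0)
        strcpy(after, "(too long)");

    printf("codeset before setlocale: %s\n", before);
    printf("codeset after setlocale:  %s\n", after);
}
```

Calling report_codesets() from main in the same environment as the ODBC utility should show whether the codeset only becomes UTF-8 after setlocale. If it does, calling setlocale(LC_ALL, "") early in the utility, before any ODBC call, would be the thing to try.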