7

I've got a funny issue trying to insert non-ASCII characters into a SQL Server database, using the Microsoft ODBC driver for Linux. The problem is it seems to be assuming different character sets when sending and receiving data. For info, the server collation is set to Latin1_General_CI_AS (I'm only trying to insert European accent characters).

Testing with tsql (which came with FreeTDS), everything is fine. On startup, it outputs the following:

locale is "en_GB.utf8"
locale charset is "UTF-8"
using default charset "UTF-8"

I can both insert and select a non-ASCII value into a table.

However, using my own utility which uses the ODBC API, it's not working. When I do a select query, the data comes back in UTF-8 character set as desired. However if I insert UTF-8 characters, they get corrupted.

SQL > update test set a = 'Béthune';
Running SQL: update test set a = 'Béthune'
Query executed OK: 1 affected rows
SQL > select * from test;
Running SQL: select * from test
+------------+
| a          |
+------------+
| Béthune |
+------------+

If I instead insert the data encoded in ISO-8859-1, then that works correctly, however the select query will still return it encoded in UTF-8!

I've already got the locale set to en_GB.utf8, and a client charset of UTF-8 in the database connection details. Aargh!

FWIW I seem to be getting the same problem whether I use the FreeTDS driver or the official Microsoft driver.

EDIT: Just realised one relevant point, which is that in this test program, it isn't using a prepared statement with bound variables. In other words, the update SQL is passed directly into the SQLPrepare call. Something in ODBC is definitely doing an iconv translation, but evidently not to the correct character set!

#0  0x0000003d4c41f850 in iconv () from /lib64/libc.so.6
#1  0x0000003d4d83fd94 in ?? () from /usr/lib64/libodbc.so.2
#2  0x0000003d4d820465 in SQLPrepare () from /usr/lib64/libodbc.so.2

I'll try compiling my own UnixODBC to see better what's going on.

EDIT 2: I've built UnixODBC from source to debug what it's doing, and the problem is nl_langinfo(CODESET) reports back ISO-8859-1. That is strange, since the man page for it says it's the same string you get from locale charmap, which returns UTF-8. I'm guessing that's the problem but still not sure how to solve.

asc99c
  • 3,815
  • 3
  • 31
  • 54

2 Answers2

6

A colleague at work has just figured out the solution for FreeTDS at least.

For a direct driver connection (SQLDriverConnect()), adding ClientCharset=UTF-8;ServerCharset=CP1252; to the connection string fixed the problem

For a connection via the driver manager (SQLConnect()), I can add these lines to the connection settings in odbc.ini:

client charset = UTF-8
server charset = CP1252

Can't yet figure out a solution using the Microsoft driver ...

asc99c
  • 3,815
  • 3
  • 31
  • 54
  • Thanks to your post I was able to use the correct DSN connection string in PHP/Ubuntu/MSSQL/FreeTDS/PDO/ODBC using: "odbc:DRIVER=FreeTDS;SERVER=serverhostname\SQLEXPRESS;DATABASE=testdb;ClientCharset=WE8ISO8859P1" – Jan Jul 13 '17 at 08:25
  • Still can't get it to work with the latest Microsoft ODBC driver for MacOS/Linux. The debugging output, or error messages from the driver manager show as: Warning: QODBCResult::exec: Unable to execute statement: "[Microsoft][ODBC Driver 13 for SQL Server][SQL Ser\u0000\u0000\u0000\u0000\u0002\u0000뜠\u0BFF\u0001\u0000\u0000\u0000\u0000㿰\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000㿰\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000\u0000㿰\u0000\u0000[\u0000\u0000" – Paul-Sebastian Manole Dec 29 '17 at 22:47
4

A solution for Microsoft ODBC Driver might be to set a proper value into the LANG environment variable.

Make sure you have your required locale installed and configured. Also make sure that the LANG environment variable is set correctly for the user you are running your application under. This might be tricky for daemons. For example to make it work for PHP with Apache2 I had to add export LANG=en_US.utf8 into /etc/apache2/envvars.

P-39 Airacobra
  • 229
  • 3
  • 9
  • Note that if the system doesn't have a locale with a charset you need you can probably define it. Use `locale -m` to list charsets. Use this to define locale with the charset: `localedef -f CP1250 -i pl_PL /usr/lib/locale/pl_PL.cp1250`. – Nux Oct 13 '20 at 18:50
  • Adding export LANG=en_US.utf8 to envvars solves my problem, thanks – Ganj Khani Aug 28 '23 at 11:59