Python, Django and pyodbc: invalid characters

Question

I am connecting to an MS SQL Server database using pyodbc. The error I'm getting is the following:

invalid byte sequence for encoding "UTF8": 0x93
HINT:  This error can also happen if the byte sequence does not match the  
encoding expected by the server, which is controlled by "client_encoding".

The SQL Server database is encoded using Latin1, and I am using Postgres with Django, which expects UTF8.

I am very new to pyodbc and cannot solve this problem. I have worked through piles of Google searches but with no luck. Some help would be greatly appreciated.

EDIT

The Postgres db is the main db for the project. I want to be able to pull data from the SQL Server. This process will not be done often though...

The error occurs when reading from the SQL Server db.

It may be helpful to elaborate a bit on how you're using these two databases together. — dgel, Mar 22 '11 at 10:07
Are you running django-pyodbc or just pyodbc? A traceback would also be helpful. — sunn0, Mar 22 '11 at 22:25
I'm running just pyodbc... This process is going to be a one-off, and I thought it was not worth installing another app just for one process — neolaser, Mar 22 '11 at 22:34
If you want a sensible answer, not just guesses, show the full traceback and error message, and answer the previous question about how you are using SQL Server and postgres together. WHAT is blowing up: read SQL Server? write postgres? something else?? — John Machin, Mar 22 '11 at 23:00
Thanks John, I thought I did answer how I'm using it. Postgres is the database that Django knows of. I am using pyodbc to connect to a SQL Server database. I'll edit the question for "WHAT" is blowing up — neolaser, Mar 22 '11 at 23:11

score 3 · Accepted Answer · answered Mar 22 '11 at 22:53

You have given next to no clues but a reasonable guess is:

You need to decode your MS SQL Server data to unicode using the correct encoding, and (not necessarily immediately) encode it as 'UTF-8' for transmission to postgres.
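A minimal sketch of that decode-then-encode step, assuming the values come back from the SQL Server read as raw byte strings and that the source encoding really is cp1252 (an assumption you should verify for your data). The helper name `to_utf8` and the sample value are hypothetical, not part of pyodbc:

```python
def to_utf8(raw, source_encoding="cp1252"):
    """Decode bytes read from the source db, then re-encode them as UTF-8.

    source_encoding is an assumption; pass whatever the data actually uses.
    """
    text = raw.decode(source_encoding)  # bytes -> unicode text
    return text.encode("utf-8")         # unicode text -> UTF-8 bytes


# Hypothetical value containing Windows "smart quotes" (0x93 / 0x94).
sample = b"\x93quoted\x94"
converted = to_utf8(sample)
```

In practice you would usually keep the decoded unicode text around and only encode at the point where the data is handed to postgres.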

What makes you think that the encoding used on the SQL Server db is latin1 and not cp125x? True latin1 on an MS product is highly unlikely. Your errant byte '\x93' when decoded as cp1252 (the usual suspect) gives U+201C LEFT DOUBLE QUOTATION MARK, commonly used in e.g. MS Word and commonly found pasted into data which ends up in a db. Decoding as latin1 produces U+0093 which is some arcane control character whose usage in practice is as rare as hens' teeth.
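You can check the difference between the two decodings yourself. This is only an illustration using the standard library, with the single byte from the error message as input:

```python
import unicodedata

errant = b"\x93"  # the byte reported in the error message

as_cp1252 = errant.decode("cp1252")
as_latin1 = errant.decode("latin-1")

# cp1252 maps 0x93 to a printable punctuation character.
print(hex(ord(as_cp1252)), unicodedata.name(as_cp1252))

# latin-1 maps 0x93 straight to U+0093, a C1 control character
# (Unicode general category "Cc"), which has no character name.
print(hex(ord(as_latin1)), unicodedata.category(as_latin1))
```

If the data is full of bytes in the 0x80-0x9F range that make sense as quotes, dashes, and similar punctuation under cp1252, that is a strong hint the data is not true latin1.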

John Machin


Thanks for the answer. I ran SELECT SERVERPROPERTY('Collation') on the SQL Server db and it returned Latin1_General_CI_AS – neolaser Mar 22 '11 at 23:09
"collation" means sort order. Your database could for example have data encoded in `cp1252` (which supports multiple Western European languages) and any one of several collations whose names start with "Latin1" ... see e.g. http://msdn.microsoft.com/en-us/library/ms143508.aspx – John Machin Mar 23 '11 at 00:09
