SQL Server (SQLCMD), Python and encoding issue when using non ascii chars

Question

I'm facing an encoding issue in my Python code when querying data stored in SQL Server 2005.

(Because I was unable to compile PyMSSQL-2.0.0b1) I'm using this piece of code. I am able to run some selects, but now I'm stuck because I do not know what encoding SQLCMD is outputting to me :(

(The tables contain European-language text, so I have to deal with accented characters and other encodings.)

For example:

When I read it (select) from MS SQL Server Management Studio, I get this country name: 'Ceská republika' (note the acute accent on the first a).
When I query it with SQLCMD from the command line (PowerShell on Windows 7), it is still OK: I can see the a with acute.
Now, when using Python with the os.popen trick from the recipe, that is, with this connection string:

sqlcmd -U adminname -P password -S servername -d dbname /w 8192 -u

I get this string: 'Cesk\xa0 republika'

Notice the \xa0: I do not know what encoding it comes from, or how to get from this \xa0 to {a with acute}...

Testing in Python, the Unicode value I should have is '\xe1':

>>> unicode('Cesk\xa0 republika')

Traceback (most recent call last):
  File "<pyshell#13>", line 1, in <module>
    unicode('Cesk\xa0 republika')
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 4: ordinal not in range(128)

>>> unicode_a_with_acute = u'\N{LATIN SMALL LETTER A WITH ACUTE}'
>>> unicode_a_with_acute
u'\xe1'
>>> print unicode_a_with_acute
á
>>> print unicode_a_with_acute.encode('cp1252')
á
>>> unicode_a_with_acute.encode('cp1252')
'\xe1'
>>> print 'Cesk\xa0 republika'.decode('cp1252')
Cesk  republika
>>> print 'Cesk\xa0 republika'.decode('utf8')

Traceback (most recent call last):
  File "<pyshell#21>", line 1, in <module>
    print 'Cesk\xa0 republika'.decode('utf8')
  File "C:\Python27\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xa0 in position 4: invalid start byte

So what is SQLCMD giving me? How should I force it and/or os.popen and the rest to make sure that I get proper UTF-8 that Python can understand?

(Note: I have tried both with and without the trailing -u on the os.popen command for SQLCMD, which should ask SQLCMD to answer in Unicode, with no effect. I have also tried feeding it a "select" Python string encoded in UTF-8, with no more success:

 sqlstr = unicode('select * from table_pays where country_code="CZ"')
 cu = c.cursor
 lst = cu.execute(sqlstr)
 rows = cu.fetchall()
 for x in rows:
      print x

 ( 'CZ          ', 'Cesk\xa0 republika       ')

)

Another point: from what I googled about "sqlcmd.exe", there are also these parameters that might help:

[-f <codepage> | i:<codepage>[,o:<codepage>]]

but I was unable to specify the right one, and I do not know what the possible values are. BTW, using (or not using) the:

[-u unicode output]

did not help me either...

score 0 · Answer 1 · edited Mar 16 '12 at 15:06

The problem might be that the console works in ASCII mode by default and the output is converted using the current codepage setting. You can try one of the following. Either write the result to a separate file with: -o <file> -u

The result file will then have proper UCS-2 encoding, which Python gladly takes. The other option is to set up UTF-8 console output (untested):

import os

# set up UTF-8 on the Windows console before running the command
cmode = 'mode con: codepage select=65001 > NUL & '
cmd = 'my command'
f = os.popen(cmode + cmd)
out = f.readlines()
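
For the first suggestion (writing to a file with -o <file> -u), reading the result back might look like the sketch below. It assumes the file sqlcmd writes is UTF-16 with a byte order mark, which is what the 'utf-16' codec expects; the helper name read_sqlcmd_output and the temporary sample file are hypothetical stand-ins so the snippet runs without a server. It is written in Python 3 syntax, but io.open behaves the same way on Python 2.7.

```python
import io
import os
import tempfile

def read_sqlcmd_output(path):
    """Read a file assumed to be written by `sqlcmd ... -o <file> -u`."""
    # The 'utf-16' codec consumes the byte order mark and picks the byte order.
    with io.open(path, encoding='utf-16') as handle:
        return [line.rstrip('\r\n') for line in handle]

# Hypothetical stand-in for real sqlcmd output, so the sketch is runnable.
fd, sample_path = tempfile.mkstemp(suffix='.txt')
os.close(fd)
with io.open(sample_path, 'w', encoding='utf-16', newline='\r\n') as handle:
    handle.write(u'CZ          Cesk\xe1 republika\n')

rows = read_sqlcmd_output(sample_path)
os.remove(sample_path)
print(rows)
```

Each element of rows is then already Unicode text, so no further guessing about code pages is needed on the Python side.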

score 0 · Answer 2 · answered Nov 14 '11 at 09:56

It looks like your default codepage is 850 or 437: in both of them, byte 0xa0 is á. Never try to guess at codepages, though: running chcp in a command prompt will tell you what your system is set to use.
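
The code page guess can be checked directly against the bytes from the question. This is only a small sketch in Python 3 syntax (on Python 2.7, drop the b prefix and use print as a statement); the variable names are made up for illustration.

```python
# Bytes as they came back through the pipe in the question.
raw = b'Cesk\xa0 republika'

# Compare the two OEM code pages with the Windows ANSI code page.
# cp850 and cp437 both map 0xa0 to a with acute; cp1252 maps it to
# a no-break space, which is why the earlier attempt printed a gap.
for codepage in ('cp850', 'cp437', 'cp1252'):
    print(codepage, repr(raw.decode(codepage)))

# Once chcp has confirmed the code page, decode once to Unicode,
# then encode to UTF-8 only if bytes are needed again.
text = raw.decode('cp850')
utf8_bytes = text.encode('utf-8')
```

Decoding with the code page reported by chcp, rather than a hard-coded one, keeps this from breaking on a machine with different regional settings.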

Trying to set the command processor codepage with either chcp or mode con: is unlikely to be helpful, because they set the output codepage for the console, not for pipes or for redirection to a file.

To get unicode (or rather, utf-16) output in a pipe use cmd /u:

>>> subprocess.check_output('''cmd /u /c "echo hello\xe1"''').decode('utf16')
'helloá\r\n'
>>>

But you would almost certainly be better just to install a real database adaptor.
