
I am connecting to an MS SQL Server database using pyodbc, and I am writing the query results to an Excel 2007/10 .xlsx file using openpyxl.

This is my code (Python 2.7):

import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host = 'xxx',database='yyy',user='zzz',password='ppp')
cursor = cnxn.cursor()

sql = "SELECT TOP 10   [customer clientcode] AS Customer, \
                [customer dchl] AS DChl, \
                [customer name] AS Name, \
                ...
                [name3] AS [name 3] \
        FROM   mydb \
        WHERE [customer dchl] = '03' \
        ORDER BY [customer id] ASC"

#load data
cursor.execute(sql)

#get column names from the cursor description
columns = [column[0] for column in cursor.description]    

#use optimized_write since there will be about 120k rows of data
wb = Workbook(optimized_write = True, encoding='utf-8')

ws = wb.create_sheet()
ws.title = '03'

#append column names to header
ws.append(columns)

#append data rows to the worksheet
for row in cursor:
    ws.append(row)

wb.save(filename = 'test.xlsx')

cnxn.close()

This works, at least up until the point I encounter a customer with, for example, the name "mún". My code does not fail; everything writes to Excel and all seems well. That is, until I actually open the Excel file: Excel reports that the file is corrupted and needs to be repaired, and upon repairing it, all data is lost.

I know the code works for customers with regular (ASCII-only) names; the Excel file gets corrupted as soon as a name contains an accented character or anything similar.

I have tried printing a single row (one with a difficult customer name). This is the result:

The row is a tuple, and this is one of its elements: 'Mee\xf9s Tilburg'. So either writing the \xf9 (ù) character causes an error, or MS Excel cannot cope with it. I have tried various ways of encoding the row to Unicode (unicode(row, 'utf-8'), u''.join(row), etc.), though nothing works: either I try something idiotic that results in an error, or the Excel file still ends up corrupted.
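
For reference, this is roughly how I inspected what pyodbc returns (a sketch against the cursor from the code above):

#diagnostic sketch: show the type and raw repr of each value in the first row
cursor.execute(sql)
row = cursor.fetchone()
for value in row:
    print type(value), repr(value)   #str means raw bytes; unicode means already decoded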

Any ideas?

  • The connection string might seem strange, as I also tried different ways of connecting to the server; in this instance it was pymssql. But my issue is not with connecting! – Rym Mar 08 '13 at 15:05
  • Not an exact duplicate to your problem, but you might find a solution here: http://stackoverflow.com/questions/9148221/reading-unicode-from-sqlite-db-using-python – David Marx Mar 08 '13 at 15:41
  • Unable to reproduce with `pyodbc 3.0.6` and `openpyxl 1.6.1`. `mún` is encoded as `u'm\xfan'` in cursor. – Bryan Mar 11 '13 at 12:45
  • Strange, I also got 'm\xfan' though without the u, which caused all sorts of issues. Casting my data to nvarchar resolved this problem (where I would now just see u'mún'). Thank you for looking at it though – Rym Mar 11 '13 at 14:41

2 Answers


In the end I found two solutions:

First one was converting the row given by the cursor to a list, and decoding the elements within the list:

for row in cursor:
    l = list(row)
    l[5] = l[5].decode('ISO-8859-1')
    #...repeat for every column that needs decoding
    ws.append(l)

I figured this would be painfully slow, since six columns needed conversion across 120k rows, but everything actually ran quite fast. In the end it became apparent that I could (and should) simply cast the data to Unicode in the SQL statement (CAST(x AS nvarchar) AS y), which made the Python-side decoding unnecessary. I did not think of this at first because I assumed the server was already supplying the data as Unicode. My bad.
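
For illustration, the cast looks roughly like this in the original query (a sketch: only two columns are shown, and the nvarchar lengths are assumptions, not taken from the real schema):

#cast character columns to nvarchar so pyodbc returns unicode objects
sql = "SELECT TOP 10   CAST([customer clientcode] AS nvarchar(50)) AS Customer, \
                CAST([customer name] AS nvarchar(100)) AS Name, \
                ... \
        FROM   mydb \
        WHERE [customer dchl] = '03' \
        ORDER BY [customer id] ASC"

With the cast in place, pyodbc hands back unicode objects directly, so the rows can be appended to the worksheet unchanged.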

  • Your question included an example with `\xf9` in it, which is not valid UTF-8. Too bad it took you so long to realize it. – Mark Ransom Mar 11 '13 at 14:45

You can use encode() to convert a unicode object to a byte string:

l = [u'asd', u'qw', u'fdf', u'sad', u'sadasd']
l[4] = l[4].encode('utf8')   #l[4] is now a UTF-8 byte string (str); the rest stay unicode
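
For example, in a Python 2 shell (illustrative values):

>>> u'm\xfan'.encode('utf8')
'm\xc3\xban'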