
I am connecting to an MS SQL Server database using pyodbc, and I am writing the query results to an Excel 2007/10 .xlsx file using openpyxl.

This is my code (Python 2.7):

import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host = 'xxx',database='yyy',user='zzz',password='ppp')
cursor = cnxn.cursor()

sql = "SELECT TOP 10   [customer clientcode] AS Customer, \
                [customer dchl] AS DChl, \
                [customer name] AS Name, \
                ...
                [name3] AS [name 3] \
        FROM   mydb \
        WHERE [customer dchl] = '03' \
        ORDER BY [customer id] ASC"

#load data
cursor.execute(sql)

#get column names from the cursor description
columns = [column[0] for column in cursor.description]    

#use optimized_write since there will be about 120k rows of data
wb = Workbook(optimized_write = True, encoding='utf-8')

ws = wb.create_sheet()
ws.title = '03'

#append column names to header
ws.append(columns)

#append data rows to the worksheet
for row in cursor:
    ws.append(row)

wb.save(filename = 'test.xlsx')

cnxn.close()

This works, at least up until the point I encounter a customer with, for example, the name "mún". My code does not fail; everything writes to Excel and all seems well. That is, until I actually open the Excel file: Excel reports that the file is corrupted and needs to be repaired, and upon repairing it, all data is lost.

I know the code works for customers with regular (ASCII-only) names; the Excel file gets corrupted as soon as a name contains an accented character or anything similar.

I have tried printing a single row (one with a difficult customer name). This is the result:

The row is a tuple, and this is one of its elements: 'Mee\xf9s Tilburg'. So either writing the \xf9 (ù) character causes an error, or MS Excel cannot cope with it. I have tried various ways of encoding the row to Unicode (unicode(row, 'utf-8'), u''.join(row), etc.), though nothing works: either I try something idiotic that results in an error, or the Excel file still ends up corrupted.
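
For reference, this is roughly how I inspected what pyodbc returns (a sketch against the cursor from the code above):

#diagnostic sketch: show the type and raw repr of each value in the first row
cursor.execute(sql)
row = cursor.fetchone()
for value in row:
    print type(value), repr(value)   #str means raw bytes; unicode means already decoded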

Any ideas?

  • The connection string might seem strange, as I also tried different ways of connecting to the server; in this instance it was pymssql. But my issue is not with connecting! – Rym Mar 08 '13 at 15:05
  • Not an exact duplicate to your problem, but you might find a solution here: http://stackoverflow.com/questions/9148221/reading-unicode-from-sqlite-db-using-python – David Marx Mar 08 '13 at 15:41
  • Unable to reproduce with `pyodbc 3.0.6` and `openpyxl 1.6.1`. `mún` is encoded as `u'm\xfan'` in cursor. – Bryan Mar 11 '13 at 12:45
  • Strange, I also got 'm\xfan' though without the u, which caused all sorts of issues. Casting my data to nvarchar resolved this problem (where I would now just see u'mún'). Thank you for looking at it though – Rym Mar 11 '13 at 14:41

2 Answers


In the end I found two solutions:

First one was converting the row given by the cursor to a list, and decoding the elements within the list:

for row in cursor:
    l = list(row)
    l[5] = l[5].decode('ISO-8859-1')
    #...repeat for every column that needs decoding
    ws.append(l)

I figured this would be painfully slow, since six columns needed conversion across 120k rows, but everything actually ran quite fast. In the end it became apparent that I could (and should) simply cast the data to Unicode in the SQL statement (CAST(x AS nvarchar) AS y), which made the Python-side decoding unnecessary. I did not think of this at first because I assumed the server was already supplying the data as Unicode. My bad.
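
For illustration, the cast looks roughly like this in the original query (a sketch: only two columns are shown, and the nvarchar lengths are assumptions, not taken from the real schema):

#cast character columns to nvarchar so pyodbc returns unicode objects
sql = "SELECT TOP 10   CAST([customer clientcode] AS nvarchar(50)) AS Customer, \
                CAST([customer name] AS nvarchar(100)) AS Name, \
                ... \
        FROM   mydb \
        WHERE [customer dchl] = '03' \
        ORDER BY [customer id] ASC"

With the cast in place, pyodbc hands back unicode objects directly, so the rows can be appended to the worksheet unchanged.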

  • Your question included an example with `\xf9` in it, which is not valid UTF-8. Too bad it took you so long to realize it. – Mark Ransom Mar 11 '13 at 14:45

You can use encode() to convert a unicode object to a byte string:

l = [u'asd', u'qw', u'fdf', u'sad', u'sadasd']
l[4] = l[4].encode('utf8')   #l[4] is now a UTF-8 byte string (str); the rest stay unicode
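
For example, in a Python 2 shell (illustrative values):

>>> u'm\xfan'.encode('utf8')
'm\xc3\xban'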