I am connecting to a MS SQL Server database using pyodbc, and I am trying to write the results to an Excel 2007/2010 .xlsx file using openpyxl.
This is my code (Python 2.7):
import pyodbc
from openpyxl import Workbook

cnxn = pyodbc.connect(host='xxx', database='yyy', user='zzz', password='ppp')
cursor = cnxn.cursor()

sql = "SELECT TOP 10 [customer clientcode] AS Customer, \
       [customer dchl] AS DChl, \
       [customer name] AS Name, \
       ...
       [name3] AS [name 3] \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"

# load data
cursor.execute(sql)
# get column names (the name is the first item of each cursor.description tuple)
columns = [column[0] for column in cursor.description]

# use optimized_write because there will be about 120k rows of data
wb = Workbook(optimized_write=True, encoding='utf-8')
ws = wb.create_sheet()
ws.title = '03'
# append the column names as a header row
ws.append(columns)
# append the data rows to the sheet
for row in cursor:
    ws.append(row)

wb.save(filename='test.xlsx')
cnxn.close()
This works, at least up until the point where I encounter a customer with, for example, the name "mún". My code does not fail, everything writes to Excel and all seems well, until I actually open the Excel file: Excel then reports that the file is corrupted and needs to be repaired, and after the repair all data is lost.
I know the code works for customers with regular (ASCII-only) names; the file only gets corrupted as soon as an accented character or anything else non-ASCII is involved.
I have tried to print a single row (with a difficult customer name). This is the result:
row is a tuple, and this is one of its elements:
'Mee\xf9s Tilburg'
So either writing the \xf9 (ù) character causes an error, or MS Excel cannot cope with it. I have tried various ways of encoding the row to unicode (u''.join(row) and so on), though nothing works: either I try something idiotic that results in an error, or the Excel file still ends up corrupted.
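To illustrate what happens with the naive approach, here is a minimal sketch using the value above (Python 2 falls back to the ASCII codec when mixing unicode and byte strings):

name = 'Mee\xf9s Tilburg'   # byte string, as it comes back from the cursor

try:
    u''.join([name])        # Python 2 implicitly decodes the byte string with the ASCII codec
except UnicodeDecodeError as exc:
    print exc               # 'ascii' codec can't decode byte 0xf9 in position 3 ...

fixed = name.decode('ISO-8859-1')   # explicit decode gives u'Mee\xf9s Tilburg', i.e. Meeùs Tilburg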
In the end I found two solutions:
The first one was to convert the row given by the cursor to a list and decode the necessary elements within the list:
for row in cursor:
    l = list(row)
    # decode the byte-string columns, e.g. the Name column at index 2
    # (do this for all necessary columns)
    l[2] = l[2].decode('ISO-8859-1')
    ws.append(l)
I figured this would have been hell, because there were 6 columns needing conversion to unicode and 120k rows, but everything actually went quite fast! In the end it became apparent that I could/should simply cast the data to unicode in the SQL statement (CAST(x AS nvarchar) AS y), which made the replacements unnecessary. I did not think of this at first because I thought the database was already supplying the data in unicode. My bad.
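Roughly, the adjusted query looks like the sketch below (column and table names copied from the query above; nvarchar(255) is only an example length, pick whatever fits your data):

# cast the text columns to nvarchar so pyodbc returns unicode objects
# instead of ISO-8859-1 byte strings
sql = "SELECT TOP 10 [customer clientcode] AS Customer, \
       [customer dchl] AS DChl, \
       CAST([customer name] AS nvarchar(255)) AS Name \
       FROM mydb \
       WHERE [customer dchl] = '03' \
       ORDER BY [customer id] ASC"

cursor.execute(sql)
for row in cursor:
    ws.append(row)   # rows now contain unicode; openpyxl writes them without corrupting the file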
You can use encode() to convert a unicode object to a (byte) string:
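For example (a minimal sketch; the variable names are just illustrative):

name = u'm\xfan'                   # a unicode object ('mún', as in the example above)
as_bytes = name.encode('utf-8')    # byte string encoded as UTF-8
back = as_bytes.decode('utf-8')    # and back to a unicode object again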