I'm making a Python script which parses data files. The parsed data is then sent to an Excel file. The data can be rather huge. I'm looking at 10 to 20 columns, but the number of rows can be well over 100.000.
Writing this amount of data to Excel with win32com takes a considerable amount of time. My first step was to iterate over every cell in the Excel file, which is very time consuming. After some digging I found out how I can write a row with a single call, reducing the time needed drastically.
However, when I need to send 100.000 rows of data to Excel, it will still take a lot of time. I'm pretty sure that I can further speed this up, by sending the complete array with a single call. However I'm not able to achieve this so far.
See the code below, which demonstrates the issue. The code shows the differences in time. However, the third step, sending the complete array with a single call to a range is not resulting in the correct data in Excel. What am I doing wrong?
import win32com.client
import time
#create data array
row = range(0,10)
i = 0
data_array = []
while i < 1000:
data_array.append(row)
i += 1
#write the array to an excel file
excel = win32com.client.Dispatch("Excel.Application")
excel.Visible = True
excel.ScreenUpdating = False
book = excel.Workbooks.Add()
sh1 = book.Worksheets(1)
sh2 = book.Worksheets(2)
sh3 = book.Worksheets(3)
#double loop, writing individual cells
print "Writing with double loop to inidividual cells."
start = time.time()
row = 0
for line in data_array:
row += 1
col = 0
for field in line:
col += 1
sh1.Cells(row, col).Value = field
print "Processing time: " + str(time.time() - start) + " seconds."
#single loop, writing a row to a range
print "Writing with double loop to inidividual cells."
start = time.time()
row = 0
for line in data_array:
row += 1
sh2.Range(sh2.Cells(row,1), sh2.Cells(row, len(line))).Value = line
print "Processing time: " + str(time.time() - start) + " seconds."
#no loop, write array to range
print "Writing with double loop to inidividual cells."
start = time.time()
try:
sh3.Range(sh3.Cells(row,1), sh3.Cells(len(data_array), len(data_array[0]))).Value = data_array
print "Processing time: " + str(time.time() - start) + " seconds."
except:
print "Failed to write array to excel file."
excel.ScreenUpdating = True
sheet = None
book = None
excel.Quit()
excel = None