2

I have about 1500 excel spreadsheet files downloaded from movescount.com - these contain heart rate data taken over a few years. I want to process these files with a bit of Python + openpyxl. The problem is that opening the excel file with openpyxl fails. Specifically it complains that a styles.xml definition is missing. Also Apple’s Numbers cannot open the file. When I open the file with MS Excel (which works) and simply save the file right away, I can open and process it with python and openpyxl just fine.

Clearly, MS Excel is repairing some format issue. But I would like to avoid having to open and save the file with MS Excel - particularly because that means I need to do this on an MS Windows box. What is the best way to repair these files on the fly on my OS X machine?

pnuts
  • 58,317
  • 11
  • 87
  • 139
Yohst
  • 1,671
  • 18
  • 37
  • 1
    If you can get an hour on a windows box you may want to try [automating excel via COM](http://stackoverflow.com/a/28488725/1322401) to just open, save, and close the workbooks. (At about 2 seconds per workbook it should take 50 minutes.) – Steven Rumbalski Feb 26 '15 at 15:55
  • Have you tried [pyXLSX](https://github.com/leegao/pyXLSX) or [python-xlsx](https://github.com/staale/py-xlsx)? I believe that openpyxl is the best python Excel library, but trying the alternatives may give you something that doesn't barf on these particular files. – Steven Rumbalski Feb 26 '15 at 16:06

1 Answers1

1

An Excel xlsx file is a zipped archive of files. So you can write the missing file to the xlsx archive provided you have one good file to start with.

import zipfile

with zipfile.ZipFile('/path/to/goodfile.xlsx', 'r') as zgood:
    styles_xml = zgood.read('xl/styles.xml')

with zipfile.ZipFile('/path/to/badfile.xlsx', 'a') as zbad:
    zbad.writestr('xl/styles.xml', styles_xml)
Steven Rumbalski
  • 44,786
  • 9
  • 89
  • 119