1

I am generating an excel file and returning it for download in a flask app. If I use 'in_memory': True then the file is corrupt and excel can't open it. The same code if written directly to disk works. For the in-memory code I'm following this example from the XlsxWriter docs.

I was skeptical that the bytes were different, so I wrote a small script to test, and they do differ slightly (44 bytes out of 5730).

Here's my script generating identical workbooks, one in memory and one on disk. It then compares the bytes and finds that they are different. Why?

from io import BytesIO

from xlsxwriter import Workbook


def fill_workbook(workbook):
    """Populate the workbook with some test data"""
    first_sheet = workbook.add_worksheet("First")
    first_sheet.write(0, 0, "test")

    next_sheet = workbook.add_worksheet("Next")
    next_sheet.write(0, 0, "sample")
    next_sheet.write(0, 1, "value")
    workbook.close()


def get_bytes():
    """Get the bytes for the in-memory and on-disk workbooks"""
    output = BytesIO()
    in_mem = Workbook(output, {'in_memory': True})

    filename = "direct.xlsx"
    on_disk = Workbook(filename)

    fill_workbook(in_mem)
    fill_workbook(on_disk)

    output.seek(0)
    mem_bytes = output.read()

    with open(filename, "rb") as f:
        disk_bytes = f.read()

    return mem_bytes, disk_bytes


def compare_bytes():
    """Compare the bytes of the two workbooks"""
    mem_bytes, disk_bytes = get_bytes()

    print(mem_bytes == disk_bytes)

    same = 0
    diff = 0
    for mb, db in zip(mem_bytes, disk_bytes):
        if mb == db:
            same += 1
        else:
            diff +=1

    print(f"{same} bytes same")
    print(f"{diff} bytes different")


if __name__ == '__main__':
    compare_bytes()

I ran my script on Python 3.7.3 with XlsxWriter==1.2.8

Eric Grunzke
  • 1,487
  • 15
  • 21
  • 1
    Any two XlsxWriter files created more that a second (or possibly a millisecond) apart will be different due to a timestamp in one of the Excel metadata files. To be sure the files are binary identical you should set the “created” date via the `set_properties()` workbook method: https://xlsxwriter.readthedocs.io/workbook.html#workbook-set-properties – jmcnamara Mar 11 '20 at 01:02

1 Answers1

1

It works for me when in_memory is set to False. Let's see what the in_memory argument is actually doing in the source code of XlsxWriter:

workbook.py on Github

    for file_id, file_data in enumerate(xml_files):
        os_filename, xml_filename, is_binary = file_data

        if self.in_memory:

            # Set sub-file timestamp to Excel's timestamp of 1/1/1980.
            zipinfo = ZipInfo(xml_filename, (1980, 1, 1, 0, 0, 0))

            # Copy compression type from parent ZipFile.
            zipinfo.compress_type = xlsx_file.compression

            if is_binary:
                xlsx_file.writestr(zipinfo, os_filename.getvalue())
            else:
                xlsx_file.writestr(zipinfo,
                                   os_filename.getvalue().encode('utf-8'))`

        else:
            # The sub-files are tempfiles on disk, i.e, not in memory.

            # Set sub-file timestamp to 31/1/1980 due to portability
            # issues setting it to Excel's timestamp of 1/1/1980.
            timestamp = time.mktime((1980, 1, 31, 0, 0, 0, 0, 0, -1))
            os.utime(os_filename, (timestamp, timestamp))

            try:
                xlsx_file.write(os_filename, xml_filename)
                os.remove(os_filename)
scenox
  • 698
  • 7
  • 17