openoffice calc - newline causes duplicate value in cells (pandas/openpyxl)

Question

Does anyone know how to workaround a problem of OpenOffice Calc not handling new lines in cells correct?

I have a python script that is dynamically generating an excel workbook using openpyxl via pandas.

The script works fine but when I view cells in OpenOffice that contain newlines all the values are duplicated multiple times. If I open the same file using the Microsoft Excel Viewer everything is displayed correctly and if I use a character other than a new line (e.g. comma, #, etc) it displays fine in both also.

I have a workaround to go into the excel and replace the random character using a macro but would like to avoid that if possible as the process really needs to be completely automated. also because the file will be processed by an another internal tool, I do need these cells to be processed with a new line and I can't change the character.

I have also tried using chr(10) and/or chr(13) but in the former case it just get's replaced in the output by '\n' anyway as expected.

The code I'm currently using is similar to:

test_list = []
for x in range(1,18):
    test_list.append([
        "value1",
        "\n".join(['element1', 'element2', 'element3']),
        "value3"
    ])

data_df = pd.DataFrame(test_list)

fn = r'/path/to/excel/file.xlsx'

writer = pd.ExcelWriter(fn, engine='xlsxwriter')
data_df.to_excel(writer, sheet_name='Data', index=False, header=0)
workbook = writer.book
worksheet = writer.sheets['Data']
worksheet.set_column('A:ZZ',50,
                     workbook.add_format({'text_wrap': True}))
writer.save()

What happens with the Element data is that it shows in the OpenOffice Calc cell as something like:

Oddly the last item appears to be correct

The same data viewed as a list or via DataFrame.head() appears fine:

pprint(test_list)
[['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
...
['value1', 'element1\nelement2\nelement3', 'value3']]

data_df.head(18):
         0                             1       2
0   value1  element1\nelement2\nelement3  value3
1   value1  element1\nelement2\nelement3  value3
2   value1  element1\nelement2\nelement3  value3
...
15  value1  element1\nelement2\nelement3  value3
16  value1  element1\nelement2\nelement3  value3

It's just when it get passed to the openpyxl library and viewed in OpenOffice.

Thanks

WIndows 10, Just the Excel Viewer but as said worked fine in Excel just not OpenOffice (4.1.2) — GuHuka, Aug 02 '16 at 21:10

Jim K · Accepted Answer · 2016-08-03T04:21:11.877

2

The code worked fine for me using OpenOffice 4.1.2 on Windows:

For this screenshot, I double-clicked on the bottom of the second row to expand it. Before that, it just showed element3 with a red triangle. But that seems different from the behavior you described.

EDIT:

Ok, I can now confirm the problem. As you said, it occurs with the mysterious number of 18 items. It looks like a bug in OpenOffice, because there is not much difference in the XML files viewed by unzipping file.xlsx.

I also tried adding CR and LF directly to the XML files, but this just resulted in:

That leaves us with three solutions:

Use LibreOffice instead, which does not have this problem (tested LO 5.1.0.3).
Report the bug and wait for a new version.
Use OpenOffice's preferred .ods format instead of MS Office's preferred format.

edited Aug 03 '16 at 04:21

answered Aug 02 '16 at 17:38

Jim K

12,824
2
22
51

1

Sorry, in removing sensitive data I'd oversimplified the code (updated original post). For a single item it seems to work fine but when processing more than 17 items, that's when it goes wrong. The original script is producing a list of 100's of items. – GuHuka Aug 02 '16 at 22:10
confirmed. Installed LibreOffice 5.2.0.4 and the code has successfully worked with 3000+ items. – GuHuka Aug 09 '16 at 21:45

score 0 · Answer 2 · answered Aug 02 '16 at 15:55

0

When I run your example with a recent Pandas and XlsxWriter I get the expected output in Excel:

However, in this case Excel is automatically adjusting the height of row 2 to compensate. That may not be happening in OpenOffice.

In which case you can set it explicitly like this:

worksheet.set_row(1, 45)

answered Aug 02 '16 at 15:55

jmcnamara

38,196
6
90
108

The issue was not in Excel, only viewing in OpenOffice Calc – GuHuka Aug 02 '16 at 22:10

openoffice calc - newline causes duplicate value in cells (pandas/openpyxl)

2 Answers2