4

Does anyone know how to workaround a problem of OpenOffice Calc not handling new lines in cells correct?

I have a python script that is dynamically generating an excel workbook using openpyxl via pandas.

The script works fine but when I view cells in OpenOffice that contain newlines all the values are duplicated multiple times. If I open the same file using the Microsoft Excel Viewer everything is displayed correctly and if I use a character other than a new line (e.g. comma, #, etc) it displays fine in both also.

I have a workaround to go into the excel and replace the random character using a macro but would like to avoid that if possible as the process really needs to be completely automated. also because the file will be processed by an another internal tool, I do need these cells to be processed with a new line and I can't change the character.

I have also tried using chr(10) and/or chr(13) but in the former case it just get's replaced in the output by '\n' anyway as expected.

The code I'm currently using is similar to:

test_list = []
for x in range(1,18):
    test_list.append([
        "value1",
        "\n".join(['element1', 'element2', 'element3']),
        "value3"
    ])

data_df = pd.DataFrame(test_list)

fn = r'/path/to/excel/file.xlsx'

writer = pd.ExcelWriter(fn, engine='xlsxwriter')
data_df.to_excel(writer, sheet_name='Data', index=False, header=0)
workbook = writer.book
worksheet = writer.sheets['Data']
worksheet.set_column('A:ZZ',50,
                     workbook.add_format({'text_wrap': True}))
writer.save()

What happens with the Element data is that it shows in the OpenOffice Calc cell as something like:

Openoffice Cells

Oddly the last item appears to be correct

The same data viewed as a list or via DataFrame.head() appears fine:

pprint(test_list)
[['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
['value1', 'element1\nelement2\nelement3', 'value3'],
...
['value1', 'element1\nelement2\nelement3', 'value3']]

data_df.head(18):
         0                             1       2
0   value1  element1\nelement2\nelement3  value3
1   value1  element1\nelement2\nelement3  value3
2   value1  element1\nelement2\nelement3  value3
...
15  value1  element1\nelement2\nelement3  value3
16  value1  element1\nelement2\nelement3  value3

It's just when it get passed to the openpyxl library and viewed in OpenOffice.

Thanks

GuHuka
  • 43
  • 6

2 Answers2

2

The code worked fine for me using OpenOffice 4.1.2 on Windows:

enter image description here

For this screenshot, I double-clicked on the bottom of the second row to expand it. Before that, it just showed element3 with a red triangle. But that seems different from the behavior you described.

EDIT:

Ok, I can now confirm the problem. As you said, it occurs with the mysterious number of 18 items. It looks like a bug in OpenOffice, because there is not much difference in the XML files viewed by unzipping file.xlsx.

I also tried adding CR and LF directly to the XML files, but this just resulted in:

enter image description here

That leaves us with three solutions:

  1. Use LibreOffice instead, which does not have this problem (tested LO 5.1.0.3).
  2. Report the bug and wait for a new version.
  3. Use OpenOffice's preferred .ods format instead of MS Office's preferred format.
Jim K
  • 12,824
  • 2
  • 22
  • 51
  • 1
    Sorry, in removing sensitive data I'd oversimplified the code (updated original post). For a single item it seems to work fine but when processing more than 17 items, that's when it goes wrong. The original script is producing a list of 100's of items. – GuHuka Aug 02 '16 at 22:10
  • confirmed. Installed LibreOffice 5.2.0.4 and the code has successfully worked with 3000+ items. – GuHuka Aug 09 '16 at 21:45
0

When I run your example with a recent Pandas and XlsxWriter I get the expected output in Excel:

enter image description here

However, in this case Excel is automatically adjusting the height of row 2 to compensate. That may not be happening in OpenOffice.

In which case you can set it explicitly like this:

worksheet.set_row(1, 45)
jmcnamara
  • 38,196
  • 6
  • 90
  • 108