Getting strange character translations using unoconv to convert from docx/doc to pdf

Question

I am using unoconv (https://github.com/dagwieers/unoconv) to convert DOCX and DOC files to PDF, but I often get strange results on certain characters when they are rendered in the PDF.

One particular problem is numbers being translated oddly. For example, the section label:

Section 2.3 (http://note.io/1Q33RX6)

gets turned into a Roman numeral:

Section II.3 (http://note.io/1b6MDs5)

I have a feeling this has to do with installed character sets, but I have no idea how to debug it.

The setting is a Django app that calls a Unix shell script to convert a document on disk.
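A minimal sketch of how the Django side might invoke the conversion directly from Python instead of through a shell script. It assumes `unoconv` is on the `PATH` and relies on its `-f pdf` option, which by default writes `<name>.pdf` next to the input file; the helper names `unoconv_command` and `convert_to_pdf` are hypothetical, not part of unoconv or Django.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def unoconv_command(source: Path) -> list[str]:
    """Build the argument list for converting `source` to PDF with unoconv.

    Passing a list (not a shell string) avoids quoting problems with
    file names that contain spaces.
    """
    return ["unoconv", "-f", "pdf", str(source)]


def convert_to_pdf(source: Path, timeout: int = 120) -> Path:
    """Run unoconv and return the expected path of the generated PDF.

    Assumes unoconv's default of writing the PDF alongside the input.
    Raises subprocess.CalledProcessError if unoconv exits non-zero.
    """
    subprocess.run(
        unoconv_command(source),
        check=True,          # surface conversion failures as exceptions
        capture_output=True,  # keep stdout/stderr for logging
        text=True,
        timeout=timeout,
    )
    return source.with_suffix(".pdf")
```

Capturing stderr makes it easier to log any warning messages (such as the character-set warnings mentioned in the comments) from inside the Django app.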

(Cough) I fail to see the difference. Do those links point to the exact same image? Anyway, a link to the PDF would help. Usually, rendering of a PDF does *not* depend on "installed character sets" (quoted, because it's a rather quaint statement nowadays; did you mean "font"?). If the numbers are auto-generated rather than typed, the error lies in the converting software. — Jongware, Apr 19 '15 at 09:00
Sorry, I corrected the second link above. I used the term "character set" because at one point I had warning messages using the term. — rkp333, Apr 19 '15 at 18:02
The numbers are probably auto-generated, either in Word or by a Word plug-in that is used by lawyers. I don't know for sure because I did not produce the document, and unfortunately it's confidential, so I can't pass it on. I just looked, and the error does have to do with LibreOffice, which is used on the back end to convert the format. — rkp333, Apr 19 '15 at 18:10

score 1 · Accepted Answer · answered Jul 07 '15 at 14:56

unoconv simply opens the file programmatically and then saves/exports it to the desired format. I would expect the same thing to happen when you open the file in LibreOffice and save it from the GUI.

If this is the case, you may want to test with the latest LibreOffice release, and if that does not solve your issue, report the problem to the LibreOffice bug tracker.
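One way to check whether LibreOffice itself is responsible is to bypass unoconv entirely, convert the same file with LibreOffice's own headless mode, and compare the two PDFs. A minimal sketch, assuming the `soffice` binary is on the `PATH` (on some systems it is installed as `libreoffice` instead); the helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def soffice_version_command(binary: str = "soffice") -> list[str]:
    # Prints the installed LibreOffice release; worth including in a bug report.
    return [binary, "--version"]


def soffice_pdf_command(source: Path, outdir: Path, binary: str = "soffice") -> list[str]:
    # Headless conversion without unoconv; writes <outdir>/<stem>.pdf.
    return [
        binary, "--headless",
        "--convert-to", "pdf",
        "--outdir", str(outdir),
        str(source),
    ]


def convert_with_libreoffice(source: Path, outdir: Path, binary: str = "soffice") -> Path:
    """Convert `source` with LibreOffice directly and return the expected PDF path."""
    outdir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        soffice_pdf_command(source, outdir, binary),
        check=True,
        capture_output=True,
        text=True,
    )
    return outdir / (source.stem + ".pdf")
```

If the LibreOffice-only PDF shows the same `Section II.3` numbering, the problem most likely lies in how LibreOffice imports the document's numbering rather than in unoconv.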

Dag Wieers

Thanks for this. Yes, it appears to be an issue with LibreOffice: when the doc is opened there, the same thing happens. Opening it in Word works fine. – rkp333 Aug 12 '15 at 16:43
