
I am trying to create a table in a DOCX file and then convert it to a PDF using Apache POI (version 5.2.3) and the XWPF Converter (version 2.0.4) library. I have successfully created the table and merged cells in the DOCX file. However, when I convert the DOCX file to PDF using the XWPF Converter, the resulting PDF does not have the proper formatting.

ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
PdfOptions options = PdfOptions.create();
PdfConverter.getInstance().convert(document, byteArrayOutputStream, options);
byte[] pdfBytes = byteArrayOutputStream.toByteArray();

Expected result: I expect the converted PDF to maintain the table formatting and cell merging as it appears in the original DOCX file.

Actual result: The converted PDF does not accurately reflect the formatting of the table and merged cells.
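
The comments below ask for a minimal, reproducible example. One might look like the following sketch, which assumes the table is built with plain Apache POI 5.2.3 and that cells are merged through the `gridSpan` (horizontal) and `vMerge` (vertical) cell properties. It writes a line of text first, then a 3x3 table with one horizontal and one vertical merge, and saves the result as `./XWPFDocument.docx` so it can be fed to the test program in the answer. The class name, helper names, cell texts and merge layout are all hypothetical.

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.math.BigInteger;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTcPr;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.STMerge;

public class MergedTableSample {

    // Returns the cell's <w:tcPr>, creating it if it does not exist yet.
    static CTTcPr tcPr(XWPFTableCell cell) {
        CTTc ctTc = cell.getCTTc();
        return ctTc.isSetTcPr() ? ctTc.getTcPr() : ctTc.addNewTcPr();
    }

    // Horizontal merge: the first cell spans `span` grid columns (<w:gridSpan>),
    // and the cells it now covers are removed from the row.
    static void mergeHorizontally(XWPFTableRow row, int fromCol, int span) {
        tcPr(row.getCell(fromCol)).addNewGridSpan().setVal(BigInteger.valueOf(span));
        for (int i = 1; i < span; i++) {
            row.removeCell(fromCol + 1);
        }
    }

    // Vertical merge: the first cell restarts the merge, the following cells continue it (<w:vMerge>).
    static void mergeVertically(XWPFTable table, int col, int fromRow, int toRow) {
        for (int r = fromRow; r <= toRow; r++) {
            tcPr(table.getRow(r).getCell(col)).addNewVMerge()
                    .setVal(r == fromRow ? STMerge.RESTART : STMerge.CONTINUE);
        }
    }

    static XWPFDocument buildDocument() {
        XWPFDocument doc = new XWPFDocument();
        // Some data written before the table.
        doc.createParagraph().createRun().setText("Some data written before the table.");

        XWPFTable table = doc.createTable(3, 3);
        mergeHorizontally(table.getRow(0), 0, 2); // row 0: columns 0..1
        mergeVertically(table, 0, 1, 2);          // column 0: rows 1..2

        // Fill only the cells that remain visible after merging.
        table.getRow(0).getCell(0).setText("Spans two columns");
        table.getRow(0).getCell(1).setText("Column 3");
        table.getRow(1).getCell(0).setText("Spans two rows");
        for (int r = 1; r < 3; r++) {
            table.getRow(r).getCell(1).setText("R" + r + "C1");
            table.getRow(r).getCell(2).setText("R" + r + "C2");
        }
        return doc;
    }

    public static void main(String[] args) throws IOException {
        try (XWPFDocument doc = buildDocument();
             OutputStream out = new FileOutputStream("./XWPFDocument.docx")) {
            doc.write(out);
        }
    }
}
```

Opening the generated file in Word and comparing it with the converted PDF shows whether the merges themselves, or something else in the real document, cause the difference.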

sinu gaud
  • There is no version 5.0.3 of Apache POI. Where did you get this version from? And please provide a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example). Based on the current information in your question, I doubt anyone can reproduce the issue. – Axel Richter Jul 05 '23 at 12:04
  • No, that was a typo, sorry. I am using POI version 5.2.3. – sinu gaud Jul 06 '23 at 05:34
  • OK. But a complete reproducible example is still needed. Take https://stackoverflow.com/questions/51440312/docx-to-pdf-converter-in-java/51440649#51440649 as an example and show what the source `WordDocument.docx` looks like. Also show what the resulting `WordDocument.pdf` looks like after conversion. – Axel Richter Jul 06 '23 at 05:59
  • Yes, I read https://stackoverflow.com/questions/51440312/docx-to-pdf-converter-in-java/51440649#51440649, but it did not help me. I have added examples of both files here: https://drive.google.com/drive/folders/1oIFIxoMjgxrfu-GH0QtvVzyhdVaPbZIC?usp=drive_link. Please download the .docx file and open it in Word. – sinu gaud Jul 06 '23 at 07:26

1 Answer


The programmers of XDocReport have done a great job handling the really complex file structure of a Microsoft Word *.docx document in Office Open XML format. But, of course, there are always unsolved problems.

When it comes to tables in Word, the following problems are known to me:

A Word table might have row heights that are not set explicitly and are therefore determined only by the content. In that case XDocReport does not take the font descenders into account when calculating the height.
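
One possible workaround for this row-height problem, not verified against every XDocReport version, is to give such rows an explicit minimum height before converting, so the converter no longer derives the height from the content alone. The following sketch assumes Apache POI 5.2.3 and sets an "at least" height on every row that has no height of its own; the class name, method name and the 400 twips (20 pt) value are hypothetical.

```java
import org.apache.poi.xwpf.usermodel.TableRowHeightRule;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class ExplicitRowHeights {

    // Gives every top-level table row without an explicit height a minimum
    // height in twips (1/20 pt). Returns the number of rows changed.
    static int setMissingRowHeights(XWPFDocument doc, int minHeightTwips) {
        int changed = 0;
        for (XWPFTable table : doc.getTables()) {
            for (XWPFTableRow row : table.getRows()) {
                if (row.getHeight() == 0) { // 0 means no <w:trHeight> is set
                    row.setHeight(minHeightTwips);
                    // "at least" lets the row still grow if its content needs more space
                    row.setHeightRule(TableRowHeightRule.AT_LEAST);
                    changed++;
                }
            }
        }
        return changed;
    }
}
```

It would be called on the loaded `XWPFDocument` right before `PdfConverter.getInstance().convert(...)`. Whether the extra height is enough for the descenders depends on the font size, so the value has to be tuned per document.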

A Word table might have table cells hidden using gridBefore and wBefore for cells before the first cell in a row and/or gridAfter and wAfter for cells after the last cell in a row. Such cells are then not part of the rows at all, and they are not created via cell merging either. XDocReport does not consider this, and because of the missing cells, the whole table structure gets damaged.
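
Because XDocReport does not handle such rows, it can help to find them before converting, for example to fix them in Word first. This sketch assumes Apache POI 5.2.3 with poi-ooxml-full on the classpath (the `gridBefore`/`gridAfter` beans may be missing from poi-ooxml-lite). It lists every top-level table row whose `w:trPr` contains `w:gridBefore` or `w:gridAfter`; the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTRow;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTrPr;

public class GridBeforeAfterFinder {

    // Returns "table t, row r" for every row that skips grid columns via
    // <w:gridBefore> or <w:gridAfter>. Only top-level body tables are checked;
    // nested tables would need a recursive walk over the cells.
    static List<String> findRowsWithSkippedGridColumns(XWPFDocument doc) {
        List<String> hits = new ArrayList<>();
        List<XWPFTable> tables = doc.getTables();
        for (int t = 0; t < tables.size(); t++) {
            List<XWPFTableRow> rows = tables.get(t).getRows();
            for (int r = 0; r < rows.size(); r++) {
                CTRow ctRow = rows.get(r).getCtRow();
                if (!ctRow.isSetTrPr()) {
                    continue; // no row properties, so no gridBefore/gridAfter
                }
                CTTrPr trPr = ctRow.getTrPr();
                if (trPr.sizeOfGridBeforeArray() > 0 || trPr.sizeOfGridAfterArray() > 0) {
                    hits.add("table " + t + ", row " + r);
                }
            }
        }
        return hits;
    }
}
```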

A Word table might have alternating row backgrounds set through the table style. XDocReport does not consider this either.
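
A possible workaround, assuming XDocReport honors explicit cell shading (not verified here for every version), is to write the alternating background directly into the cells instead of relying on the table style. The sketch below uses `XWPFTableCell.setColor`, which sets the `w:shd` fill of a cell; the class name, the assumption that row 0 is a header row, and the color `D9E2F3` are hypothetical.

```java
import java.util.List;

import org.apache.poi.xwpf.usermodel.XWPFTable;
import org.apache.poi.xwpf.usermodel.XWPFTableCell;
import org.apache.poi.xwpf.usermodel.XWPFTableRow;

public class ExplicitBanding {

    // Shades every second data row (rows 1, 3, 5, ...) by writing <w:shd w:fill="...">
    // into each of its cells. Row 0 is treated as a header row and left untouched.
    static void shadeAlternateRows(XWPFTable table, String fillRgbHex) {
        List<XWPFTableRow> rows = table.getRows();
        for (int r = 1; r < rows.size(); r += 2) {
            for (XWPFTableCell cell : rows.get(r).getTableCells()) {
                cell.setColor(fillRgbHex);
            }
        }
    }
}
```

The color should match what the table style's banding shows in Word; otherwise the PDF will look different from the document even if the conversion succeeds.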

There might be more. But I doubt there is any free software out there that really considers all of the complex possibilities of a Microsoft Word document. Even commercial software, except Microsoft Word itself, will have issues there.

The following short, complete Java program can be used for testing:

import java.io.*;

//needed jars: fr.opensagres.poi.xwpf.converter.core-2.0.4.jar, 
//             fr.opensagres.poi.xwpf.converter.pdf-2.0.4.jar,
//             fr.opensagres.xdocreport.itext.extension-2.0.4.jar,
//             itext-4.2.1.jar                                   
import fr.opensagres.poi.xwpf.converter.pdf.PdfOptions;
import fr.opensagres.poi.xwpf.converter.pdf.PdfConverter;

//needed jars: apache poi 5.2.3 and its dependencies
//             and additionally: poi-ooxml-full-5.2.3.jar 
import org.apache.poi.xwpf.usermodel.*;

public class XWPFToPDFConverterSampleMin {

 public static void main(String[] args) throws Exception {

  String docPath = "./XWPFDocument.docx";
  String outputFile = "./XWPFDocument.pdf";

  // try-with-resources closes the document and both streams, even if the conversion fails
  try (InputStream in = new FileInputStream(new File(docPath));
       XWPFDocument document = new XWPFDocument(in);
       OutputStream out = new FileOutputStream(outputFile)) {

   PdfOptions options = PdfOptions.create();
   PdfConverter.getInstance().convert(document, out, options);
  }

 }
}

The XWPFDocument.docx looks like this:

[Screenshot of XWPFDocument.docx]

The resulting XWPFDocument.pdf looks like this:

[Screenshot of the resulting XWPFDocument.pdf]

Axel Richter
  • In my case, I first write some data into the .docx and then try to create the table. But when I create only the table in the .docx, there is no issue converting it to PDF. – sinu gaud Jul 06 '23 at 12:23