I am working on a Java 8 project where I need to modify a Word document template (.docx) using Apache POI v4.1.2. The template contains multiple sections with tables, paragraphs, and images. My task is to delete certain sections based on specific criteria and then update the content with new details to generate a final report.

I have successfully implemented the section deletion functionality. However, after deleting sections, the page numbers in the template change, and these changes are not reflected in the Table of Contents (TOC). I need assistance in programmatically updating the page numbers in the TOC with complete automation.

I have already tried xwpfDocument.enforceUpdateFields(), but it results in a popup when the document is opened, which is unacceptable to the stakeholders. Therefore, I'm looking for a programmatic solution that either updates the stale TOC page numbers with the new ones without any popup, or creates a new TOC or TOC-like structure listing the section and subsection headings with their page numbers.

Additionally, I am constrained to using Apache POI. Changing the library at this point is not feasible, as most of the logic is already written and working as expected. I also cannot use a macro-based approach due to security concerns.

Could anyone advise on how to achieve this automated update or addition of TOC page numbers using Java and Apache POI? Any code snippets, suggestions, or alternative approaches/hacks would be greatly appreciated.

Thank you in advance for your help!

Olaf Kock
New2Java
  • Does this answer your question? [How to update table of content (TOC) for docx file by apache poi](https://stackoverflow.com/questions/46408097/how-to-update-table-of-content-toc-for-docx-file-by-apache-poi) – Sharon Ben Asher Jun 11 '23 at 06:40

1 Answer

About looking for a canonical answer

I am not a member of the Apache POI developer team, but I have worked with Apache POI for a long time, so I believe I know something about it.

Short answer

Up to now, it is not possible to update the table of contents (TOC) of an XWPFDocument using Apache POI. And I doubt that it will become possible later, unless the Apache POI project decides to implement a renderer for documents.

A table of contents consists of a list of headings (paragraphs having a heading style), each pointing to the page on which that heading is placed. And that is the problem. To know on which page a paragraph gets placed, the document needs to be rendered. From the file storage point of view, a document consists of an endless stream of body elements. There may be explicit page breaks, but there need not be. If there are none, only the renderer can determine on which page a body element gets placed. That depends on page size, page margins, font sizes, possible explicit line breaks, paragraph spacing and many more things.
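To illustrate why the page number is a layout result rather than stored data, here is a deliberately simplified sketch in plain Java. It has nothing to do with how Word or Apache POI actually work: the class `PaginationSketch`, the method `assignPages` and the block heights are all made up for illustration, and a real renderer has to handle far more (line wrapping, widow/orphan control, tables, images, and so on).

```java
import java.util.Arrays;

public class PaginationSketch {

    /**
     * Toy layout: stacks blocks top to bottom and starts a new page
     * whenever the next block would not fit on the current page.
     * Returns the 1-based page number for each block.
     * (Hypothetical helper; not part of Apache POI.)
     */
    public static int[] assignPages(double[] blockHeights, double pageHeight) {
        int[] pages = new int[blockHeights.length];
        int page = 1;
        double used = 0.0;
        for (int i = 0; i < blockHeights.length; i++) {
            // Break only if the page already holds something and the block overflows it.
            if (used > 0.0 && used + blockHeights[i] > pageHeight) {
                page++;
                used = 0.0;
            }
            pages[i] = page;
            used += blockHeights[i];
        }
        return pages;
    }

    public static void main(String[] args) {
        // Alternating heading (40) and body text (300); units are arbitrary.
        double[] heights = {40, 300, 40, 300, 40, 300};

        // Same content, two different usable page heights.
        System.out.println(Arrays.toString(assignPages(heights, 700))); // [1, 1, 1, 1, 2, 2]
        System.out.println(Arrays.toString(assignPages(heights, 500))); // [1, 1, 1, 2, 2, 3]
    }
}
```

The same stream of blocks ends up on different pages as soon as one layout parameter changes, and none of that information is stored in the body elements themselves.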

Apache POI is only meant to create the Office Open XML files as Microsoft Office would store them. It does not provide renderers up to now. A small exception is XSLF (PowerPoint presentations). There, someone has programmed a picture export of slides, which also needs a renderer for slides. But rendering slides is much simpler than rendering whole word-processing documents.

Going into detail

XWPFDocument provides XWPFDocument.createTOC but no methods to update a TOC. Not even a getter to get the TOC from the document is provided.

That could easily be changed by extending XWPFDocument. But what to do with the TOC then?

Looking into the source code of TOC.java, we find the method public void addRow(int level, String title, int page, String bookmarkRef). That method adds a row to the table of contents, where int page should give the page on which the title is placed.

Looking into the source code of XWPFDocument.java - createTOC, we find

...
toc.addRow(level, par.getText(), 1, "112723803");
...

That means a 1 gets set for each int page in each row of the table of contents. Why? Well, because Apache POI cannot determine on what exact page the found paragraph having a heading style is placed. Why? Well, because Apache POI cannot render the document. So it sets 1 and delegates updating the TOC to Microsoft Word, as Microsoft Word will render the document while opening it.
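The consequence can be modelled in plain Java, independent of Apache POI. In this sketch `TocRowsSketch`, `Row` and `buildRows` are hypothetical names, not POI classes, and the `knownPages` map stands in for the output of a renderer that Apache POI does not have. Headings without a known page fall back to 1, mirroring the placeholder described above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TocRowsSketch {

    /** Hypothetical stand-in for one TOC line; not a POI class. */
    public static final class Row {
        public final int level;
        public final String title;
        public final int page;

        Row(int level, String title, int page) {
            this.level = level;
            this.title = title;
            this.page = page;
        }

        @Override
        public String toString() {
            return "level " + level + ": " + title + " ... " + page;
        }
    }

    /**
     * Builds one row per heading. The page comes from knownPages if present,
     * otherwise the placeholder 1 is used. All headings get level 1 to keep
     * the sketch short.
     */
    public static List<Row> buildRows(List<String> headings, Map<String, Integer> knownPages) {
        List<Row> rows = new ArrayList<>();
        for (String heading : headings) {
            int page = knownPages.getOrDefault(heading, 1); // 1 = "page unknown" placeholder
            rows.add(new Row(1, heading, page));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<String> headings = Arrays.asList("Introduction", "Results", "Appendix");

        // Assumption: these numbers would have to come from some external renderer.
        Map<String, Integer> knownPages = new HashMap<>();
        knownPages.put("Results", 4);

        for (Row row : buildRows(headings, knownPages)) {
            System.out.println(row);
        }
    }
}
```

Building the rows is the easy part. The hard part is filling the map with correct page numbers, and that is exactly what needs a renderer.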

Conclusion

Up to now, it is not possible to update the table of contents (TOC) of an XWPFDocument using Apache POI.

Axel Richter
  • Thanks Axel. I have used createTOC() as well and found the same behaviour you mentioned. For all the headings, it just sets the page number to 1, and it also does not give me an option to decide on which page the TOC should appear. Is there any other alternative that I can try? Any help would be appreciated – New2Java Jun 22 '23 at 05:39
  • @New2Java: You have understood that the main problem is the missing renderer? If you knew on what page a particular paragraph gets rendered by Word, then you could simply extend `XWPFDocument` to provide `TOC` updating by passing that page number in the call of `toc.addRow(level, par.getText(), 1, "112723803");` instead of the `1`. – Axel Richter Jun 22 '23 at 05:45
  • Yes, I understand the point. Page numbers in Word are dynamic and are calculated at render time based on a number of factors, including the current page size, font size, line spacing, margins, and more. And they can vary depending on the rendering engine. I think Microsoft Word might have a complex algorithm to determine all these things. I am wondering how other people with a similar issue have solved their problem of updating the TOC, though! – New2Java Jun 22 '23 at 06:00
  • @New2Java: "I am wondering how other people with similar issue have solved their problem of TOC update though.": By using `enforceUpdateFields` to delegate that to Word? But you have ruled that out. Or by using software that provides a document renderer? Aspose.Words claims to have one. But it also has problems while updating the TOC, as far as I know. And you have ruled that out too. – Axel Richter Jun 22 '23 at 06:09