
I've done a simple replace with DOCX4J, thanks to this awesome guide.

But now I'm trying to do something more complicated.

What I'd like to do is find my marker text #1 within the document, find my marker text #2 within the document, and copy EVERYTHING in between the two. I will then be pasting that content X times and doing further alterations.

Does anyone know how I would do this, or can you point me to the key functions needed?

Doug

1 Answer


In the general case, that's not a simple thing to do, because there could be a variety of structures between your two markers which demand special handling (think images, footnotes, sectPr elements, bookmarks etc). Regarding that general case, see my blog post on MergeDocx.

However, if you can make some simplifying assumptions, then it becomes easier.

First, assume your markers are block level elements.

Second, assume your document is just formatted text and tables.

Then you can just perform operations on the list of block level content:

```java
// wordMLPackage: an already-loaded WordprocessingMLPackage
MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();
List<Object> blocks = documentPart.getContent();
```

docx4j's XmlUtils.deepCopy lets you clone objects as necessary.
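To make the list-based approach concrete, here is a minimal sketch under the two assumptions above. It is generic over the block type so that it runs without docx4j; the names MarkerRangeCopier, copyBetween and repeatAfterEnd are hypothetical, and pasting the copies directly after marker #2 is an arbitrary choice for illustration. With docx4j you would pass documentPart.getContent() as the list, a predicate that recognises your marker paragraphs (how you extract a paragraph's text is left open here), and a lambda that calls XmlUtils.deepCopy; that wiring is an untested assumption.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

/**
 * Hypothetical helper (not part of docx4j): works on a flat list of
 * block-level objects, e.g. the list returned by documentPart.getContent().
 */
public final class MarkerRangeCopier {

    private MarkerRangeCopier() {}

    /** Index of the first block at or after `from` that matches `isMarker`. */
    private static <T> int indexOf(List<T> blocks, Predicate<T> isMarker, int from) {
        for (int i = from; i < blocks.size(); i++) {
            if (isMarker.test(blocks.get(i))) {
                return i;
            }
        }
        throw new IllegalArgumentException("marker not found");
    }

    /** Returns deep copies of every block strictly between the two markers. */
    public static <T> List<T> copyBetween(List<T> blocks, Predicate<T> isStart,
                                          Predicate<T> isEnd, UnaryOperator<T> deepCopy) {
        int start = indexOf(blocks, isStart, 0);
        int end = indexOf(blocks, isEnd, start + 1);
        List<T> copies = new ArrayList<>();
        for (T block : blocks.subList(start + 1, end)) {
            copies.add(deepCopy.apply(block));
        }
        return copies;
    }

    /**
     * Inserts `times` fresh copies of the marked range immediately after the
     * end marker. Each repetition gets its own deep copies, so later edits to
     * one repetition do not leak into the others.
     */
    public static <T> void repeatAfterEnd(List<T> blocks, Predicate<T> isStart,
                                          Predicate<T> isEnd, UnaryOperator<T> deepCopy,
                                          int times) {
        int end = indexOf(blocks, isEnd, indexOf(blocks, isStart, 0) + 1);
        int insertAt = end + 1;
        for (int i = 0; i < times; i++) {
            // Insertion happens after the end marker, so the markers' positions
            // (and therefore the copied range) stay the same on each pass.
            List<T> copies = copyBetween(blocks, isStart, isEnd, deepCopy);
            blocks.addAll(insertAt, copies);
            insertAt += copies.size();
        }
    }

    public static void main(String[] args) {
        // Strings stand in for docx4j block objects (P, Tbl, ...) in this demo.
        List<String> blocks = new ArrayList<>(
                Arrays.asList("intro", "#1", "a", "b", "#2", "outro"));
        repeatAfterEnd(blocks, "#1"::equals, "#2"::equals, s -> new String(s), 2);
        System.out.println(blocks);
        // prints [intro, #1, a, b, #2, a, b, a, b, outro]
    }
}
```

Copying on every repetition is the important design choice: adding the same object instance to the list twice would make the "copies" share state, so any further alteration to one would show up in all of them.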

For each structure which contravenes assumption 2, you'll need specific handling. If you have control over your input documents, you'll be able to manage this.

As an alternative to markers #1 and #2 (an approach similar to using bookmarks), consider using a block-level content control. This avoids brittle point tags, is nicer from an XML point of view, and offers advantages in the Word user interface from an authoring point of view.

JasonPlutext
  • Thank you, this helps, but what do you mean by a "block level content control"? – Doug Mar 16 '13 at 23:00
  • Ah, I think I answered my own question: http://www.opendope.org/opendope_conventions_v2.3.html – Doug Mar 16 '13 at 23:06
  • OpenDoPE does use content controls, but first and foremost they are first-class WordML elements (called structured document tags, since their element name is 'sdt'). Block-level ones can contain paragraphs and tables, and generally exist in the blocks list above, though they can also be in a table cell. Inline ones contain runs of text, not paragraphs or tables. – JasonPlutext Mar 17 '13 at 02:05