In our previous code we used XWPFParagraph and its runs to find merge tokens, which we replaced with data from our database. Now we need to do the same thing with content controls. The problem with paragraphs and runs is that a content control does not appear as a single run the way a merge token would. We also need the content control's title to act as the merge token name, so that we know where to find the data in the database and where to put it in the document. With content controls we would not replace the control itself with the data from the db; we would set the text inside its <w:sdtContent> element.

I thought about converting the <w:sdt> object to XML text, but the resulting string is detached from the POI object, so changes to it would not reach the document.

So I was thinking of finding the <w:sdt> and replacing it with a new one that is identical except for the <w:sdtContent> section. Is this possible? Any recommendations on how to use POI to "modify" the sdt we get from the Word doc?

bytor99999

2 Answers

XWPFSDT and XWPFSDTCell are still experimental. They do not even expose their underlying CTSdtBlock, CTSdtRun and CTSdtCell objects. I don't know why. So extending them to write into the CTSdtBlock, CTSdtRun or CTSdtCell is not possible. If writing is needed, a new class is required that can be created from any kind of Word SDT content control object. Such a class, SDTContentControl, could look like this:

import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtBlock;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtContentBlock;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtContentRun;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtCell;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTSdtContentCell;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTTc;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTP;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;
import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTText;

import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;
import org.apache.xmlbeans.impl.values.XmlObjectBase;
import javax.xml.namespace.QName;

import java.util.Calendar;
import java.text.SimpleDateFormat;
import java.math.BigDecimal;
import java.text.DecimalFormat;

public class SDTContentControl {
 private XmlObject object = null;
 
 public SDTContentControl(XmlObject object) {
  this.object = object;   
 }
 
 public String getTitle() {
  if (this.object instanceof CTSdtBlock) {
   CTSdtBlock ctSdtBlock = (CTSdtBlock)this.object;
   if (ctSdtBlock.isSetSdtPr()) {
    if (ctSdtBlock.getSdtPr().isSetAlias()) {
     return ctSdtBlock.getSdtPr().getAlias().getVal();   
    }
   }
  } else if (this.object instanceof CTSdtRun) {
   CTSdtRun ctSdtRun = (CTSdtRun)this.object;
   if (ctSdtRun.isSetSdtPr()) {
    if (ctSdtRun.getSdtPr().isSetAlias()) {
     return ctSdtRun.getSdtPr().getAlias().getVal();   
    }
   }
  } else if (this.object instanceof CTSdtCell) {
   CTSdtCell ctSdtCell = (CTSdtCell)this.object;
   if (ctSdtCell.isSetSdtPr()) {
    if (ctSdtCell.getSdtPr().isSetAlias()) {
     return ctSdtCell.getSdtPr().getAlias().getVal();   
    }
   }
  }      
  return null;  
 }
 
 public String getTag() {
  if (this.object instanceof CTSdtBlock) {
   CTSdtBlock ctSdtBlock = (CTSdtBlock)this.object;
   if (ctSdtBlock.isSetSdtPr()) {
    if (ctSdtBlock.getSdtPr().isSetTag()) {
     return ctSdtBlock.getSdtPr().getTag().getVal();   
    }
   }
  } else if (this.object instanceof CTSdtRun) {
   CTSdtRun ctSdtRun = (CTSdtRun)this.object;
   if (ctSdtRun.isSetSdtPr()) {
    if (ctSdtRun.getSdtPr().isSetTag()) {
     return ctSdtRun.getSdtPr().getTag().getVal();   
    }
   }
  } else if (this.object instanceof CTSdtCell) {
   CTSdtCell ctSdtCell = (CTSdtCell)this.object;
   if (ctSdtCell.isSetSdtPr()) {
    if (ctSdtCell.getSdtPr().isSetTag()) {
     return ctSdtCell.getSdtPr().getTag().getVal();   
    }
   }
  }        
  return null;
 }
 
 public String getContentText() {
  XmlObject[] sdtContents = this.object.selectPath(
    "declare namespace w='http://schemas.openxmlformats.org/wordprocessingml/2006/main' "
   +".//w:sdtContent");
  for (XmlObject sdtContent : sdtContents) {
   if (sdtContent instanceof XmlObjectBase) {
    return ((XmlObjectBase)sdtContent).getStringValue();  
   }    
  }
  return null;
 }
 
 public void setContent(String text) {
  if (this.object instanceof CTSdtBlock) {
   CTSdtBlock ctSdtBlock = (CTSdtBlock)this.object;
   if (ctSdtBlock.isSetSdtContent()) {
    CTSdtContentBlock sdtContentBlock = ctSdtBlock.getSdtContent();
    // getPArray(0) throws when there is no paragraph, so check the size first
    CTP ctP = (sdtContentBlock.sizeOfPArray() > 0) ? sdtContentBlock.getPArray(0) : CTP.Factory.newInstance();
    for (int r = ctP.getRList().size()-1; r >= 0 ; r--) ctP.removeR(r);
    CTR ctR = ctP.addNewR();
    if (ctSdtBlock.isSetSdtPr()) {
     if (ctSdtBlock.getSdtPr().isSetRPr()) {
      ctR.setRPr(ctSdtBlock.getSdtPr().getRPr());
     }   
    }
    CTText ctText = ctR.addNewT();
    ctText.setStringValue(text);
    sdtContentBlock.setPArray(new CTP[]{ctP});
   }
  } else if (this.object instanceof CTSdtRun) {
   CTSdtRun ctSdtRun = (CTSdtRun)this.object;
   if (ctSdtRun.isSetSdtContent()) {
    CTSdtContentRun sdtContentRun = ctSdtRun.getSdtContent();
    CTR ctR = CTR.Factory.newInstance();
    if (ctSdtRun.isSetSdtPr()) {
     if (ctSdtRun.getSdtPr().isSetRPr()) {
      ctR.setRPr(ctSdtRun.getSdtPr().getRPr());
     }   
    }
    CTText ctText = ctR.addNewT();
    ctText.setStringValue(text);
    sdtContentRun.setRArray(new CTR[]{ctR});
   }
  } else if (this.object instanceof CTSdtCell) {
   CTSdtCell ctSdtCell = (CTSdtCell)this.object;
   if (ctSdtCell.isSetSdtContent()) {
    CTSdtContentCell sdtContentCell = ctSdtCell.getSdtContent();
    for (int c = 0; c < sdtContentCell.getTcList().size(); c++) {  
     CTTc ctTc = sdtContentCell.getTcList().get(c);
     // getPArray(0) throws when the cell has no paragraph, so check the size first
     CTP ctP = (ctTc.sizeOfPArray() > 0) ? ctTc.getPArray(0) : CTP.Factory.newInstance();
     for (int r = ctP.getRList().size()-1; r >= 0 ; r--) ctP.removeR(r);
     CTR ctR = ctP.addNewR();
     if (ctSdtCell.isSetSdtPr()) {
      if (ctSdtCell.getSdtPr().isSetRPr()) {
       ctR.setRPr(ctSdtCell.getSdtPr().getRPr());
      }   
     }
     CTText ctText = ctR.addNewT();
     ctText.setStringValue(text);
     ctTc.setPArray(new CTP[]{ctP});
    }
   }
  }
 }

 public void setContent(Calendar calendar) {
  String dateFormat = "yyyy-MM-dd";
  if (this.object instanceof CTSdtBlock) {
   CTSdtBlock ctSdtBlock = (CTSdtBlock)this.object;
   if (ctSdtBlock.isSetSdtPr()) {
    if (ctSdtBlock.getSdtPr().isSetDate()) {
     if (ctSdtBlock.getSdtPr().getDate().isSetDateFormat()) {
      dateFormat = ctSdtBlock.getSdtPr().getDate().getDateFormat().getVal();  
     }
     ctSdtBlock.getSdtPr().getDate().setFullDate(calendar);
    }
   }
  } else if (this.object instanceof CTSdtRun) {
   CTSdtRun ctSdtRun = (CTSdtRun)this.object;
   if (ctSdtRun.isSetSdtPr()) {
    if (ctSdtRun.getSdtPr().isSetDate()) {
     if (ctSdtRun.getSdtPr().getDate().isSetDateFormat()) {
      dateFormat = ctSdtRun.getSdtPr().getDate().getDateFormat().getVal();  
     }
     ctSdtRun.getSdtPr().getDate().setFullDate(calendar);
    }
   }
  } else if (this.object instanceof CTSdtCell) {
   CTSdtCell ctSdtCell = (CTSdtCell)this.object;
   if (ctSdtCell.isSetSdtPr()) {
    if (ctSdtCell.getSdtPr().isSetDate()) {
     if (ctSdtCell.getSdtPr().getDate().isSetDateFormat()) {
      dateFormat = ctSdtCell.getSdtPr().getDate().getDateFormat().getVal();  
     }
     ctSdtCell.getSdtPr().getDate().setFullDate(calendar);
    }
   }
  }
  SimpleDateFormat simpledDateFormat = new SimpleDateFormat(dateFormat);
  String text = simpledDateFormat.format(calendar.getTime());
  this.setContent(text); 
 }
 
 public void setContent(BigDecimal value) {
  DecimalFormat decimalFormat = new DecimalFormat("#,##0.00");
  // DecimalFormat accepts a BigDecimal directly; no need to go through double
  String text = decimalFormat.format(value);
  this.setContent(text);   
 } 
 
 public void setContent(Object content) {
  if (content instanceof String) {
   this.setContent((String)content);  
  } else if (content instanceof Calendar) {
   this.setContent((Calendar)content);       
  } else if (content instanceof BigDecimal) {
   this.setContent((BigDecimal)content);       
//} else if (content instanceof ...) {
   //ToDo
  } else {
   this.setContent(String.valueOf(content));         
  }
 }
  
}

A list of all SDT content control objects can be created from an XWPFDocument like so:

...
 /*modifiers*/ List<SDTContentControl> extractSDTsFromBody(XWPFDocument document) {
  SDTContentControl sdt;
  XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
  QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
  List<SDTContentControl> allsdts = new ArrayList<SDTContentControl>();
  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (qnameSdt.equals(xmlcursor.getName())) {
     if (xmlcursor.getObject() instanceof XmlObject) {
      sdt = new SDTContentControl((XmlObject)xmlcursor.getObject()); 
      allsdts.add(sdt);
     }
    } 
   }
  }
  return allsdts;
 }
...

SDTContentControl provides methods to get the content control's title, tag and text content. It also provides a method to set the content from any kind of Object.
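The `setContent(BigDecimal)` overload in the class above formats with the JVM default locale, so the separators in the resulting text can differ from machine to machine. Below is a minimal sketch of the same conversion with the locale passed in explicitly. It uses only the JDK; `ContentFormatter` and `toText` are hypothetical names for illustration, not part of Apache POI or of the class above.

```java
import java.math.BigDecimal;
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.SimpleDateFormat;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.Locale;

// Hypothetical helper (not part of Apache POI): turns a value into the text
// that would then be handed to SDTContentControl.setContent(String).
final class ContentFormatter {

    private ContentFormatter() {}

    static String toText(Object content, String datePattern, Locale locale) {
        if (content instanceof Calendar) {
            Calendar calendar = (Calendar) content;
            SimpleDateFormat format = new SimpleDateFormat(datePattern, locale);
            // Format in the calendar's own time zone so the day does not shift.
            format.setTimeZone(calendar.getTimeZone());
            return format.format(calendar.getTime());
        }
        if (content instanceof BigDecimal) {
            DecimalFormat format =
                new DecimalFormat("#,##0.00", DecimalFormatSymbols.getInstance(locale));
            // Passing the BigDecimal itself avoids the precision loss of doubleValue().
            return format.format(content);
        }
        return String.valueOf(content);
    }

    public static void main(String[] args) {
        System.out.println(toText(new GregorianCalendar(2022, 0, 1), "yyyy-MM-dd", Locale.US));
        System.out.println(toText(new BigDecimal("1234.56"), "yyyy-MM-dd", Locale.US));
        System.out.println(toText(42, "yyyy-MM-dd", Locale.US));
    }
}
```

The date pattern would normally come from the content control's own date format, as `setContent(Calendar)` above already does; the locale is the only new input.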

Complete example:

import java.io.FileInputStream;
import java.io.FileOutputStream;

import org.apache.poi.xwpf.usermodel.*;

import java.util.List;
import java.util.ArrayList;

import org.apache.xmlbeans.XmlCursor;
import org.apache.xmlbeans.XmlObject;
import javax.xml.namespace.QName;

import java.util.GregorianCalendar;
import java.math.BigDecimal;

public class WordFillContentControls {

 private static List<SDTContentControl> extractSDTsFromBody(XWPFDocument document) {
  SDTContentControl sdt;
  XmlCursor xmlcursor = document.getDocument().getBody().newCursor();
  QName qnameSdt = new QName("http://schemas.openxmlformats.org/wordprocessingml/2006/main", "sdt", "w");
  List<SDTContentControl> allsdts = new ArrayList<SDTContentControl>();
  while (xmlcursor.hasNextToken()) {
   XmlCursor.TokenType tokentype = xmlcursor.toNextToken();
   if (tokentype.isStart()) {
    if (qnameSdt.equals(xmlcursor.getName())) {
     if (xmlcursor.getObject() instanceof XmlObject) {
      sdt = new SDTContentControl((XmlObject)xmlcursor.getObject()); 
      allsdts.add(sdt);
     }
    } 
   }
  }
  return allsdts;
 }

 public static void main(String[] args) throws Exception {
     
  String[] contentControlTags = new String[]{
   "NameTag", "GenderTag", "DateTag", "AmountTag", 
   "DescriptionTag", "Col1Tag", "Col2Tag", 
   "Col1DateTag", "Col2ChooseTag"
  };
  Object[] contents = new Object[]{
  "Axel Richter", "male", new GregorianCalendar(2022, 0, 1), BigDecimal.valueOf(1234.56), 
  "Lorem ipsum semit dolor ... dolor semit ...", "Blah blah", "Blubb blubb", 
   new GregorianCalendar(1964, 11, 21), "My choice"
  };

  XWPFDocument document = new XWPFDocument(new FileInputStream("./WordFormContentControl.docx"));
  
  List<SDTContentControl> allsdts = extractSDTsFromBody(document);

  for (SDTContentControl sdt : allsdts) {
//System.out.println(sdt);
   String title = sdt.getTitle();
   String tag = sdt.getTag();
   String content = sdt.getContentText();
   System.out.println(title + ": " + tag + ": " + content);
   
   for (int i = 0; i < contentControlTags.length; i++) {
    String tagToReplace = contentControlTags[i];
    if (tagToReplace.equals(tag)) {
     Object contentO = contents[i];
     sdt.setContent(contentO);
    }
   }
   
  }

  allsdts = extractSDTsFromBody(document);

  for (SDTContentControl sdt : allsdts) {
   String title = sdt.getTitle();
   String tag = sdt.getTag();
   String content = sdt.getContentText();
   System.out.println(title + ": " + tag + ": " + content);
  }

  FileOutputStream out = new FileOutputStream("./WordFormContentControlResult.docx");
  document.write(out);
  out.close();
  document.close();
 }
}
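
The two parallel arrays in `main` are fine for a demonstration. When the values come from a database, a map keyed by the content control tag may be easier to keep in sync. The following is only a sketch using the JDK: `TagValueLookup` and `valueFor` are hypothetical names, and the map stands in for whatever the database query returns.

```java
import java.math.BigDecimal;
import java.util.GregorianCalendar;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical stand-in for data fetched from the database,
// keyed by content control tag.
final class TagValueLookup {

    private final Map<String, Object> valuesByTag = new LinkedHashMap<>();

    TagValueLookup put(String tag, Object value) {
        valuesByTag.put(tag, value);
        return this;
    }

    // Empty when the control has no tag or no value is known for it,
    // so the caller can leave that control untouched.
    Optional<Object> valueFor(String tag) {
        return tag == null ? Optional.empty() : Optional.ofNullable(valuesByTag.get(tag));
    }

    public static void main(String[] args) {
        TagValueLookup lookup = new TagValueLookup()
            .put("NameTag", "Axel Richter")
            .put("DateTag", new GregorianCalendar(2022, 0, 1))
            .put("AmountTag", BigDecimal.valueOf(1234.56));

        // Inside the loop over the content controls this would read:
        // lookup.valueFor(sdt.getTag()).ifPresent(value -> sdt.setContent(value));
        for (String tag : new String[] {"NameTag", "UnknownTag", null}) {
            System.out.println(tag + " -> " + lookup.valueFor(tag).orElse("(unchanged)"));
        }
    }
}
```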
Axel Richter
  • That is awesome. It was really easy to plug into our code. We already had some of your code from another SO post about getting the content controls, so plugging this in was just a matter of copying this new class and using this extractSDTsFromBody(document) method instead of the one from that other post. We really appreciate your help with this. – bytor99999 Jul 18 '22 at 14:59
  • One more question: with this code a plain text content control worked perfectly, but for a Rich Text content control I got a class-not-found exception for XMLInputStream, which appears to have been deprecated in Apache XMLBeans back in the 3.x versions; we are using 5.1. It happens on this line: CTR ctR = CTR.Factory.newInstance(). So does a Rich Text SDT have more runs in it, whereas the others are just paragraphs? Just trying to work out how I might fix the exception, if there is an alternative. – bytor99999 Jul 18 '22 at 15:30
  • @bytor99999: My code is tested using current `apache poi 5.2.2` and with rich text content controls too. If you have problems, please ask a new question and provide a [Minimal, Reproducible Example](https://stackoverflow.com/help/minimal-reproducible-example) so that we are able to reproduce your issue. The method `setContent(String text)` of my `SDTContentControl` only sets **one** paragraph per content control. Like all the code here on StackOverflow, it is meant to show the principle. It is not a ready-to-use, fully developed library. How could it be? – Axel Richter Jul 18 '22 at 15:51
  • It has no code changes from what you have above, and we also use Apache POI 5.2.2 (the versions in our build are 5.2.2, 5.2.2 and 1.0.6). The only difference is the Word doc file itself. Anyway, thanks for your help above. I will keep trying to figure out why it is trying to use XMLInputStream here. – bytor99999 Jul 18 '22 at 16:28
  • @bytor99999: What is 1.0.6? Apache POI **contains** XWPF. There is no different separate version of this. Don't mix different Apache POI versions! See [faq-N10204](https://poi.apache.org/help/faq.html#faq-N10204) – Axel Richter Jul 18 '22 at 16:34
  • It was for converting to HTML and PDF: the dependencies fr.opensagres.xdocreport:org.apache.poi.xwpf.converter.xhtml:${xwpf.version} and fr.opensagres.xdocreport:org.apache.poi.xwpf.converter.pdf:${xwpf.version}. I can remove that and re-test. – bytor99999 Jul 18 '22 at 16:53
  • Removing it and everything worked. Thanks. Right now I don't need to create PDFs from the docs, so I can keep that out. Again, I really appreciate all the help. I had spent weeks trying to figure out how we were going to do server side merges without the old style MergeFields, because we really want to use Content Controls, and this will work perfectly. – bytor99999 Jul 18 '22 at 16:59
  • @bytor99999: That `org.apache.poi.xwpf.converter.pdf` does not fit. The old versions of XDocReport pull in wrong versions of ooxml-schemas. See https://stackoverflow.com/questions/51440312/docx-to-pdf-converter-in-java/51440649#51440649 for which version of `fr.opensagres.poi.xwpf.converter.pdf` fits which Apache POI version. – Axel Richter Jul 18 '22 at 17:00

For my use case, I only want to replace the body of the SDT element with new text. You can use the Java or Kotlin reflection API to make the bodyElements property readable and replace the content inside it the same way you would for the rest of the document.
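
The reflection idea can be sketched without POI on the classpath. In the sketch below `FakeSdtContent` is a hypothetical stand-in for the POI class that holds the SDT body, and the field name `bodyElements` is simply the one mentioned above; check it against the POI version in use, since private fields can change between releases. Whether changes made this way end up in the saved document depends on POI internals and is not verified here.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.List;

final class PrivateFieldAccess {

    private PrivateFieldAccess() {}

    // Reads a private List field declared directly on the target's class.
    // Throws IllegalStateException if the field is missing or inaccessible.
    @SuppressWarnings("unchecked")
    static <T> List<T> readListField(Object target, String fieldName) {
        try {
            Field field = target.getClass().getDeclaredField(fieldName);
            field.setAccessible(true); // lifts the access check for this Field object only
            return (List<T>) field.get(target);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot read field " + fieldName, e);
        }
    }

    public static void main(String[] args) {
        FakeSdtContent content = new FakeSdtContent("old text");
        List<String> elements = readListField(content, "bodyElements");
        elements.clear();
        elements.add("new text");
        System.out.println(content.getText());
    }
}

// Hypothetical stand-in for a library class that keeps its elements private.
final class FakeSdtContent {
    private final List<String> bodyElements = new ArrayList<>();

    FakeSdtContent(String initialText) {
        bodyElements.add(initialText);
    }

    String getText() {
        return String.join("", bodyElements);
    }
}
```

Note that `getDeclaredField` does not search superclasses, so the lookup has to start at the class that actually declares the field.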

gmh33