I'm looking to modify certain tags (comments, keywords, etc.) of a .DOC file. I've been able to do this for DOCX using docx4j, but I haven't found anything that lets me change the tags in the .DOC format.

Is there a way to programmatically change the content of certain tags in a .DOC file?

Anthony

1 Answer

Apache POI will quite happily let you read and edit the metadata of supported documents. For the older OLE2 formats (.doc, .xls, etc.), you'll want to use HPSF, likely via POIDocument. For the OOXML formats (.docx, .xlsx, etc.), use POIXMLDocument and POIXMLProperties.
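If you don't know up front which of the two routes a given file needs, one option is to look at its first few bytes rather than trusting the extension. The following is only a rough sketch using plain Java with no POI calls; the class name `DocFormatSniffer` and its `Format` enum are made up for illustration. It relies on two well-known signatures: OLE2 compound files start with `D0 CF 11 E0 A1 B1 1A E1`, and OOXML files are ZIP packages, which start with `PK\x03\x04`.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical helper, not part of Apache POI.
public class DocFormatSniffer {

    /** Which metadata route is likely to apply (illustrative names). */
    public enum Format { OLE2, OOXML, UNKNOWN }

    // OLE2 compound document signature (.doc, .xls, .ppt).
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // ZIP local file header signature. Any ZIP matches this, so OOXML
    // is an assumption here, not a guarantee.
    private static final byte[] ZIP_MAGIC = { 'P', 'K', 0x03, 0x04 };

    /** Classifies a file from its leading bytes. */
    public static Format detect(byte[] header) {
        if (startsWith(header, OLE2_MAGIC)) return Format.OLE2;
        if (startsWith(header, ZIP_MAGIC)) return Format.OOXML;
        return Format.UNKNOWN;
    }

    /** Reads up to the first 8 bytes of the file and classifies them. */
    public static Format detect(Path file) throws IOException {
        byte[] header = new byte[8];
        int n = 0;
        try (InputStream in = Files.newInputStream(file)) {
            while (n < header.length) {
                int r = in.read(header, n, header.length - n);
                if (r < 0) break; // file shorter than 8 bytes
                n += r;
            }
        }
        return detect(Arrays.copyOf(header, n));
    }

    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data == null || data.length < prefix.length) return false;
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }
}
```

A result of `OLE2` would point you at the HPSF route, and `OOXML` at POIXMLProperties. Since every ZIP file matches the second signature, treat `OOXML` as a hint and let POI reject the file if it turns out to be something else.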

To modify the OLE2 properties, you can either follow the detailed instructions and code in the HPSF documentation or, on newer versions of POI, shortcut quite a bit of that with HPSFPropertiesOnlyDocument, e.g.:

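import java.io.File;
import java.io.FileOutputStream;
import org.apache.poi.hpsf.HPSFPropertiesOnlyDocument;
import org.apache.poi.hpsf.SummaryInformation;
import org.apache.poi.poifs.filesystem.NPOIFSFileSystem;

NPOIFSFileSystem fs = new NPOIFSFileSystem(new File("test.doc"));
HPSFPropertiesOnlyDocument doc = new HPSFPropertiesOnlyDocument(fs);

// If the file has no property sets yet, create them and fetch the new one
SummaryInformation si = doc.getSummaryInformation();
if (si == null) {
    doc.createInformationProperties();
    si = doc.getSummaryInformation();
}

si.setAuthor("StackOverflow");
si.setTitle("Properties Demo!");

FileOutputStream out = new FileOutputStream("changed.doc");
doc.write(out);
out.close();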
Gagravarr
    What poi version is `HPSFPropertiesOnlyDocument` in? I'm using 3.10beta1 from maven but I don't find it in there. – Anthony Aug 13 '13 at 18:49
    Try a recent nightly build, or wait a couple more days for 3.10 beta 2 (Tim's working on the release candidate at the moment!) – Gagravarr Aug 13 '13 at 21:14