I'm trying to print the data contained in the tables of a Word document, but so far I have only managed to print memory locations and other object info. I'm using a TableFinder to locate all of the tables in the document and then iterating over them, but I'm stuck on how to print the text these tables contain. Below is an image of the Test.docx I am working with, along with a snippet of the code. To be clear, I'm not sure whether I should be accessing a table row (Tr), as this snippet does, or the parent Tbl object to get at the text. In this case, I just want it to print "I", "Just", "Want", etc.

[Image: screenshot of the Word document]

    WordprocessingMLPackage wordMLPackage = WordprocessingMLPackage.load(new File("C:\\Users\\1120248\\Test\\Test.docx"));
    MainDocumentPart documentPart = wordMLPackage.getMainDocumentPart();

    TableFinder finder = new TableFinder();
    new TraversalUtil(documentPart.getContent(), finder);

    System.out.println("Found " + finder.tblList.size() + " tables");

    for (Object o : finder.tblList) {

        Object o2 = XmlUtils.unwrap(o);     

        if (o2 instanceof org.docx4j.wml.Tbl) {

            Tbl tbl = (Tbl)o2;
            Tr t = (Tr)tbl.getContent().get(0);

            System.out.println(t.getContent());
            System.out.println(t.toString());
            System.out.println(XmlUtils.unwrap(t.getContent().get(0)));
        }
    }

This is the output produced by this setup:

    [javax.xml.bind.JAXBElement@a146b11, javax.xml.bind.JAXBElement@f438904, javax.xml.bind.JAXBElement@4ed5a1b0, javax.xml.bind.JAXBElement@18d003cd, javax.xml.bind.JAXBElement@3135bf25, javax.xml.bind.JAXBElement@22ad1bae]
    org.docx4j.wml.Tr@4116f66a
    org.docx4j.wml.Tc@59c04bee

Brian Tompsett - 汤莱恩
SuperGoA
  • Are you tied to docx4j? – Rabbit Guy Jul 06 '17 at 18:28
  • @RabbitGuy Yes I'm tied – SuperGoA Jul 06 '17 at 18:34
  • Tr is a table row. tr.getContent() will give you a list of Tc (table cells). Each Tc in turn has getContent(), which in this case contains P (paragraph) objects. Those contain R (run) objects, which finally contain the text. That's what the OpenXML typically looks like. You can see it with XmlUtils.marshaltoString(tbl). – JasonPlutext Jul 06 '17 at 19:01
  • @JasonPlutext Thank you so much! So I assume I should take an approach like [this](https://stackoverflow.com/questions/24755952/how-to-read-word-document-and-get-parts-of-it-with-all-styles-using-docx4j)? – SuperGoA Jul 06 '17 at 19:14
  • Similar, but the referenced method DocxUtility.getDocxUtility is not part of docx4j. You can use getContent() instead. – JasonPlutext Jul 06 '17 at 22:30
  • @JasonPlutext please put your comment as answer so we can upvote. This answer saves lives. – tksilicon Jul 16 '22 at 09:26

2 Answers

0

This works for me:

    TableFinder finder = new TableFinder();
    finder.walkJAXBElements(documentPart.getContent());
0

For anyone else who gets stuck on this question: for visibility, the comment by @JasonPlutext is the answer. The hierarchy is Tr -> Tc -> P -> R -> Text: a table row contains table cells, each cell contains paragraphs, each paragraph contains runs, and each run contains the text.
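
To make that nesting concrete, here is a rough sketch that does not use docx4j at all. It parses a hand-written WordprocessingML table fragment with the JDK's DOM parser and collects the text of each cell, row by row. The class name `TableXmlSketch`, the `SAMPLE` fragment, and the `cellTexts` helper are all made up for illustration. A real document's XML will carry more properties and attributes, so treat this only as a picture of the tbl/tr/tc/p/r/t structure that the docx4j objects mirror.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class TableXmlSketch {
    // Namespace used by WordprocessingML elements such as w:tbl, w:tr, w:tc.
    static final String W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Hypothetical, heavily simplified table: two rows of two cells.
    // The third cell splits its text across two runs; the fourth is empty.
    static final String SAMPLE =
        "<w:tbl xmlns:w=\"" + W + "\">"
      + "<w:tr>"
      + "<w:tc><w:p><w:r><w:t>I</w:t></w:r></w:p></w:tc>"
      + "<w:tc><w:p><w:r><w:t>Just</w:t></w:r></w:p></w:tc>"
      + "</w:tr>"
      + "<w:tr>"
      + "<w:tc><w:p><w:r><w:t>Wa</w:t></w:r><w:r><w:t>nt</w:t></w:r></w:p></w:tc>"
      + "<w:tc><w:p/></w:tc>"
      + "</w:tr>"
      + "</w:tbl>";

    // Returns one list per row, holding the concatenated text of each cell.
    // Note: getElementsByTagNameNS matches all descendants, so nested tables
    // would be counted more than once; this sketch ignores that case.
    static List<List<String>> cellTexts(String tblXml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            Document doc = factory.newDocumentBuilder().parse(
                new ByteArrayInputStream(tblXml.getBytes(StandardCharsets.UTF_8)));

            List<List<String>> rows = new ArrayList<>();
            NodeList trs = doc.getElementsByTagNameNS(W, "tr");
            for (int i = 0; i < trs.getLength(); i++) {
                Element tr = (Element) trs.item(i);
                List<String> cells = new ArrayList<>();
                NodeList tcs = tr.getElementsByTagNameNS(W, "tc");
                for (int j = 0; j < tcs.getLength(); j++) {
                    Element tc = (Element) tcs.item(j);
                    // Every w:t under this cell sits inside a run (w:r),
                    // which sits inside a paragraph (w:p).
                    StringBuilder text = new StringBuilder();
                    NodeList ts = tc.getElementsByTagNameNS(W, "t");
                    for (int k = 0; k < ts.getLength(); k++) {
                        text.append(ts.item(k).getTextContent());
                    }
                    cells.add(text.toString());
                }
                rows.add(cells);
            }
            return rows;
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse table XML", e);
        }
    }

    public static void main(String[] args) {
        for (List<String> row : cellTexts(SAMPLE)) {
            System.out.println(row);
        }
    }
}
```

In docx4j the same walk is done on objects rather than XML nodes: call getContent() at each level and pass each child through XmlUtils.unwrap, since children may be wrapped in a JAXBElement, as the output in the question shows.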

tksilicon