
I have a problem reading whitespace from a .docx file using Apache POI 3.15. My Word document contains line breaks, but when I read the file with Apache POI I cannot find a way to get them. When I call paragraph.getParagraphText(), the text is returned with the line breaks. When I iterate over the XWPFRun objects, I only get the text and formatting, but no information about line breaks.

This is the code I use. The br, tab, cr and separator lists are always empty.

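    import java.io.FileInputStream;
    import java.util.List;

    import org.apache.poi.xwpf.usermodel.XWPFDocument;
    import org.apache.poi.xwpf.usermodel.XWPFParagraph;
    import org.apache.poi.xwpf.usermodel.XWPFRun;
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTBr;
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTEmpty;
    import org.openxmlformats.schemas.wordprocessingml.x2006.main.CTR;

    public class ReadRuns {
        public static void main(String[] args) throws Exception {
            try (FileInputStream fis = new FileInputStream(args[0])) {
                XWPFDocument document = new XWPFDocument(fis);
                List<XWPFParagraph> paragraphs = document.getParagraphs();

                for (XWPFParagraph paragraph : paragraphs) {
                    for (XWPFRun run : paragraph.getRuns()) {
                        CTR ctr = run.getCTR();
                        // Break-like children of this run's <w:r> element
                        List<CTBr> brList = ctr.getBrList();
                        List<CTEmpty> tabList = ctr.getTabList();
                        List<CTEmpty> crList = ctr.getCrList();
                        List<CTEmpty> separatorList = ctr.getSeparatorList();
                        // getText(int) takes the index of a <w:t> element in the run;
                        // getTextPosition() is the vertical text offset, not an index.
                        String text = run.getText(0);
                        String color = run.getColor();
                        boolean bold = run.isBold();
                        boolean italic = run.isItalic();
                        System.out.println("text: " + text + " color: " + color
                                + " bold: " + bold + " italic: " + italic);
                        System.out.println("br: " + brList.size() + " tab: " + tabList.size()
                                + " cr: " + crList.size() + " separator: " + separatorList.size());
                    }
                }
            }
        }
    }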

Is using the CTR object the correct way to go, or is there another way to get those line breaks?

[Image: Word example document]

Martin
  • Could it be that the line breaks are not encoded in the CT classes, but are new line characters embedded in the runs? Could you attach a sample document that exhibits the issue? – jmarkmurphy Mar 20 '17 at 11:03
  • Please provide a sample paragraph where the issue occurs. – techprat Mar 20 '17 at 11:11
  • Great question. It’s also not clear in Apache POI how to iterate over the elements inside a run in their document order, for example: text, br, text. – Nathan B Oct 14 '20 at 22:20
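One way to check how the breaks are actually stored (the question raised in the first comment) is to look at the raw XML. A .docx file is a ZIP archive whose main text lives in the entry word/document.xml; in that XML, a hard Enter starts a new w:p element, while a soft break appears as w:br inside a w:r. The sketch below uses only the JDK, not POI; the class and method names are hypothetical, and it assumes the entry is UTF-8 encoded.

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxXmlDump {
    // Returns the contents of word/document.xml from a .docx stream,
    // or null if the archive has no such entry. Assumes UTF-8 encoding.
    static String readDocumentXml(InputStream docx) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(docx)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("word/document.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buf = new byte[8192];
                    int n;
                    // read() returns -1 at the end of the current entry
                    while ((n = zip.read(buf)) != -1) {
                        out.write(buf, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        return null;
    }

    public static void main(String[] args) throws IOException {
        try (InputStream in = new FileInputStream(args[0])) {
            // Look for <w:br/> inside <w:r> (soft break) versus new <w:p> (Enter)
            System.out.println(readDocumentXml(in));
        }
    }
}
```

Run it as java DocxXmlDump example.docx and search the output for w:br.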

1 Answer


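I found a way to get the line breaks. A normal Enter is returned as a separate paragraph (an empty line becomes a paragraph without text), with the spacing expressed as a spacingAfter value. A soft Enter (Shift+Enter) within a paragraph is returned as a break in the run, available via run.getCTR().getBrList().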
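The distinction in this answer mirrors the underlying WordprocessingML: Enter starts a new w:p element, while Shift+Enter inserts a w:br inside a w:r. The following is a minimal sketch that illustrates this on raw document.xml with the JDK's DOM parser instead of POI. The class and method names are hypothetical, and it is simplified: it does not distinguish page breaks from line breaks and may repeat text from nested content such as text boxes.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ParagraphBreaks {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // One entry per <w:p>: an empty paragraph (blank line typed with Enter)
    // gives "", while <w:br/> and <w:cr/> become '\n' inside an entry.
    static List<String> paragraphTexts(String documentXml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
        Document doc = f.newDocumentBuilder().parse(
                new ByteArrayInputStream(documentXml.getBytes(StandardCharsets.UTF_8)));
        List<String> result = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(W_NS, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            StringBuilder sb = new StringBuilder();
            NodeList runs = ((Element) paras.item(i)).getElementsByTagNameNS(W_NS, "r");
            for (int j = 0; j < runs.getLength(); j++) {
                // Walk the run's children in document order
                for (Node c = runs.item(j).getFirstChild(); c != null; c = c.getNextSibling()) {
                    if (c.getNodeType() != Node.ELEMENT_NODE || !W_NS.equals(c.getNamespaceURI())) {
                        continue;
                    }
                    String name = c.getLocalName();
                    if ("t".equals(name)) {
                        sb.append(c.getTextContent());
                    } else if ("br".equals(name) || "cr".equals(name)) {
                        sb.append('\n');
                    } else if ("tab".equals(name)) {
                        sb.append('\t');
                    }
                }
            }
            result.add(sb.toString());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<w:document xmlns:w=\"" + W_NS + "\"><w:body>"
                + "<w:p><w:r><w:t>Line one</w:t><w:br/><w:t>Line two</w:t></w:r></w:p>"
                + "<w:p/>"
                + "<w:p><w:r><w:t>Next paragraph</w:t></w:r></w:p>"
                + "</w:body></w:document>";
        for (String p : paragraphTexts(xml)) {
            System.out.println("[" + p + "]");
        }
    }
}
```

Here the first paragraph comes back as "Line one\nLine two", the blank line typed with Enter as an empty string, and the last paragraph as a separate entry.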

Martin
    But how can we know their position relative to the text elements in a run? For example, if we have text, br, text, how can we get the list of all elements in a run? – Nathan B Oct 15 '20 at 01:10
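A possible approach to the ordering question is to walk the child elements of the run's w:r node in sibling order rather than asking for each list (getBrList(), getTabList(), ...) separately. The following minimal sketch uses the JDK's DOM API. With POI, the run's DOM node should be reachable via run.getCTR().getDomNode() (an XmlBeans method), but treat that as an assumption to verify against your POI version. The class and method names are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class RunContents {
    static final String W_NS =
            "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Parses a standalone <w:r> fragment (it must declare the w: namespace)
    // and returns its root element.
    static Element parseRun(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true); // needed for getNamespaceURI()/getLocalName()
        return f.newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
    }

    // Lists the content children of a <w:r> element in document order:
    // "text:<value>" for <w:t>, "br" or "br:<type>" for <w:br>, and the bare
    // local name for anything else (e.g. "tab", "cr"). <w:rPr> is skipped.
    static List<String> runContents(Node run) {
        List<String> items = new ArrayList<>();
        for (Node c = run.getFirstChild(); c != null; c = c.getNextSibling()) {
            if (c.getNodeType() != Node.ELEMENT_NODE || !W_NS.equals(c.getNamespaceURI())) {
                continue; // skip whitespace text nodes and foreign elements
            }
            Element e = (Element) c;
            String name = e.getLocalName();
            if ("rPr".equals(name)) {
                continue; // run formatting, not content
            } else if ("t".equals(name)) {
                items.add("text:" + e.getTextContent());
            } else if ("br".equals(name)) {
                // w:type distinguishes page/column breaks from plain line breaks
                items.add(e.hasAttributeNS(W_NS, "type")
                        ? "br:" + e.getAttributeNS(W_NS, "type") : "br");
            } else {
                items.add(name);
            }
        }
        return items;
    }

    public static void main(String[] args) throws Exception {
        Element run = parseRun("<w:r xmlns:w=\"" + W_NS + "\">"
                + "<w:rPr><w:b/></w:rPr><w:t>Hello</w:t><w:br/><w:t>World</w:t></w:r>");
        System.out.println(runContents(run)); // [text:Hello, br, text:World]
    }
}
```

Because the walk follows sibling order, a run containing text, br, text comes back as one ordered list instead of separate per-type lists.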