I am able to get the content of the entire Word document using getText(), and the content of individual paragraphs using getParagraphs(). But I want to extract content based on headings.

Here is a sample file.

Title
This is sample title – Paragraph 1. This is sample title – Paragraph 2. This is sample title – Paragraph 2. This is sample title – Paragraph 2. This is sample title – Paragraph 2.

Background: This is sample title – Paragraph 2

Experience: This is sample title – Paragraph 3

Here, Title, Background and Experience are the headings.

Is it possible to get the content of just the Title section using Apache POI or any other API?

Here is the code:

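import java.io.FileInputStream;
import java.util.List;
import org.apache.poi.xwpf.usermodel.XWPFDocument;
import org.apache.poi.xwpf.usermodel.XWPFParagraph;
import org.apache.poi.xwpf.usermodel.XWPFStyle;
import org.apache.poi.xwpf.usermodel.XWPFStyles;

public class ListHeadings {
    public static void main(String[] args) throws Exception {
        // try-with-resources closes both the stream and the document
        try (FileInputStream fis = new FileInputStream("Test1.docx");
             XWPFDocument xdoc = new XWPFDocument(fis)) {
            XWPFStyles styles = xdoc.getStyles(); // may be null if the file has no styles part
            List<XWPFParagraph> paragraphs = xdoc.getParagraphs();
            for (XWPFParagraph paragraph : paragraphs) {
                String styleId = paragraph.getStyleID(); // null when no paragraph style is set
                if (styleId == null || styles == null) {
                    continue;
                }
                XWPFStyle style = styles.getStyle(styleId);
                String name = (style == null) ? null : style.getName();
                // Built-in heading style names look like "heading 1"; compare case-insensitively
                if (name != null && name.toLowerCase().startsWith("heading")) {
                    System.out.println("Heading style: " + name);
                    System.out.println("Heading text: " + paragraph.getText());
                }
            }
        }
    }
}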

What I am looking for is a way to get the body text under the heading "Title".
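One possible approach, sketched here under assumptions: walk getParagraphs() in document order, start collecting text after the heading paragraph whose text is "Title", and stop at the next heading paragraph. To keep the sketch runnable without the POI jar, the grouping logic works on a hypothetical `Para` record (style name plus text); `HeadingSections`, `Para`, `isHeading` and `bodyUnder` are illustrative names, not POI API. With POI you would build each `Para` from `paragraph.getText()` and `styles.getStyle(paragraph.getStyleID()).getName()`. It also assumes headings are separate paragraphs using Word's built-in "Title" or "heading N" styles; if a heading like "Background:" is just bold text inside the same paragraph as its body, this will not split it.

```java
import java.util.ArrayList;
import java.util.List;

public class HeadingSections {

    // Hypothetical stand-in for one XWPFParagraph: its style name (null if unstyled) and its text.
    public record Para(String styleName, String text) {}

    // Assumption: the built-in "Title" and "heading N" styles mark headings.
    static boolean isHeading(String styleName) {
        if (styleName == null) {
            return false;
        }
        String s = styleName.toLowerCase();
        return s.equals("title") || s.startsWith("heading");
    }

    // Returns the text of the paragraphs that follow the heading whose text matches
    // headingText, up to (but not including) the next heading; "" if no heading matches.
    public static String bodyUnder(List<Para> paras, String headingText) {
        List<String> body = new ArrayList<>();
        boolean inSection = false;
        for (Para p : paras) {
            if (isHeading(p.styleName())) {
                if (inSection) {
                    break; // the next heading ends the requested section
                }
                inSection = p.text().trim().equalsIgnoreCase(headingText);
                continue;
            }
            if (inSection) {
                body.add(p.text());
            }
        }
        return String.join("\n", body);
    }

    public static void main(String[] args) {
        List<Para> doc = List.of(
                new Para("heading 1", "Title"),
                new Para(null, "This is sample title - Paragraph 1."),
                new Para(null, "This is sample title - Paragraph 2."),
                new Para("heading 1", "Background"),
                new Para(null, "This is sample title - Paragraph 2"),
                new Para("heading 1", "Experience"),
                new Para(null, "This is sample title - Paragraph 3"));
        System.out.println(bodyUnder(doc, "Title"));
    }
}
```

Matching on the heading's text rather than its position means the same method can fetch the Background or Experience sections by passing a different name.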

dps
