
I have used the Apache POI library to get the page count of Word documents, but it returns a page count of zero when I download a Google Doc as a .docx file.

Edit: My code is as follows:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;

import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.poi.hslf.usermodel.HSLFSlideShow;
import org.apache.poi.hwpf.HWPFDocument;
import org.apache.poi.xslf.usermodel.XMLSlideShow;
import org.apache.poi.xwpf.usermodel.XWPFDocument;

public Integer getPagesCount(byte[] docBytes, String type) throws IOException {
    String lowerType = type.toLowerCase();
    // try-with-resources closes the stream and every document it opens
    try (ByteArrayInputStream in = new ByteArrayInputStream(docBytes)) {
        if (lowerType.equals("docx")) {
            // Reads the <Pages> value stored in /docProps/app.xml
            try (XWPFDocument docx = new XWPFDocument(in)) {
                return docx.getProperties().getExtendedProperties()
                        .getUnderlyingProperties().getPages();
            }
        } else if (lowerType.equals("doc")) {
            // Reads the page count stored in the SummaryInformation stream
            try (HWPFDocument wordDoc = new HWPFDocument(in)) {
                return wordDoc.getSummaryInformation().getPageCount();
            }
        } else if (lowerType.equals("ppt")) {
            try (HSLFSlideShow document = new HSLFSlideShow(in)) {
                return document.getSlides().size();
            }
        } else if (lowerType.equals("pptx")) {
            try (XMLSlideShow slideShow = new XMLSlideShow(in)) {
                return slideShow.getSlides().size();
            }
        } else if (lowerType.equals("pdf")) {
            try (PDDocument doc = PDDocument.load(in)) {
                return doc.getNumberOfPages();
            }
        }
    }
    return 0;
}
```
Manish Kumar
  • How have you implemented "Page count of Doc pages"? There is no such functionality in [XWPFDocument](https://poi.apache.org/apidocs/dev/org/apache/poi/xwpf/usermodel/XWPFDocument.html) directly, so it is not obvious. Please show your code. Additionally: `*.docx` (`XWPF`) and `*.doc` (`HWPF`) are two totally different file formats. They have absolutely nothing in common, so tagging a question about `*.docx` with `hwpf` is not correct. – Axel Richter Feb 08 '19 at 07:44
  • Kindly show your code as well, please. – Timothy T. Feb 08 '19 at 07:44
  • If you download a `*.docx` from `Google Docs`, it does not contain any `docProps`, so they cannot be read. A `*.docx` is simply a `ZIP` archive. If you unzip a file saved by `Word`, you will find `/docProps/app.xml`, but a file downloaded from `Google Docs` has no `/docProps` at all. – Axel Richter Feb 10 '19 at 12:34
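To check this programmatically, here is a minimal sketch that uses only `java.util.zip` to look for `docProps/app.xml` inside the `.docx` bytes and pull out its `<Pages>` value. The class and method names (`DocxPropsInspector`, `storedPageCount`) are hypothetical. It assumes the part sits at the conventional path `docProps/app.xml` and that the element is written unprefixed as `<Pages>`, as Word does, and it uses a regex instead of a real XML parser for brevity (Java 9+ for `readAllBytes`).

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.OptionalInt;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

public class DocxPropsInspector {

    // Matches e.g. <Pages>3</Pages>; assumes no namespace prefix on the element
    private static final Pattern PAGES = Pattern.compile("<Pages>(\\d+)</Pages>");

    /**
     * Returns the page count stored in docProps/app.xml, or an empty
     * OptionalInt if that part is missing or has no <Pages> element.
     */
    public static OptionalInt storedPageCount(byte[] docxBytes) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(docxBytes))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if (entry.getName().equals("docProps/app.xml")) {
                    // readAllBytes stops at the end of the current zip entry
                    String xml = new String(zip.readAllBytes(), StandardCharsets.UTF_8);
                    Matcher m = PAGES.matcher(xml);
                    return m.find()
                            ? OptionalInt.of(Integer.parseInt(m.group(1)))
                            : OptionalInt.empty();
                }
            }
        }
        // No docProps/app.xml at all, as described for Google Docs exports
        return OptionalInt.empty();
    }
}
```

An empty result for a Google Docs export is consistent with the comment above: there is no stored page count to read, so any library that only reads document properties will report nothing (or zero).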
  • So is there a workaround for Google Docs files (downloaded as .docx)? I need this for a use case. – Manish Kumar Feb 11 '19 at 04:50
  • Since `Google Docs` does not provide any `docProps`, and the page count is part of those, there is no way around this using `apache poi`. To count pages, one must render the content of the `*.docx` file the same way `Word` would render it. That is only possible with an application installed that can do this (`Microsoft Word` itself, or `Libreoffice`/`Openoffice` `Writer`, ...), and then by interacting with that application while it has the `*.docx` file rendered. – Axel Richter Feb 11 '19 at 06:52
  • @AxelRichter your answer is perfect: you have to open the doc file in Microsoft Office or LibreOffice, save it, and then get the page count again. – Henrique Schmitt Jul 07 '20 at 19:39
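One way to automate that open-and-save step is to run LibreOffice headless from Java. This is a hedged sketch, not a verified recipe: it assumes `soffice` is on the `PATH`, the class and method names (`LibreOfficeResaver`, `buildCommand`, `resave`) are hypothetical, and whether a headless conversion writes an accurate `<Pages>` value into `docProps/app.xml` is an assumption you should check against your own files.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

public class LibreOfficeResaver {

    /** Builds a headless LibreOffice command that re-saves the input as .docx into outDir. */
    static List<String> buildCommand(Path input, Path outDir) {
        return List.of("soffice", "--headless", "--convert-to", "docx",
                "--outdir", outDir.toString(), input.toString());
    }

    /** Runs the conversion and returns the path of the re-saved file. */
    public static Path resave(Path input, Path outDir) throws IOException, InterruptedException {
        Files.createDirectories(outDir);
        Process process = new ProcessBuilder(buildCommand(input, outDir))
                .inheritIO()
                .start();
        int exit = process.waitFor();
        if (exit != 0) {
            throw new IOException("soffice exited with code " + exit);
        }
        // LibreOffice keeps the base name and swaps the extension
        String baseName = input.getFileName().toString().replaceFirst("\\.[^.]+$", "");
        Path output = outDir.resolve(baseName + ".docx");
        // A zero exit code alone is not proof of success, so check the file too
        if (!Files.exists(output)) {
            throw new IOException("expected output not found: " + output);
        }
        return output;
    }
}
```

Read the bytes of the returned file (for example with `Files.readAllBytes`) and pass them to `getPagesCount` from the question. Use an `outDir` different from the input's directory; otherwise the conversion writes over the original file.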
