0

Preferably I would like to do this in the browser with JavaScript. I am already able to unzip the docx file and read the XML files, but I can't find a way to get a page count. I am hoping the property exists somewhere in the XML files and I just need to find it.

edit: I wouldn't say it is a duplicate of Is there a way to count doc, docx, pdf pages with only js (without Node.js)? My question is specific to word doc/docx files and that question was never resolved.

Cindy Meister
Omar Rodriguez
  • So two links that may be of interest: [this SO question](https://stackoverflow.com/questions/19830073/display-number-of-pages-in-word-ml) and [this wikipedia article](https://en.wikipedia.org/wiki/Microsoft_Office_XML_formats#Word_XML_Format_example) -- note that in the example page numbers is at ` `. I'm not sure if these are helpful or not, just found them and hoped they might be relevant to you. – Alexander Nied Oct 24 '18 at 03:28
  • Possible duplicate of [Is there a way to count doc, docx, pdf pages with only js (without Node.js)?](https://stackoverflow.com/questions/34762798/is-there-a-way-to-count-doc-docx-pdf-pages-with-only-js-without-node-js) – Circle Hsiao Oct 24 '18 at 03:31
  • That question was never resolved and I am specifically asking for word doc files. – Omar Rodriguez Oct 24 '18 at 03:43
  • Possible duplicate of [How to get MS Word total pages count using Open XML SDK?](https://stackoverflow.com/questions/53493433/how-to-get-ms-word-total-pages-count-using-open-xml-sdk) – Cindy Meister Nov 27 '18 at 17:23

3 Answers

1

I found a way to do this with docx4js.

Here is a small sample that parses a file taken from an input element:

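import docx4js from 'docx4js';

// `file` is the File object taken from an <input type="file"> element.
docx4js.load(file).then(doc => {
  // docProps/app.xml holds the extended properties, including the cached page count.
  const propsAppRaw = doc.parts['docProps/app.xml']._data.getContent();
  const propsApp = new TextDecoder('utf-8').decode(propsAppRaw);
  // Pull the digits out of the <Pages> element, if it is present.
  const match = propsApp.match(/<Pages>(\d+)<\/Pages>/);
  if (match && match[1]) {
    const count = Number(match[1]);
    console.log(count);
  }
});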
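
Since the question mentions that the archive can already be unzipped, the parsing step can also be done without docx4js. The following is a minimal sketch, not part of the original answer: `extractPageCount` is a hypothetical helper name, and it assumes the contents of docProps/app.xml are already available as a string. It returns null when the `<Pages>` element is missing, which can happen with some documents.

```javascript
// Hypothetical helper: extract the cached page count from the text of
// docProps/app.xml. Returns a number, or null when no <Pages> element
// with a numeric value is found.
function extractPageCount(appXml) {
  if (typeof appXml !== 'string') return null;
  // Allow an optional namespace prefix and attributes on the element.
  const match = appXml.match(
    /<(?:\w+:)?Pages(?:\s[^>]*)?>\s*(\d+)\s*<\/(?:\w+:)?Pages>/
  );
  return match ? Number(match[1]) : null;
}

// Sample app.xml content, made up for illustration.
const sample =
  '<?xml version="1.0" encoding="UTF-8"?>' +
  '<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">' +
  '<Pages>3</Pages><Words>120</Words></Properties>';

console.log(extractPageCount(sample));          // 3
console.log(extractPageCount('<Properties/>')); // null
```

The value is whatever Word last saved, so it is best treated as a hint rather than an exact count.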
0

In theory, the following property can return that information from the Word Open XML file, using the Open XML SDK:

int pageCount = int.Parse(document.ExtendedFilePropertiesPart.Properties.Pages.Text);

In practice, however, this isn't reliable. It might work, but then again it might not. It depends on 1) what Word managed to save in the file before it was closed and 2) what kind of editing may have been done on the closed file.

The only sure way to get a page number or a page count is to open the document in the Word application interface. Page numbers and the page count are calculated dynamically by Word during editing. Once a document is closed, the stored value is static and not necessarily what it will be when the document is opened or printed.
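
To illustrate how static this saved information is, here is a rough sketch of another cached hint. Word can leave `w:lastRenderedPageBreak` markers in word/document.xml at the points where pages broke the last time it laid out the document. The helper name `estimatePagesFromBreaks` is hypothetical, the code assumes the conventional `w:` prefix, and the result carries the same caveat as the `Pages` property: it reflects the last save, not a fresh layout.

```javascript
// Hypothetical helper: estimate the page count from the text of
// word/document.xml by counting the page-break markers Word recorded
// during its last layout pass. Assumes the usual "w:" namespace prefix.
function estimatePagesFromBreaks(documentXml) {
  const markers = documentXml.match(/<w:lastRenderedPageBreak\b[^>]*\/?>/g) || [];
  // N recorded breaks imply at least N + 1 pages at the time of saving.
  return markers.length + 1;
}

// Made-up fragment with two recorded breaks.
const fragment =
  '<w:body>' +
  '<w:p><w:r><w:t>Page one</w:t></w:r></w:p>' +
  '<w:p><w:r><w:lastRenderedPageBreak/><w:t>Page two</w:t></w:r></w:p>' +
  '<w:p><w:r><w:lastRenderedPageBreak/><w:t>Page three</w:t></w:r></w:p>' +
  '</w:body>';

console.log(estimatePagesFromBreaks(fragment)); // 3
```

A document that was never laid out by Word, or was edited by another tool afterwards, may contain no markers at all, in which case this returns 1 regardless of the real length.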

See also https://github.com/OfficeDev/Open-XML-SDK/issues/22 for confirmation.

Cindy Meister
-2

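When you say "do this in the browser" I assume that you have a running web server with LAMP or the equivalent. In PHP, a .docx file can be opened as a zip archive with ZipArchive, and the page count can be read from docProps/app.xml. An example PHP function would be:

function number_pages_docx($filename)
{
    // A .docx file is a zip archive, so ZipArchive can read it.
    $docx = new ZipArchive();

    if ($docx->open($filename) === true)
    {
        // The cached page count lives in docProps/app.xml, if Word wrote it.
        if (($index = $docx->locateName('docProps/app.xml')) !== false)
        {
            $data = $docx->getFromIndex($index);
            $docx->close();

            $xml = new SimpleXMLElement($data);
            // Cast so the caller gets an integer, not a SimpleXMLElement.
            return (int) $xml->Pages;
        }

        $docx->close();
    }

    return false;
}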
steven
  • No, I am wanting to do this directly in the browser via javascript to avoid making a trip to a backend server. I will have a backend api as a last resort that can get page count but I am really wanting to do it client side. Also in your code snippet 'docProps/app.xml' is not a file available in one of my test word documents. – Omar Rodriguez Oct 24 '18 at 03:34