

I have the contents of a Word document in a byte array, and I want to know how many pages it has.

I already did this for a PDF file with the following code:

using System.IO;
using System.Text.RegularExpressions;

public void MssGetNumberOfPages(byte[] ssFileBinaryData, out int ssNumberOfPages) {
    // Read the raw PDF bytes as text and count the page objects:
    // "/Type /Page" followed by anything except "s" (so "/Type /Pages" is skipped).
    using (var stream = new MemoryStream(ssFileBinaryData))
    using (var r = new StreamReader(stream))
    {
        string pdfText = r.ReadToEnd();
        Regex regx = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regx.Matches(pdfText);
        ssNumberOfPages = matches.Count;
    }
}
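To see what the pattern actually counts, here is a small sketch that runs it against a hand-written fragment of uncompressed PDF objects (the class name and the sample text are illustrative, not taken from a real file). The `[^s]` lets `/Type /Page` through but rejects the `/Type /Pages` node of the page tree. This is a heuristic: PDF 1.5 and later can store page dictionaries inside compressed object streams, where `/Type /Page` never appears as plain text, so on such files the count may come out too low.

```csharp
using System.Text.RegularExpressions;

public static class PdfPageRegexDemo
{
    // Same pattern as in the question: "/Type", optional whitespace, "/Page",
    // then one character that is not 's' (so "/Type /Pages" does not match).
    private static readonly Regex PageObject = new Regex(@"/Type\s*/Page[^s]");

    // Illustrative fragment of uncompressed PDF objects, not a complete PDF file:
    // object 1 is the page tree root, objects 2 and 3 are pages.
    public const string Sample =
        "1 0 obj << /Type /Pages /Kids [2 0 R 3 0 R] /Count 2 >> endobj\n" +
        "2 0 obj << /Type /Page /Parent 1 0 R >> endobj\n" +
        "3 0 obj << /Type/Page /Parent 1 0 R >> endobj\n";

    // Counts the "/Type /Page" dictionaries that appear as plain text.
    public static int CountPageObjects(string pdfText)
    {
        return PageObject.Matches(pdfText).Count;
    }
}
```

`PdfPageRegexDemo.CountPageObjects(PdfPageRegexDemo.Sample)` returns 2: objects 2 and 3 match, while object 1 (`/Type /Pages`) does not.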

How can I do something similar with a Word document?

For the PDF, I simply search the text with a regex that matches this pattern:

Regex(@"/Type\s*/Page[^s]")

What should I put in the regex to match the pages of a Word document?
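A .docx file has no equivalent of PDF page objects: it is a ZIP archive of deflate-compressed XML parts, and page breaks are mostly decided when the document is laid out, so a regex over the raw bytes has nothing comparable to match. The closest stored value is the `<Pages>` element of the extended-properties part (`docProps/app.xml`), which the application that last saved the file writes there. The sketch below reads it, assuming the part sits at that conventional path (strictly, its location is given by the package relationships in `_rels/.rels`). The value can be stale or missing (for example in files generated by libraries that do no layout), and this does not apply to legacy binary .doc files. `DocxPageCount` and `ReadStoredPageCount` are hypothetical names. On .NET Framework, `ZipArchive` needs a reference to the System.IO.Compression assembly. For an exact count you still need a layout engine, such as Word itself or the Aspose approach in the answer below.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml.Linq;

public static class DocxPageCount
{
    // Namespace of the OOXML extended-properties part (docProps/app.xml).
    private static readonly XNamespace ExtendedProps =
        "http://schemas.openxmlformats.org/officeDocument/2006/extended-properties";

    // Returns the <Pages> value stored in docProps/app.xml, or null if the part
    // or the element is missing. Assumes the conventional part path; this is the
    // count recorded by the last saving application, not a fresh layout result.
    public static int? ReadStoredPageCount(byte[] docxBytes)
    {
        using (var stream = new MemoryStream(docxBytes))
        using (var zip = new ZipArchive(stream, ZipArchiveMode.Read))
        {
            ZipArchiveEntry entry = zip.GetEntry("docProps/app.xml");
            if (entry == null) return null;

            using (Stream appXml = entry.Open())
            {
                XElement root = XDocument.Load(appXml).Root;
                XElement pages = root?.Element(ExtendedProps + "Pages");
                return int.TryParse(pages?.Value, out int count) ? count : (int?)null;
            }
        }
    }
}
```

In an extension like the one in the question, `ssNumberOfPages = DocxPageCount.ReadStoredPageCount(ssFileBinaryData) ?? 0;` would fall back to 0 when no count is stored; whether 0 is a sensible fallback depends on the caller.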

2 Answers


Well, I solved this myself by converting the Word document to PDF with Aspose.dll and then counting the pages with the same regex:

using System.IO;
using System.Text.RegularExpressions;
using Aspose.Words;

public void MssGet_Word_NumberOfPages(byte[] ssFileBinaryData, out int ssNumberOfPages) {
    byte[] pdfBytes;
    using (var wordStream = new MemoryStream(ssFileBinaryData))
    using (var pdfStream = new MemoryStream())
    {
        // Load the Word document from the byte array
        Document loadedFromBytes = new Document(wordStream);

        // Save it as PDF into an in-memory byte array
        loadedFromBytes.Save(pdfStream, SaveFormat.Pdf);
        pdfBytes = pdfStream.ToArray();
    }

    // Count the PDF page objects with the same regex as for a PDF file
    using (var stream = new MemoryStream(pdfBytes))
    using (var r = new StreamReader(stream))
    {
        string pdfText = r.ReadToEnd();
        Regex regx = new Regex(@"/Type\s*/Page[^s]");
        MatchCollection matches = regx.Matches(pdfText);
        ssNumberOfPages = matches.Count;
    }
}

Can you perhaps elaborate on the tool(s) you used to convert the Word document to PDF?

Hanno