
I need to programmatically count the characters, words, and/or paragraphs to which a specific, known style has been applied in a DOCX document.

I need to know 1) whether this is possible, and 2) any hints on where to start solving this problem.

I am familiar with DOM navigation and XPath/XQuery, and I can use .NET, PHP, Java, or any other tool as long as it solves this problem.

Dimitre Novatchev
andrerav
  • I think this can be done quite easily with some OLE automation (written in C# or VB.NET). It's really just a matter of reading the docs and going by trial and error (it's always like that with MS OLE automation). I wouldn't suggest parsing the DOCX document yourself: even though it's XML, it's a very complicated document format and it's easy to end up with fragile code. – gd1 Apr 28 '11 at 20:23
  • Great suggestion, I will investigate that option! Please add your comment as an answer so I can credit you if this works :) – andrerav Apr 28 '11 at 20:25
  • Is this a one-off script or something that will go into a supported production application? If it's the latter, I'd recommend at least looking into the official OOXML SDK instead, because Office dependencies can be a pain to manage. (Unfortunately I've only ever used it for .xlsx and not .docx, so I can't say how easy this particular task might be, but working with .xlsx wasn't that bad after some initial head-scratching.) The SDK isn't quite as straightforward as OLE automation, as it's a relatively thin wrapper over the XML, but it's still better than working with the DOM directly. – Max Strini Apr 29 '11 at 01:36

1 Answer

    // Requires "using System;", a reference to Microsoft.Office.Interop.Word,
    // and a local Word installation.
    object missing = Type.Missing;  // stands in for every optional COM argument
    object doNotSave = Microsoft.Office.Interop.Word.WdSaveOptions.wdDoNotSaveChanges;

    Microsoft.Office.Interop.Word.Application word = new Microsoft.Office.Interop.Word.Application();
    Microsoft.Office.Interop.Word.Document doc = null;

    try
    {
        object fileName = @"C:\TT\change.docx";
        doc = word.Documents.Open(ref fileName,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing, ref missing,
            ref missing, ref missing, ref missing);

        // These totals cover the whole document, not a single style.
        int count = doc.Characters.Count;
        int words = doc.Words.Count;
        int paragraphs = doc.Paragraphs.Count;

        Console.WriteLine("{0} characters, {1} words, {2} paragraphs",
            count, words, paragraphs);
    }
    finally
    {
        // Runs on success and on failure; the document is only read, so nothing is saved.
        if (doc != null)
        {
            doc.Close(ref doNotSave, ref missing, ref missing);
        }
        word.Quit(ref doNotSave, ref missing, ref missing);
    }
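
The snippet above counts the whole document rather than the text carrying one style. For comparison, here is a rough sketch of the other route raised in the comments, reading the DOCX XML directly, written in Java since the question lists it as an option. It is only an illustration under several assumptions: the class and method names (StyleCounter, count) are made up for this example, the style is matched by its style ID as stored in w:pStyle (for example Heading1), which is not necessarily the display name shown in Word, only paragraph styles are considered (character styles in w:rStyle are ignored), words are split on whitespace, and only w:t text is counted, so tabs, breaks, and nested paragraphs such as text boxes are not handled specially.

```java
import java.io.InputStream;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

class StyleCounter {
    // Main WordprocessingML namespace used by word/document.xml.
    static final String W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    /** Returns {paragraphs, words, characters} for paragraphs whose w:pStyle equals styleId. */
    static int[] count(InputStream documentXml, String styleId) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document dom = factory.newDocumentBuilder().parse(documentXml);

        int paragraphs = 0, words = 0, characters = 0;
        NodeList paras = dom.getElementsByTagNameNS(W, "p");
        for (int i = 0; i < paras.getLength(); i++) {
            Element p = (Element) paras.item(i);

            // Paragraphs without an explicit w:pStyle use the default style and are skipped.
            NodeList styles = p.getElementsByTagNameNS(W, "pStyle");
            if (styles.getLength() == 0) continue;
            String id = ((Element) styles.item(0)).getAttributeNS(W, "val");
            if (!styleId.equals(id)) continue;

            // Concatenate the text of every w:t element inside the paragraph.
            StringBuilder text = new StringBuilder();
            NodeList texts = p.getElementsByTagNameNS(W, "t");
            for (int j = 0; j < texts.getLength(); j++) {
                text.append(texts.item(j).getTextContent());
            }

            paragraphs++;
            characters += text.length();
            String trimmed = text.toString().trim();
            if (!trimmed.isEmpty()) words += trimmed.split("\\s+").length;
        }
        return new int[] {paragraphs, words, characters};
    }

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("usage: StyleCounter <file.docx> <styleId>");
            return;
        }
        // A .docx file is a ZIP archive; the body text lives in word/document.xml.
        try (ZipFile docx = new ZipFile(args[0]);
             InputStream in = docx.getInputStream(docx.getEntry("word/document.xml"))) {
            int[] c = count(in, args[1]);
            System.out.printf("paragraphs=%d words=%d characters=%d%n", c[0], c[1], c[2]);
        }
    }
}
```

Run it as, for example, java StyleCounter change.docx Heading1. The numbers will not necessarily match Word's own statistics, since Word applies its own rules for what counts as a word or a character.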
Todd Main
manish