Hello everyone.

I have to write an application in C++ that can read plain text from an existing .doc or .docx file (MS Word) on Windows, then create a new .doc or .docx file and write that plain text to it.

Actually, I have to encrypt the text before I write it to the new file, but right now the part of the application I'm concerned about is the MS Word file handling.

Do you have any tips for doing this? Any API or something that I can use?

Thanks.

omar
  • Do you have MS Word available in the environment where the program will run? If yes, use the COM/ActiveX API of Word. – Doc Brown May 05 '13 at 19:45
  • Thanks, Doc Brown. Do you happen to know any source for coding examples? – omar May 06 '13 at 04:36
  • You did not answer my question. But assuming you can use MS Word for your task, I suggest you first learn how to use COM/OLE/ActiveX from C++, for example here: http://www.tenouk.com/cplusmfcdotnet.html Beware, that is not beginner's stuff. A much easier approach would be to use a .NET language such as C# or C++/CLI for the Word part and let that interact with your C++ program. – Doc Brown May 06 '13 at 07:44

1 Answer

The .doc and (to a lesser extent) .docx formats are little more than slightly cleaned-up memory dumps of Word, so they aren't really documented. Your best bet would be to use LibreOffice or OpenOffice (or perhaps Word) macros to do the job. LibreOffice and OpenOffice have well-documented interfaces.

vonbrand
  • Actually, the file formats are extremely well documented. The .doc file format is built on a generic container format, CFBF (Compound File Binary Format), which is used by many, many things and is exhaustively documented by Microsoft. Basically, it's a collection of "streams" and "containers". Containers hold streams and other containers. What may not be documented is how to interpret the MS Word document data stored in the document stream, but the overall file format is documented. – This isn't my real name May 05 '13 at 19:08
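
To make the container point concrete, here is a minimal sketch that only tells the two outer containers apart by their leading bytes. It does not extract any text. It relies on two facts: a CFBF file starts with the 8-byte signature D0 CF 11 E0 A1 B1 1A E1, and a .docx file is a ZIP package whose first bytes are normally "PK" 03 04 (that second fact comes from the Office Open XML format, not from this thread). The names `WordContainer`, `classify_header` and `classify_file` are made up for illustration.

```cpp
#include <cstddef>
#include <cstring>
#include <fstream>
#include <string>

// Outer container of a Word file, judged only by its first bytes.
enum class WordContainer { Cfbf, Zip, Unknown };

// Classify a buffer by its leading magic bytes (hypothetical helper).
WordContainer classify_header(const unsigned char* data, std::size_t size) {
    // CFBF signature, as used by legacy .doc files.
    static const unsigned char kCfbf[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                           0xA1, 0xB1, 0x1A, 0xE1};
    // ZIP local file header signature, as used by .docx packages.
    static const unsigned char kZip[4] = {'P', 'K', 0x03, 0x04};

    if (size >= sizeof kCfbf && std::memcmp(data, kCfbf, sizeof kCfbf) == 0) {
        return WordContainer::Cfbf;
    }
    if (size >= sizeof kZip && std::memcmp(data, kZip, sizeof kZip) == 0) {
        return WordContainer::Zip;
    }
    return WordContainer::Unknown;
}

// Read up to 8 bytes from the file and classify them.
// A file that is missing or too short is reported as Unknown.
WordContainer classify_file(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char header[8] = {};
    in.read(reinterpret_cast<char*>(header), sizeof header);
    return classify_header(header, static_cast<std::size_t>(in.gcount()));
}
```

A matching signature only says which container you are dealing with. Other formats use the same containers (old .xls files are CFBF too, and any ZIP archive starts with "PK"), so getting at the text still means parsing the Word stream inside CFBF or the XML parts inside the ZIP package.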