
I used fin to read in a .doc file and then stored all the text in a string. When I tried printing the string, I just saw unreadable characters.

When I copied the contents of the .doc file into a .txt file and then read the .txt file in using fin, everything worked fine.

My question is whether fin works with complex files (such as .doc) or just with .txt files. I only had text in my .doc file (no graphics or anything), but the font was Calibri, which is not the font that fout uses to print text to a .doc file.

  • Why do you need to read a `.doc` file? What do you intend to do with it? What exact information do you need to extract from it? Please **edit your question** to improve it – Basile Starynkevitch Dec 29 '17 at 09:25

3 Answers


If by fin you mean an `ifstream`, then yes, it will read the file contents. For complex files, however, you have to deal with the file format yourself: the C++ library will not automatically extract just the text. When you saved the file as plain text, the text was all that was left, so that's all the stream read.
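As a rough illustration of "the stream just gives you the bytes", here is a minimal sketch that reads a whole file into a `std::string` and applies a crude plain-text heuristic. The function names and the 5% threshold are hypothetical choices for illustration, not a real format check.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <string>

// Read every byte of a file into a std::string. std::ios::binary stops the
// library from translating line endings, so the string holds exactly the
// bytes on disk. A file that cannot be opened yields an empty string.
std::string read_all_bytes(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    return std::string(std::istreambuf_iterator<char>(fin),
                       std::istreambuf_iterator<char>());
}

// Hypothetical heuristic: treat the data as plain text if at most 5% of its
// bytes are control characters other than tab, CR and LF. Bytes >= 0x80 are
// allowed so that UTF-8 text still passes.
bool looks_like_plain_text(const std::string& data) {
    if (data.empty()) return true;
    std::size_t suspicious = 0;
    for (unsigned char c : data) {
        if ((c < 0x20 && c != '\t' && c != '\n' && c != '\r') || c == 0x7F)
            ++suspicious;
    }
    return suspicious * 20 <= data.size();
}
```

A .txt export of your document should pass this check, while the .doc file itself will usually fail it, which matches what you saw when printing it.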

SoronelHaetir

`fstream` opens files in text mode by default, and .doc files use the binary MS-DOC file format. So when you read the .doc file and printed it, the characters you couldn't understand were most likely that binary data.
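To see what those unreadable characters really are, it helps to print the bytes in hex instead of as characters. A minimal sketch; `hex_preview` is a hypothetical helper name:

```cpp
#include <cstddef>
#include <fstream>
#include <iomanip>
#include <sstream>
#include <string>

// Return the first `count` bytes of a file as space-separated lowercase hex
// pairs, e.g. "48 69 0a". Stops early if the file is shorter than `count`.
std::string hex_preview(const std::string& path, std::size_t count) {
    std::ifstream fin(path, std::ios::binary);
    std::ostringstream out;
    char byte;
    for (std::size_t i = 0; i < count && fin.get(byte); ++i) {
        if (i > 0) out << ' ';
        // Cast through unsigned char so bytes >= 0x80 don't print as
        // negative numbers; setw(2) + setfill('0') pads single digits.
        out << std::hex << std::setw(2) << std::setfill('0')
            << static_cast<int>(static_cast<unsigned char>(byte));
    }
    return out.str();
}
```

For a Word 97-2003 .doc file the preview would typically start with `d0 cf 11 e0`, the signature of the binary container the format uses, rather than with your text.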

`fstream` will read any file you point it at.

I tried reading a .mp4 file in binary mode using `fstream` and it did read the file (I can vouch for that because I wrote the contents to another file and that file turned out to be the same video).
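A copy like the one described above can be sketched in a few lines. This is a minimal version under the assumption that you just want a byte-for-byte duplicate; `copy_file_binary` is a hypothetical name (C++17 also offers `std::filesystem::copy_file` for this exact job).

```cpp
#include <fstream>
#include <string>

// Copy any file (text, .doc, .mp4, ...) byte for byte. Both streams use
// binary mode so no newline translation happens on Windows.
// Returns false if either file could not be opened or writing failed.
bool copy_file_binary(const std::string& from, const std::string& to) {
    std::ifstream in(from, std::ios::binary);
    if (!in) return false;
    std::ofstream out(to, std::ios::binary);
    if (!out) return false;
    // operator<<(streambuf*) sets failbit when it inserts nothing, so an
    // empty source file is handled separately (the empty target exists).
    if (in.peek() == std::ifstream::traits_type::eof()) return true;
    out << in.rdbuf();
    out.close();  // flush now so write errors show up in the stream state
    return !out.fail();
}
```

Note that the program never needs to understand the video format to do this; it only moves bytes around, which is exactly the limitation discussed here.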

So the answer to your question is that you can read any file with `fstream`, but `fstream` only gives you two ways to do it: text mode or binary mode. Neither mode understands the file's format, so reading an arbitrary file won't do much good unless you just want to do something like copying its contents to another file.


You first need to understand the .doc file format. Start by reading the Wikipedia page on DOC (computing). The format is very complex (you'd need at least months of work) but more or less documented.
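A first, very small step toward understanding the format: Word 97-2003 .doc files are stored inside Microsoft's Compound File Binary (OLE2) container, whose first 8 bytes are a fixed signature. The sketch below only checks that signature; the function and constant names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <fstream>
#include <string>

// Every Compound File Binary (OLE2) file, the container used by Word
// 97-2003 .doc files, begins with these 8 bytes.
constexpr std::array<unsigned char, 8> kCfbSignature = {
    0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1};

// True if the file starts with the CFB signature. This only identifies the
// container: .xls and .ppt files share it, so it is a sanity check, not
// proof that the file is a Word document.
bool has_cfb_signature(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    char header[8];
    if (!fin.read(header, sizeof header)) return false;  // missing or short
    for (std::size_t i = 0; i < kCfbSignature.size(); ++i) {
        if (static_cast<unsigned char>(header[i]) != kCfbSignature[i])
            return false;
    }
    return true;
}
```

Getting from here to the actual text still means walking the container's sector chains and then decoding the Word-specific streams inside it, which is where the months of work come in.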

You could consider a different approach to your overall goal. For example, if you need to parse a .doc file (produced by Microsoft Word), you might use LibreOffice, which provides a library to parse it, or find another library (e.g. DocxFactory, wvWare, ...), or use a COM interface to Word (on Microsoft Windows with Microsoft Word installed).

If your goal is to generate a document, you might consider the PDF format (which is a standard), perhaps generating it with a text formatter like LaTeX or Lout, or with a library (e.g. cairo, PoDoFo, etc.).

> My question is whether fin works with complex files (such as .doc)

BTW, C++ standard IO is capable of reading binary files, but you need to write your own parser for them (so you need to understand your file format precisely). You should prefer open formats to proprietary ones.
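To give a feel for what "writing your own parser" means, here is a tiny fragment that decodes a few little-endian fields from the Compound File Binary header. The offsets follow Microsoft's [MS-CFB] specification as I understand it; the struct and function names are hypothetical, and it needs C++17 for `std::optional`.

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

// A few fields from the Compound File Binary header (all little-endian).
struct CfbHeaderInfo {
    std::uint16_t major_version;  // 3 or 4
    std::uint32_t sector_size;    // 512 for version 3, 4096 for version 4
};

// Assemble a little-endian 16-bit value from two raw bytes, independent of
// the host's own byte order.
std::uint16_t read_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

std::optional<CfbHeaderInfo> read_cfb_header(const std::string& path) {
    std::ifstream fin(path, std::ios::binary);
    unsigned char h[32];
    if (!fin.read(reinterpret_cast<char*>(h), sizeof h)) return std::nullopt;
    // Bytes 0-7: signature; 8-23: CLSID; 24-25: minor version;
    // 26-27: major version; 28-29: byte-order mark; 30-31: sector shift.
    static const unsigned char sig[8] = {0xD0, 0xCF, 0x11, 0xE0,
                                         0xA1, 0xB1, 0x1A, 0xE1};
    for (int i = 0; i < 8; ++i)
        if (h[i] != sig[i]) return std::nullopt;
    if (read_le16(h + 28) != 0xFFFE) return std::nullopt;  // stored as FE FF
    const std::uint16_t shift = read_le16(h + 30);
    if (shift != 9 && shift != 12) return std::nullopt;
    return CfbHeaderInfo{read_le16(h + 26), std::uint32_t{1} << shift};
}
```

This is only the first 32 bytes of one container header; a real .doc reader has to go much further, which is why reusing an existing library is usually the better choice.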

Basile Starynkevitch