I'm using the following code to read the content of a PDF file:

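string document;
FILE* f = fopen(path, "rb");  // "rb": binary mode, no newline translation
if (f == NULL) {
    perror("fopen");          // the file could not be opened
} else {
    unsigned char buffer[1024];
    size_t bytes;
    // Loop on fread's return value; feof() only becomes true after a read has failed.
    while ((bytes = fread(buffer, 1, sizeof buffer, f)) > 0) {
        for (size_t i = 0; i < bytes; i++) {
            document += buffer[i];
            cout << buffer[i];
        }
    }
    fclose(f);
}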

The problem is that the characters are not the same as when I open the file in a text editor. For example, this file: files.flashfan.ch/file.png

results in this output: files.flashfan.ch/output.png

How can I read the file so that the characters are exactly the same as in the editor? I want to parse PDF files, but without the original characters I can't do this. I've tested the code with this file (it's not a complete PDF file, just part of one, so you can't display it):

PDF Head.pdf

Thanks for your help!

Van Coding
  • @user461872: parsing PDF is one thing, and reading PDF is another. The latter is of no use, in my opinion. So tell me what do you want to do just by reading? – Nawaz Dec 15 '10 at 11:54
  • I want to get a list of the PDF objects in the document, then read the objects that fit my specs. I know how to do this; I just thought the program was reading the wrong characters from the file, which would make it impossible to parse. – Van Coding Dec 15 '10 at 12:09

3 Answers

I don't see any errors in the way you read the file (the code actually works on my Linux box when I redirect the output to a file). The issue is probably control characters in the data that mess up the console. Try writing the output to a file and comparing it with the input.
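The check suggested above can be sketched roughly as follows. This is only a minimal illustration, not the asker's code: the helper names `read_binary`, `write_binary` and `round_trip_matches` are made up for this example, and it assumes the file fits in memory.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>

// Hypothetical helper: read a whole file into a string, byte for byte.
std::string read_binary(const std::string& path) {
    std::ifstream in(path, std::ios::binary);  // binary mode: no newline translation
    if (!in) throw std::runtime_error("cannot open " + path);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}

// Hypothetical helper: write the bytes back out unchanged.
void write_binary(const std::string& path, const std::string& data) {
    std::ofstream out(path, std::ios::binary);
    if (!out) throw std::runtime_error("cannot open " + path);
    out.write(data.data(), static_cast<std::streamsize>(data.size()));
}

// Copy in_path to out_path through memory, then re-read the copy and
// compare it with what was read from the input.
bool round_trip_matches(const std::string& in_path, const std::string& out_path) {
    const std::string original = read_binary(in_path);
    write_binary(out_path, original);
    return read_binary(out_path) == original;
}
```

If `round_trip_matches` returns true for the PDF fragment, the bytes were read correctly and any odd-looking console output comes from how the terminal interprets them.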

vitaut
  • You were right! Some characters in the file manipulated the console output. The text was read correctly, but it was displayed wrongly. – Van Coding Dec 15 '10 at 11:02

It is a binary file, so it makes no sense to open it in a text editor. Use a hex editor instead (like XVI32)

...and do the printing like this:

printf("%#x ", buffer[i]);
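
A slightly fuller sketch of the same idea, assuming the bytes have already been collected in a `std::string` as in the question. The helper name `to_hex` is made up for this example, and it uses two-digit hex without the `0x` prefix so the columns line up:

```cpp
#include <cstdio>
#include <string>

// Hypothetical helper: format each byte as two hex digits followed by a space.
std::string to_hex(const std::string& bytes) {
    std::string out;
    char cell[4];  // two hex digits, one space, terminating NUL
    // Iterating as unsigned char avoids sign extension for bytes >= 0x80.
    for (unsigned char c : bytes) {
        std::snprintf(cell, sizeof cell, "%02x ", static_cast<unsigned>(c));
        out += cell;
    }
    return out;
}
```

For example, `to_hex("%PDF")` gives `25 50 44 46 ` (with a trailing space), which can be printed safely because it contains no control characters.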
Johan Kotlinski

Try using a hex editor. Programs like Notepad often can't display binary data properly, so you have to view the file in a hex editor instead. I personally recommend ghex.

Pizearke