How can you read for a specific value from a binary file starting from the end of the file?

Question

I am trying to figure out how to properly read the Byte_offset_of_last_cross-reference_section value from the trailer of a PDF file.

According to the PDF 1.7 (ISO 32000-1:2008) specification, a PDF file is designed to be read from its end. Below is the general form of the trailer, followed by a simplified (minimal) trailer as it appears when I read a file line by line with a StreamReader (UTF-8 encoding):

trailer
<< key1 value1
     key2 value2
     …
     keyn valuen
>>
startxref
Byte_offset_of_last_cross-reference_section
%%EOF

trailer
<</Root 7 0 R /Size 7>>
startxref
696
%%EOF

The value I want is 696, but I am not sure how to get it with a BinaryReader, starting from the end of the file.

Sebastian Piu · Accepted Answer · score 2 · 2013-11-21T20:53:26.730

You can use the Stream.Seek method and pass SeekOrigin.End as the origin argument. The other SeekOrigin values are Begin and Current.

example:

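using (var reader = File.Open(...))   // FileStream on the PDF (path and mode elided)
{
    // A negative offset moves backwards from the end of the stream;
    // Seek(100, SeekOrigin.End) would land 100 bytes past the end.
    // FileStream throws IOException if the file is shorter than 100 bytes.
    reader.Seek(-100, SeekOrigin.End);
    //...
}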

From that position you can either read backwards in a loop until you reach the startxref marker (or anything else that tells you the next token is 696), or assume the value lies within the last 100 bytes of the file, read those bytes into a small array, and search that array, as Anthony suggests in the comment below.
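A minimal C# sketch of the second option, using a BinaryReader as the question asks. It assumes the keyword and its number fall within the last 1024 bytes (the comments below discuss 50, 1000 and 1024), and StartXrefReader, ReadStartXref and ParseStartXref are hypothetical names, not part of any library. It takes the last "startxref" in the window, which is the one that matters when a file has been incrementally updated.

```csharp
using System;
using System.IO;
using System.Text;

public static class StartXrefReader
{
    // Assumption: "startxref" and its number lie within the last TailLength bytes.
    private const int TailLength = 1024;

    // Opens the PDF, reads its tail with a BinaryReader and returns the offset.
    public static long ReadStartXref(string path)
    {
        using (var reader = new BinaryReader(File.OpenRead(path)))
        {
            Stream stream = reader.BaseStream;
            // Clamp so a file shorter than TailLength is read from its first byte.
            int count = (int)Math.Min(TailLength, stream.Length);
            // A negative offset with SeekOrigin.End moves back from the end.
            stream.Seek(-count, SeekOrigin.End);
            byte[] tail = reader.ReadBytes(count);
            return ParseStartXref(tail);
        }
    }

    // Finds the last "startxref" in the buffer and parses the decimal number after it.
    public static long ParseStartXref(byte[] tail)
    {
        // ASCII decoding yields one char per byte (non-ASCII bytes become '?'),
        // so positions in the string match positions in the buffer.
        string text = Encoding.ASCII.GetString(tail);
        const string Keyword = "startxref";
        // The last occurrence is the one that counts in incrementally updated files.
        int pos = text.LastIndexOf(Keyword, StringComparison.Ordinal);
        if (pos < 0)
            throw new InvalidDataException("startxref not found near the end of the file");

        int i = pos + Keyword.Length;
        // Skip PDF white-space characters: space, CR, LF, tab, form feed, NUL.
        while (i < text.Length && " \r\n\t\f\0".IndexOf(text[i]) >= 0)
            i++;

        int start = i;
        while (i < text.Length && text[i] >= '0' && text[i] <= '9')
            i++;
        if (i == start)
            throw new InvalidDataException("no number after startxref");

        return long.Parse(text.Substring(start, i - start));
    }
}
```

Reading a fixed block and searching it in memory avoids many tiny seeks, at the cost of guessing a window size.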

edited Nov 21 '13 at 20:53

answered Nov 21 '13 at 20:07

Sebastian Piu

Link-only answers are discouraged. Please reflect the core of what you are trying to show in your answer by providing a code snippet or a more elaborate explanation. – Jeroen Vannevel Nov 21 '13 at 20:09
It might be better to start from `reader.Length - 50` and continue to seek forward until you find what you need. Not sure how good it would be to actually seek 1 byte at a time backwards in a file. – Anthony Nov 21 '13 at 20:48
Actually, if you want to emulate the laxity of e.g. Adobe Reader, you would start from `reader.Length - 1000` and tolerate some trash bytes after the EOF marker. Cf. the implementation notes. – mkl Nov 21 '13 at 21:50
@mkl: Where did you see that Adobe Reader starts 1000 bytes back? Also, why 1000 instead of 1024, considering that's the normal buffer size? – myermian Nov 21 '13 at 22:03
You are right, it's not exactly 1000. But I talked about *emulating the laxity of e.g. Adobe Reader* and by that didn't mean operating exactly like that one product but allowing a certain fairly common degree of laxity. – mkl Nov 21 '13 at 22:34
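On Anthony's concern about stepping back one byte at a time: the backwards search in the answer can work in blocks instead. A hedged sketch, assuming a 1024-byte starting window (the buffer size myermian mentions) that doubles until "startxref" turns up, with an arbitrary 16 MiB cap; TailSearch and FindLastStartXref are hypothetical names. Because it takes the last occurrence anywhere in the window, a few trash bytes after %%EOF, as mkl describes, do not stop it.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailSearch
{
    // Assumed cap so a file without "startxref" is not read in full.
    private const long MaxWindow = 16 * 1024 * 1024;

    // Returns the absolute position of the last "startxref" in the stream, or -1.
    // The stream must be seekable (a FileStream or MemoryStream, for example).
    public static long FindLastStartXref(Stream stream, long initialWindow = 1024)
    {
        const string Keyword = "startxref";
        long window = Math.Max(1, initialWindow);
        while (true)
        {
            // Each window is anchored at the end, so it contains the previous one.
            long count = Math.Min(window, stream.Length);
            stream.Seek(-count, SeekOrigin.End);
            byte[] buffer = ReadFully(stream, (int)count);
            // One char per byte, so string positions equal buffer positions.
            int pos = Encoding.ASCII.GetString(buffer).LastIndexOf(Keyword, StringComparison.Ordinal);
            if (pos >= 0)
                return stream.Length - count + pos;
            // Give up once the whole stream, or the capped window, has been searched.
            if (count == stream.Length || window >= MaxWindow)
                return -1;
            window = Math.Min(window * 2, MaxWindow);
        }
    }

    // Stream.Read may return fewer bytes than requested, so loop until done.
    private static byte[] ReadFully(Stream stream, int count)
    {
        var buffer = new byte[count];
        int offset = 0;
        while (offset < count)
        {
            int n = stream.Read(buffer, offset, count - offset);
            if (n == 0)
                throw new EndOfStreamException();
            offset += n;
        }
        return buffer;
    }
}
```

The function returns the keyword's position rather than the number; from there, the offset is read by skipping white space and collecting digits.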

score 0 · Answer 2 · answered Nov 21 '13 at 20:10

How about using something like:

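// File.ReadAllLines returns string[], not List<string>
string[] allLines = File.ReadAllLines(filePathHere);
// In the example trailer the last line is %%EOF, so the offset is second to last
return allLines[allLines.Length - 2];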

answered Nov 21 '13 at 20:10

Sourav 'Abhi' Mitra

Per the specification, reading the file forward line by line isn't recommended; it is recommended (as the question states) to read the file from the end. – myermian Nov 21 '13 at 20:21
PDFs can be pretty big. Reading all lines like this merely to retrieve one number is a huge waste of resources. – mkl Nov 21 '13 at 20:46
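mkl's objection can be met while keeping this answer's line-based idea: split only the tail of the file into lines. A minimal sketch, assuming the last 1024 bytes hold both the startxref line and the offset line; TailLines and LineAfterLastStartXref are hypothetical names. Splitting on both CR and LF covers the CR, LF and CR LF end-of-line markers that PDF allows.

```csharp
using System;
using System.IO;
using System.Text;

public static class TailLines
{
    // Returns the trimmed line after the last "startxref" line, or null if none is found.
    // Assumption: both lines fall within the last tailLength bytes of the file.
    public static string LineAfterLastStartXref(string path, int tailLength = 1024)
    {
        using (FileStream stream = File.OpenRead(path))
        {
            int count = (int)Math.Min(tailLength, stream.Length);
            stream.Seek(-count, SeekOrigin.End);
            var buffer = new byte[count];
            int read = 0;
            while (read < count)
            {
                int n = stream.Read(buffer, read, count - read);
                if (n == 0) break;
                read += n;
            }
            // Empty entries produced by CR LF pairs are dropped.
            string[] lines = Encoding.ASCII.GetString(buffer, 0, read)
                .Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
            // Search from the end so the last startxref wins.
            for (int i = lines.Length - 2; i >= 0; i--)
            {
                if (lines[i].Trim() == "startxref")
                    return lines[i + 1].Trim();
            }
            return null;
        }
    }
}
```

The method returns the text of the offset line; the caller can then convert it with long.Parse. Only the tail is read, so the cost no longer grows with the size of the PDF.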
