It sounds like what you want to do is begin the file with a "header" that records where the last result was found. That way, the information is stored in the file itself. An 8-digit hex value is adequate to represent any offset in a file of up to 4 GB. Something like:
```
00000022<cr><lf>
Text...<cr><lf>
More text...<cr><lf>
~ <cr><lf>              <-- this '~' is whatever we're looking for
Other stuff...<cr><lf>
```
I'm making some assumptions here. First, this is on Windows, where text lines are terminated with <cr> and <lf> characters (0x0D and 0x0A respectively). On Unix it will be <lf> only; on classic Mac OS it was <cr> only (modern macOS uses <lf>). I included those terminator bytes in the counts for this example. Second, this assumes ANSI-style strings, meaning an 8-bit encoding where one character equals one byte of data. The same scheme works with Unicode or other string formats, just note that they may no longer be exactly one byte per character. (Windows "Unicode" strings are UTF-16, two bytes per code unit, and UTF-8 uses one to four bytes per character, so expect trouble if you mix Unicode and ANSI string operations.)
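One practical consequence of counting the <cr> and <lf> bytes: the file should be opened in binary mode, otherwise the Windows C runtime quietly collapses each <cr><lf> pair into a single '\n' and the counts come out short. A minimal sketch ("data.txt" is a made-up name):

```c
#include <stdio.h>

int main(void)
{
    /* "rb" = read, binary: no newline translation, so every <cr>
       and <lf> byte is seen and counted individually. */
    FILE *f = fopen("data.txt", "rb");
    if (f == NULL) { perror("fopen"); return 1; }

    unsigned long count = 0;
    int c;
    while ((c = fgetc(f)) != EOF)
        count++;                  /* terminator bytes included */

    printf("%lu bytes\n", count);
    fclose(f);
    return 0;
}
```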
Here, the "header" value is 0x22, or 34 decimal: if you count every character from the beginning of the file, terminators included, the '~' is the 34th. So the header points to where the last search result was found.
How this works: initially the header value is zero, so your code reads it and knows the file hasn't been searched yet. Let's say the code scans through the file, incrementing a counter for each character, until it finds the '~'. It then seeks back to the beginning, converts that count into 8 hex digits (sprintf with "%08X", or itoa with a base of 16), and overwrites this part of the file with them. Once found you're done, or you process the whole thing again to search for more. The next time this file is processed, your code reads the header value, converts it from text into an unsigned int (strtoul with base 16; plain atoi only parses decimal), seeks the file to that offset plus one (since we don't want to catch the same one again), then starts scanning from there.
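Putting the pieces together, here is a minimal C sketch of one full cycle. The filename, the '~' marker, and treating the stored value as a 1-based count (matching the worked example above) are this answer's assumptions, not requirements:

```c
#include <stdio.h>
#include <stdlib.h>

#define HEADER_LEN 8  /* 8 hex digits, followed by <cr><lf> */

int main(void)
{
    FILE *f = fopen("data.txt", "r+b");  /* read/write, binary */
    if (f == NULL) { perror("fopen"); return 1; }

    /* Read the 8-digit hex header and convert it to a number. */
    char hdr[HEADER_LEN + 1] = {0};
    if (fread(hdr, 1, HEADER_LEN, f) != HEADER_LEN) { fclose(f); return 1; }
    unsigned long last = strtoul(hdr, NULL, 16);

    /* A stored 1-based count, used as a 0-based seek offset, lands
       exactly one byte past the previous hit. Zero means the file
       hasn't been searched yet, so start just past the header line
       (8 digits + <cr><lf>). */
    unsigned long pos = (last == 0) ? HEADER_LEN + 2 : last;
    fseek(f, (long)pos, SEEK_SET);

    int c;
    while ((c = fgetc(f)) != EOF) {
        pos++;                             /* 1-based count of this char */
        if (c == '~') {
            char out[HEADER_LEN + 1];
            sprintf(out, "%08lX", pos);    /* count back to 8 hex digits */
            fseek(f, 0, SEEK_SET);
            fwrite(out, 1, HEADER_LEN, f); /* overwrite header in place */
            break;
        }
    }
    fclose(f);
    return 0;
}
```

Run it repeatedly and each run finds the next '~', updating the header as it goes.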
The others here have some good examples of actual code to start experimenting with. Note that if you're looking for more than a single character, such as a word or a series of digits, the scanning portion becomes slower and more complex. Complex scanning for "tokens" rather than simple characters or words is called lexical analysis, and that is a whole other topic. Google Flex and Bison, or YACC, etc.
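That said, a plain word search can still reuse the same header trick before you need a full lexer. Here is a sketch under the assumption that the token never spans a line (or the 1024-byte buffer); find_token is a made-up helper name:

```c
#include <stdio.h>
#include <string.h>

/* Returns the 0-based byte offset of the first occurrence of `needle`
   at or after `start`, or -1 if it isn't found. Assumes the token
   fits within one line of at most 1023 bytes. */
long find_token(FILE *f, const char *needle, long start)
{
    char line[1024];

    fseek(f, start, SEEK_SET);
    for (;;) {
        long line_off = ftell(f);
        if (fgets(line, sizeof line, f) == NULL)
            return -1;                        /* EOF, no match */
        char *hit = strstr(line, needle);
        if (hit != NULL)
            return line_off + (long)(hit - line);
    }
}
```

Calling it in a loop, with start set to the previous hit plus the token's length, slots straight into the header scheme above, with the header now storing a token offset instead of a character count.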