
I am looking for an efficient method to access a large binary file at varying positions without reading and writing the whole file.

The file would consist of a very large number of lines, each containing fixed-length data and data of varying length, e.g.:

<id><type><some_attributes><just_a_few_bytes>\n
<id><type><some_attributes><block_holding_KB_of_data>\n
<id><type><some_attributes><some_other_bytes>\n
...

My aim is to jump to a specific ID and overwrite the record with new data.

I was thinking of seek() and fwrite(), but the question is how to seek() to the right line/position for reading/writing efficiently, without checking each byte for the end of the line. Isn't there a similar problem in databases?

Sceptical Jule
  • How about intelligently splitting the file into chunks... Modify the relevant chunks, and reassemble the whole file when done? – WhiZTiM May 22 '17 at 17:04
  • See this question: https://stackoverflow.com/q/43006281/1865694 . – Alex Zywicki May 22 '17 at 17:05
  • What about putting the fixed-length stuff in a separate file, or at the beginning, and for each entry also storing the offset in the file where that chunk lives? Then fixed-size indexing gives you random access. To reduce the hit caused by multiple reads per record, pick some small number of data bytes that will be stored directly in the index. But talking about "lines" and "KB of data" in the same spec seems odd. If it's binary, you don't have or want "\n" at all. – Peter May 22 '17 at 17:09
  • 1
    If you control the format, why not just use a database engine, as you allude to? `sqlite3` is self-contained, operates directly on a single file, is widely used, has liberal licensing, etc, etc, etc. Your varying length data would be a "blob". – Peter May 22 '17 at 17:13
  • 1
    _"...Isn't there a similar problem in databases?"_ internally databases use fixed sized records and/or separate index tables to solve this problem. You also have the problem of what to do when you need to change the length of line. SQLite would seem to be a natural fit to solve your problem: https://www.sqlite.org/ – Richard Critten May 22 '17 at 17:28

1 Answer


My aim is to jump to a specific ID and overwrite the record with new data.

You need to traverse the whole file at least once to do that. Since your data is tagged with an id anyway, you can build a map<id, position_in_file> during that single pass and then use the map for direct jumps via seekg(position_in_file).
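
A minimal sketch of that idea, under two assumptions not spelled out in the question: each record starts with a 4-byte id, and records are terminated by '\n' (as the comments point out, a '\n' terminator is fragile if the binary payload can contain that byte). Overwriting in place also requires the replacement bytes to have exactly the old record's length.

```cpp
#include <cstdint>
#include <fstream>
#include <map>
#include <string>

// One full pass over the file: remember the offset where each record starts.
std::map<std::uint32_t, std::streamoff> buildIndex(std::fstream& file) {
    std::map<std::uint32_t, std::streamoff> index;
    file.seekg(0, std::ios::beg);
    std::streamoff recordStart = 0;
    std::uint32_t id = 0;
    while (file.read(reinterpret_cast<char*>(&id), sizeof(id))) {
        index[id] = recordStart;
        char c;
        while (file.get(c) && c != '\n') {}   // skip to the end of this record
        recordStart = file.tellg();
    }
    file.clear();   // clear the EOF flag so the stream remains usable
    return index;
}

// Direct jump: seek to the stored offset and overwrite the record in place.
// The new bytes must have exactly the old record's length, otherwise the
// following records get corrupted.
bool overwriteRecord(std::fstream& file,
                     const std::map<std::uint32_t, std::streamoff>& index,
                     std::uint32_t id, const std::string& newBytes) {
    auto it = index.find(id);
    if (it == index.end()) return false;
    file.seekp(it->second, std::ios::beg);
    file.write(newBytes.data(), static_cast<std::streamsize>(newBytes.size()));
    return file.good();
}

int main() {
    std::fstream file("data.bin", std::ios::in | std::ios::out | std::ios::binary);
    auto index = buildIndex(file);
    // e.g. overwriteRecord(file, index, 42, replacementBytes);
}
```

If the new data can be longer than the old record, in-place overwriting no longer works and you are back to rewriting the tail of the file, or to a storage engine such as SQLite that manages variable-length records for you.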

463035818_is_not_an_ai