Since files are stored as blocks on disk, is it possible to insert a block in the middle of the chain of blocks?

This is because, without such an API, inserting a 4kb block into the middle of a file at a certain position using the traditional read/write APIs would mean rewriting everything in the file after that position, shifting it all by 4kb.

I'm ok with an answer that only works for some OS or some file systems. It doesn't have to be cross-platform or work for every file system.

(I also understand that not all file systems or hardware use 4kb for blocks - answers that work for different numbers are also ok).

hasen
  • Classically there is no way to do this. Recently there was a post here on SO about a newish Linux API that lets you do this, albeit only for certain underlying filesystem types. Unfortunately I can't remember the details. – Steve Summit Sep 19 '21 at 11:43
  • See [this answer](https://stackoverflow.com/questions/17203138/adding-content-to-middle-of-file-without-reading-it-till-the-end/33339013#33339013). (So maybe not so recent, and not so "newish"!) – Steve Summit Sep 19 '21 at 11:48

1 Answer

I am not sure of a filesystem that would make extending a file in the middle easy. Then again, many modern filesystems do not actually keep a chain of blocks; the chain of blocks was a feature of the FAT filesystem family. Instead, the blocks of a file in modern filesystems are typically organized as a tree. Within a tree you can find the block containing any byte position in O(lg n) reads, with a logarithm base so large that the lookup is essentially constant time.

While the chain would allow for the operation of "insert n blocks in between" with comparative ease, the tree unfortunately does not. This doesn't mean that the tree is the wrong structure - on the contrary, many database systems benefit greatly from the fast random access that it offers.

Note that the tree structure enables something else that might be useful instead: holes. Unix filesystems support sparse files - any blocks of a file that are known to contain only zeroes need not use disk space at all; those blocks are simply marked unallocated in the tree structure and read back as zeroes.
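As a quick illustration of the sparse-file behaviour described above, here is a sketch in Python (assuming a Unix-like system, where `st_blocks` counts 512-byte units): writing one byte a megabyte into an empty file grows the apparent size to just over 1 MiB, while the filesystem allocates only the block actually written.

```python
import os
import tempfile

# Create an empty file, then write a single byte 1 MiB past the start.
# Everything before that byte becomes a "hole" - unallocated, reads as zeroes.
fd, path = tempfile.mkstemp()
try:
    os.lseek(fd, 1024 * 1024, os.SEEK_SET)  # seek past the end of the file
    os.write(fd, b"x")                      # only this block gets allocated

    st = os.stat(path)
    # st_size is the apparent size; st_blocks * 512 is the space actually used.
    print("apparent size:", st.st_size)              # 1048577
    print("allocated bytes:", st.st_blocks * 512)    # typically one block, e.g. 4096
finally:
    os.close(fd)
    os.remove(path)
```

Reading from within the hole returns zero bytes without any disk access, which is exactly the "considered containing zeroes in the tree structure itself" behaviour.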

  • Interesting. So I can generally assume that reading 3 random blocks and reading 3 consecutive blocks would have about the same performance characteristics? – hasen Sep 19 '21 at 10:23
  • @hasen only on an SSD. HDD would still have to move its head if the blocks are scattered around. That of course ignores things like disk cache and whether data you care about is already in it or not. Maybe "it depends" would be more correct. – Kaihaku Sep 19 '21 at 10:43
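For completeness, the Linux API the question comments point to is `fallocate(2)` with the `FALLOC_FL_INSERT_RANGE` flag, which splices zero-filled blocks into the middle of a file without rewriting the tail. The sketch below calls it via ctypes since Python's `os` module does not expose the flag; note the heavy caveats: Linux only, supported only by certain filesystems (e.g. ext4 and XFS), and both offset and length must be multiples of the filesystem block size. On an unsupported filesystem the call simply fails with `EOPNOTSUPP`, which the sketch handles.

```python
import ctypes
import ctypes.util
import errno
import os
import tempfile

# FALLOC_FL_INSERT_RANGE from <linux/falloc.h> (not exposed by the os module).
FALLOC_FL_INSERT_RANGE = 0x20
BLK = 4096  # offset and length must be multiples of the filesystem block size

libc = ctypes.CDLL(ctypes.util.find_library("c"), use_errno=True)
# int fallocate(int fd, int mode, off_t offset, off_t len)
libc.fallocate.argtypes = [ctypes.c_int, ctypes.c_int, ctypes.c_long, ctypes.c_long]

fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"A" * BLK + b"B" * BLK)  # two blocks: AAAA... then BBBB...
    # Splice one new zero-filled block between them, shifting the B block right.
    ret = libc.fallocate(fd, FALLOC_FL_INSERT_RANGE, BLK, BLK)
    if ret == 0:
        print("inserted a block; new size:", os.stat(path).st_size)  # 3 * BLK
    else:
        err = ctypes.get_errno()
        # e.g. EOPNOTSUPP on tmpfs: this filesystem can't insert ranges
        print("insert-range unavailable:", errno.errorcode.get(err, err))
finally:
    os.close(fd)
    os.remove(path)
```

On a supported filesystem this is exactly the "insert n blocks in between" operation from the question, done by editing the file's extent metadata rather than copying data.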