I need to perform random in-place updates to a file. Say I need to update the file at offsets k1, k2, ..., kn. From a performance perspective, does it matter if I write in arbitrary order, or does performance improve if I write in increasing order of offsets? More specifically, I am going to buffer a bunch of updates in RAM and then apply them, so by the time I am ready to flush the updates to disk I know all the offsets I need to write to.
2 Answers
Moving the head of a hard drive from one cylinder to another is the largest performance killer when dealing with large files on rotating disks. The further the head has to move, the larger the impact.
Do the writes in order. That will cause (statistically speaking at least) all sectors on a cylinder to be written together (no head move), then the head to move to an adjacent cylinder (shortest possible head move).
Note that if you are dealing with a logical disk that has multiple backing physical disks (e.g. RAID, NAS) the head seek issue is somewhat mitigated by having more disks, but unless you have specific knowledge of the mapping from logical sector to physical storage, doing the updates in sector order is still most likely to minimize head moves.
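For example, here is a minimal sketch of the buffer-then-flush approach, assuming Python on a POSIX system (`os.pwrite`); the `flush_updates` helper and the file name are made up for illustration:

```python
import os

def flush_updates(path, updates):
    """Hypothetical helper: flush buffered (offset, bytes) updates to an
    existing file, sorted by offset so writes hit the disk in increasing order."""
    fd = os.open(path, os.O_WRONLY)
    try:
        for offset, data in sorted(updates, key=lambda u: u[0]):
            os.pwrite(fd, data, offset)  # write data at the given offset
        os.fsync(fd)                     # push the buffered writes to the device
    finally:
        os.close(fd)

# Buffer a bunch of updates in RAM, then flush them in one sorted pass.
updates = [(4096, b"new block"), (0, b"header"), (1048576, b"tail")]
flush_updates("data.bin", updates)
```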

Unless you know how your offsets translate into specific cylinders/heads/sectors, I wouldn't worry about it. Your disk controller may re-order requests for efficiency on its own, and if your disk gets fragmented you don't know whether the file's logical blocks are sequential or not anyway.

- No... if you are batching large, random updates the disk controller will have zero knowledge that updates to an adjacent sector are yet to be sent down in the next batch. Percentage-wise, there is a huge difference between writing in order and writing randomly. Whether the absolute numbers matter depends on the requirement, the hardware, and the size of the file. – Eric J. Aug 05 '11 at 16:43
- @bosmacs: Even if you have file fragmentation, it's highly unlikely that the file will be organized randomly on disk. Even with high fragmentation (unlikely with modern OS's), two adjacent logical sectors are more likely to be adjacent physically than not. – Eric J. Aug 05 '11 at 16:47
- @Eric J: Sure, it can't do them all, but it can reorder whatever requests are in its buffer. And if most of the writes are small, the controller should be able to do a decent job. Besides, in a modern system, you'll have interleaved requests from multiple programs, so there's no guarantee the head won't move in the middle of your updates anyway. – TMN Aug 05 '11 at 16:54
- @TMN: You can't _guarantee_ that the head won't move, but if you're doing a large update, chances are pretty darn good that it won't move a statistically significant number of times. Just because you don't have perfect control over the problem doesn't mean you should not try to optimize the solution. – Eric J. Aug 05 '11 at 16:57
- Eric J. is totally right. It really makes a huge difference when it comes to performance. Any sane file system tries to avoid breaking sequentially written data into random chunks, precisely to avoid disk seeks. I think this answer is harmful. – dmeister Dec 19 '11 at 15:24
- Which is exactly my point: both the file system and the disk try to minimize seeks by [re-ordering writes](http://en.wikipedia.org/wiki/Elevator_algorithm), so you don't have to worry about it. In fact, you don't even have much say in the matter. And to Eric J's point; with a modern [journaling file system](http://en.wikipedia.org/wiki/Journaling_file_system) (e.g., NTFS, ext3/4) I can guarantee the head *will* move -- once to the journal to record the metadata, and once to the data area. So there is no 0-seek best case, unless maybe you're still using FAT32 on Windows ME. – TMN Dec 19 '11 at 16:39