First of all, I hope you are using raw disks and not a filesystem. If you're using a filesystem, you must:
Create an empty, non-sparse file as large as the filesystem can hold.
Obtain a mapping from the logical file positions to disk blocks (a sketch follows this list).
Reverse this mapping, so that you can map from disk blocks to logical file positions. Of course, some blocks are unavailable due to the filesystem's own use.
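As a rough illustration of the mapping step, on Linux the legacy FIBMAP ioctl (root-only, and superseded by FIEMAP on modern kernels) can translate a file's logical block index into a device block number. This is only a sketch under those assumptions; the file path is a placeholder:

    /* Sketch: map logical file blocks to device blocks via FIBMAP (Linux, needs root).
     * The file path is a placeholder. */
    #include <fcntl.h>
    #include <linux/fs.h>     /* FIBMAP, FIGETBSZ */
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("/mnt/data/fill.bin", O_RDONLY);
        if (fd < 0) { perror("open"); return 1; }

        int blksz = 0;
        if (ioctl(fd, FIGETBSZ, &blksz) < 0) { perror("FIGETBSZ"); return 1; }

        off_t size = lseek(fd, 0, SEEK_END);
        long nblocks = size / blksz;

        for (long i = 0; i < nblocks; i++) {
            int blk = (int)i;               /* in: logical block index */
            if (ioctl(fd, FIBMAP, &blk) < 0) { perror("FIBMAP"); return 1; }
            /* out: blk is now the device block (0 means a hole / unallocated) */
            printf("logical %ld -> device %d\n", i, blk);
        }
        close(fd);
        return 0;
    }

Reversing this (device block to file position) is then just building the inverse lookup table from the printed pairs.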
At this point, the disk looks like a raw disk that you access by disk block. It's a valid assumption that this block addressing is mostly monotonic with respect to the physical cylinder number. In other words, if you increase the disk block number, the cylinder number will never decrease (or never increase -- depending on the drive's LBA-to-physical mapping order).
Also, note that a disk's average write speed may be quoted per cylinder or per unit of storage. How would you know? You need the latter number, and the only sure way to get it is to benchmark it yourself. Fill the entire disk with data by repeatedly writing a zero page, going block by block, and divide the total amount of data written by the time it took. You need to access the disk or the file in direct (unbuffered) mode. This disables OS buffering for the file data, but not for the filesystem metadata (if you're not using a raw disk).
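For instance, a minimal benchmark along those lines might look like the following; O_DIRECT requires the buffer, offset and length to be suitably aligned, and the device path and the 1 MiB block size are assumptions:

    /* Sketch: measure average raw write bandwidth by streaming zeros with O_DIRECT.
     * Device path and block size are assumptions; adjust for your setup. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t blk = 1 << 20;                 /* 1 MiB per write */
        int fd = open("/dev/sdX", O_WRONLY | O_DIRECT);
        if (fd < 0) { perror("open"); return 1; }

        void *buf;
        if (posix_memalign(&buf, 4096, blk)) return 1;   /* O_DIRECT wants aligned I/O */
        memset(buf, 0, blk);

        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);

        long long written = 0;
        ssize_t n;
        while ((n = write(fd, buf, blk)) == (ssize_t)blk)
            written += n;                           /* stops at end of device or on error */

        clock_gettime(CLOCK_MONOTONIC, &t1);
        double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
        printf("average: %.1f MB/s over %lld bytes\n", written / secs / 1e6, written);

        close(fd);
        free(buf);
        return 0;
    }

While running this, also log the per-section speeds mentioned below (say, one figure per 1% of the disk), since you'll need them later.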
At this point, all you need to do is write data blocks of sensible sizes at the two extremes of the block numbers: you fill the disk from both ends inwards. The size of the data blocks depends on how much bandwidth you can afford to waste on seeks. You should also assume that the hard drive might seek once in a while to update its housekeeping data. Assuming a worst-case seek of 15 ms, each seek wastes 1.5% of a second's worth of bandwidth. If you can spare no more than 5% of bandwidth, that buys you about 3 seeks per second; with the drive itself using 1 seek/s on average, that leaves you roughly 2 seeks per second. Thus your block size needs to be your_bandwidth_per_second/2. This bandwidth is not the disk bandwidth, but the bandwidth of your data source.
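To make that arithmetic concrete, here is a back-of-the-envelope calculation under the same assumptions (15 ms worst-case seek, 5% budget, 1 housekeeping seek/s; the 20 MB/s source rate is made up):

    /* Sketch: derive a block size from a seek budget (numbers are the assumptions above). */
    #include <stdio.h>

    int main(void)
    {
        double source_bw   = 20e6;    /* data source rate, bytes/s (assumed)        */
        double seek_time   = 0.015;   /* worst-case seek, seconds                   */
        double budget      = 0.05;    /* fraction of bandwidth spendable on seeks   */
        double drive_seeks = 1.0;     /* housekeeping seeks/s by the drive itself   */

        double total_seeks = budget / seek_time;          /* ~3.3 affordable seeks/s */
        int    our_seeks   = (int)(total_seeks - drive_seeks); /* ~2 left for us    */
        double block_size  = source_bw / our_seeks;

        printf("seeks/s for us: %d, block size: %.0f bytes\n", our_seeks, block_size);
        return 0;
    }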
Alas, if only things were this easy. It generally turns out that the bandwidth at the middle of the disk is not the average bandwidth. During your benchmark you must also take note of the write speed over smaller sections of the disk, say every 1% of the disk. That way, when writing into each pair of sections, you can figure out how to split the data between the "low" and the "high" section you're writing to. Suppose you're starting out at the 0% and 99% positions on the disk, the low position has a bandwidth of 1.5*mean, and the high position has a bandwidth of 0.8*mean, where mean is your desired mean bandwidth. You'll then need to write 1.5/(0.8+1.5), i.e. about 65%, of the data into the low position, and the remainder, 0.8/(0.8+1.5) or about 35%, into the slower high position.
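A sketch of that proportional split, with the per-section bandwidths assumed to come from your benchmark table (the values here are just the ones from the example above):

    /* Sketch: split one incoming chunk between the "low" and "high" write positions
     * in proportion to the measured bandwidth of each 1% section (values assumed). */
    #include <stdio.h>

    int main(void)
    {
        double bw_low  = 1.5;       /* relative bandwidth at the current low section  */
        double bw_high = 0.8;       /* relative bandwidth at the current high section */
        size_t chunk   = 1 << 20;   /* bytes arriving from the source this round      */

        size_t to_low  = (size_t)(chunk * bw_low / (bw_low + bw_high));
        size_t to_high = chunk - to_low;      /* remainder goes to the slower end     */

        printf("low: %zu bytes (%.1f%%), high: %zu bytes (%.1f%%)\n",
               to_low,  100.0 * to_low  / chunk,
               to_high, 100.0 * to_high / chunk);
        return 0;
    }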
The size of your buffer needs to be larger than just the block size, since you must assume some worst-case latency for the hard drive if it hits bad blocks and needs to relocate data, etc. I'd say a 3-second buffer is reasonable. Optionally, it can grow by itself if the latencies you measure while your software runs turn out to be higher. This buffer must be locked ("pinned") in physical memory so that it's not subject to swapping.
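A minimal way to pin such a buffer on POSIX systems is mlock(); the 3-second sizing and the 20 MB/s source rate below are just the assumptions from above:

    /* Sketch: allocate an aligned buffer sized for ~3 s of source bandwidth
     * and pin it so it can't be swapped out (sizes are the assumptions above). */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <sys/mman.h>

    int main(void)
    {
        double source_bw = 20e6;                      /* bytes/s, assumed          */
        size_t bufsize   = (size_t)(3.0 * source_bw); /* ~3 seconds of data        */

        void *buf;
        if (posix_memalign(&buf, 4096, bufsize)) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }
        memset(buf, 0, bufsize);                      /* fault the pages in        */

        if (mlock(buf, bufsize) != 0) {               /* may need RLIMIT_MEMLOCK raised */
            perror("mlock");
            return 1;
        }

        /* ... producer/consumer logic over the pinned buffer goes here ... */

        munlock(buf, bufsize);
        free(buf);
        return 0;
    }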