
I am currently benchmarking a hard drive. I am using HW32 for the measurement.

The result has two parts:

random seek time: 20 ms

random read throughput: 30 Mbytes/s

I am not sure what methods HW32 uses for the benchmark.

But I find the random seek time result very strange.

From my understanding, random seek time is the time spent moving the head to where a specific piece of data is located. So I presume a random read should involve many random seeks, right?

For example, say I read 100 MB of data from the disk. Because of fragmentation, the data is scattered over 1000 random locations on the disk, each holding 100 KB. So when I read it, the disk head will have to move 1000 times to find all the data blocks, right?

So if the random seek time is 20 ms, does that mean we will have to spend 1000 * 20 ms = 20,000 ms = 20 s just on random seeking? I guess not, right?
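Spelling out that naive arithmetic as a throwaway snippet (the 1000-fragment layout is only my made-up example, and the one-seek-per-fragment assumption is exactly what I am unsure about):

```python
# Naive assumption: every fragment costs one full random seek.
seek_ms = 20        # reported random seek time
fragments = 1000    # 100 MB split into 1000 pieces of 100 KB each

total_seek_s = fragments * seek_ms / 1000.0
print(total_seek_s)  # 20.0 seconds spent just seeking, if the assumption held
```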

Can anyone explain this to me? If a benchmark like HW32 reports a random seek time of 20 ms, what does that mean? Is it the total seek time for a random read, or the average time per seek?

Thanks

Jack

4 Answers


Random seek time is the average time the disk needs to move the head to the position where the data is located and read a single block (or even a single sector). Individual seeks can be both much slower and much faster than that average.

For the random read rate, usually more than one block is read at each position when calculating the average, so this rate depends on both the seek time and the drive's linear (sequential) read throughput.
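As a rough illustration (a back-of-the-envelope model of my own, not how HW32 or any particular benchmark measures; the 100 MB/s sequential rate below is an assumed figure), the achievable random read rate for a given amount of data transferred per seek could be estimated like this:

```python
def random_read_rate_mb_s(seek_ms, seq_mb_s, chunk_kb):
    """Rough model: each random read pays one average seek, then
    transfers chunk_kb at the drive's sequential rate."""
    seek_s = seek_ms / 1000.0
    transfer_s = (chunk_kb / 1024.0) / seq_mb_s
    return (chunk_kb / 1024.0) / (seek_s + transfer_s)

# Assumed drive: 20 ms average seek, 100 MB/s sequential read.
for chunk_kb in (4, 64, 512, 4096):
    rate = random_read_rate_mb_s(20, 100.0, chunk_kb)
    print(f"{chunk_kb:>5} KB per seek -> {rate:6.1f} MB/s")
```

With small chunks the seek time dominates; with large chunks the rate approaches the sequential throughput, which is why a random read figure of 30 MB/s implies a fairly large amount of data read per seek.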

Modern disks can even increase the transfer rate with a technique called Native Command Queuing (NCQ), which allows them to reorder queued requests in order to minimize head movement.
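A toy illustration of why such reordering helps (this is not the drive's actual firmware logic, just a model where head travel is proportional to the distance between block addresses):

```python
import random

def head_travel(order, start):
    """Total distance the head moves serving requests in the given order."""
    pos, travel = start, 0
    for lba in order:
        travel += abs(lba - pos)
        pos = lba
    return travel

random.seed(1)
queued = random.sample(range(1_000_000), 32)   # 32 queued request addresses
start = 500_000                                # current head position

print("arrival order:", head_travel(queued, start))
print("sorted order: ", head_travel(sorted(queued), start))
```

Serving the same 32 requests in address order cuts the total head travel by a large factor compared to serving them in the order they arrived.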

Sven
  • if a random read involves 120 blocks, then there will be 120 random seeks, right? So a naive calculation is that a 120-block read will incur 120 * 20 ms = 2.4 sec of random seek time, right? – Jack May 24 '12 at 12:54
  • You have to differentiate between 120 blocks randomly scattered over the disk and one random read that can suck in 120 blocks in one sweep. The first will indeed take about 2.4 sec on average (!!), while the second is done almost immediately (a sketch of the two extremes follows these comments). Of course, there are endless possibilities in between. – Sven May 24 '12 at 13:00
  • Also, modern filesystems work very hard to avoid fragmentation in the first place, and having a 100MB file in 1000 locations is highly unlikely. – Sven May 24 '12 at 13:03
  • If HW32 gives `random seek time = 20`, how should I understand it? Is that an average or a peak? For example, it could be that 120 blocks each need a real seek, so the average is 20 ms. Or it could be that there is only one real random seek and the 120 blocks are otherwise contiguous, and HW32 just recorded the peak; otherwise the value should be much lower in the second case, shouldn't it? – Jack May 24 '12 at 13:04
  • Also, in a sweep (contiguous blocks), let's say all 120 blocks are contiguous: can I say there are still 120 random seeks involved in the read, and it's just that this kind of seek costs nearly nothing? – Jack May 24 '12 at 13:06
  • I can't really speak for HW32, but I guess it will be an average time unless a peak is explicitly stated. For the second question: no, it will be one read operation. – Sven May 24 '12 at 13:15
  • The random seek time is typically defined as the average time to read a randomly-selected, small chunk of data after just having read another randomly-selected, small chunk of data. – David Schwartz May 24 '12 at 14:05
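Following up on the 120-block discussion above, a minimal sketch of the two extremes (assumed figures: 20 ms per seek, 100 MB/s sequential rate, 4 KB blocks; the real numbers depend on the drive and block size):

```python
blocks = 120
seek_s = 0.020            # assumed average seek time
seq_mb_s = 100.0          # assumed sequential throughput
block_mb = 4 / 1024.0     # assumed 4 KB blocks

transfer_s = blocks * block_mb / seq_mb_s

scattered = blocks * seek_s + transfer_s   # one seek per block
one_sweep = seek_s + transfer_s            # a single seek, then read straight through

print(f"scattered: {scattered:.3f} s, one sweep: {one_sweep:.3f} s")
```

The scattered case lands near the 2.4 s naive estimate, while the single sweep is close to instantaneous by comparison.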

You're assuming the system seeks the random blocks in random order. It would never do that.

If you have to buy something at a randomly-chosen store, it may take you on average two hours to drive to that store, pick something up, and then drive back. But if you had to drive to 1,000 randomly-chosen stores, it wouldn't take you two hours per store because you would pick the optimum order. Some stores would be right next to each other. And so on.
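To put a number on the analogy (a toy simulation; stops on a line stand in for stores or disk blocks, and all figures are made up), visiting the stops in a sensible order costs far less travel than visiting them in the order they happened to be requested:

```python
import random

def travel(stops, start=0.0):
    """Total distance covered visiting stops in the given order."""
    pos, dist = start, 0.0
    for s in stops:
        dist += abs(s - pos)
        pos = s
    return dist

def nearest_first(stops, start=0.0):
    """Greedy ordering: always go to the closest not-yet-visited stop."""
    remaining, pos, order = list(stops), start, []
    while remaining:
        nxt = min(remaining, key=lambda s: abs(s - pos))
        remaining.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

random.seed(0)
stops = [random.uniform(0, 100) for _ in range(1000)]  # 1000 random errands

print("given order:  ", round(travel(stops)))
print("nearest first:", round(travel(nearest_first(stops))))
```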

David Schwartz
  • This only works for a short timeframe and a small number of requests. If the timeframe for the sorting is too long, you would have to wait for the optimization to finish, and if the number of requests is too large, you end up in travelling-salesman territory. – Sven May 24 '12 at 14:14
  • @SvenW: even the TSP can be solved quickly if you only need to get close enough to the optimal solution, and not necessarily the single best one. The textbook example of disk-seek optimization is the elevator algorithm (look it up on Wikipedia); a sketch of it follows these comments. – Lie Ryan May 24 '12 at 14:26
  • @SvenW: If you think realistically about real people who actually have to pick things up at multiple stores, you'll quickly realize that it actually works perfectly fine regardless of the number of requests or the timeframe. For the vast majority of cases, it is extremely easy to dramatically outperform a random order traversal, even without super-advanced mathematics. – David Schwartz May 24 '12 at 14:52
    @DavidSchwartz: I was moving your analogy back to the real-world case of disk random access. The system can't afford to wait until it has a number of requests to sort before starting but has to try to do very local optimizations as it goes along. – Sven May 24 '12 at 14:58
  • I think what SvenW is saying is that it gets a head start on the requests and then keeps going to the closest sector among the ones it has already been fed. @DavidSchwartz, imagine in your example that the store names were given one at a time. In such a case, you'd only have a few to look at, and thus would have to pick the order based on a slowly growing pool. – acolyte May 24 '12 at 18:46
  • @acolyte: reading a single large, fragmented file will often issue requests to read multiple disk sectors (up to the buffer's size), and all operating systems nowadays are multitasking and run hundreds of processes, so they have no problem gathering lots of I/O requests from multiple running processes. – Lie Ryan May 25 '12 at 05:55
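A compact sketch of the elevator idea mentioned in the comments above (a simplified LOOK-style scheduler, not any real driver's or drive's implementation): serve everything ahead of the head in its current direction of travel, then sweep back for the rest.

```python
def elevator_order(pending, head, direction=1):
    """Simplified LOOK scheduling: serve requests lying in the current
    direction of head movement first, then reverse for the remainder."""
    ahead  = sorted(r for r in pending if (r - head) * direction >= 0)
    behind = sorted(r for r in pending if (r - head) * direction < 0)
    if direction > 0:
        return ahead + behind[::-1]   # sweep up, then back down
    return ahead[::-1] + behind       # sweep down, then back up

# Textbook-style example: head at cylinder 53, currently moving upward.
print(elevator_order([98, 183, 37, 122, 14, 124, 65, 67], head=53))
# -> [65, 67, 98, 122, 124, 183, 37, 14]
```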

Your numbers are a little strange. An average latency of 20 ms means 50 IOs per second, and in order for those 50 IOs to add up to 30 MB per second, you'd have to be using an unusually large sector size. Is there any chance they're reporting peak values rather than averages?
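Spelling that arithmetic out (a quick sanity check using only the two reported numbers):

```python
seek_ms = 20.0          # reported random seek time
reported_mb_s = 30.0    # reported random read throughput

iops = 1000.0 / seek_ms                  # ~50 random IOs per second
kb_per_io = reported_mb_s / iops * 1024  # size each IO would need to be
print(f"{iops:.0f} IOPS, {kb_per_io:.0f} KB per IO")   # ~50 IOPS, ~614 KB per IO
```

Roughly 600 KB per IO is far larger than a typical sector or filesystem block, which is why the two reported figures look inconsistent for a plain small-block random read.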

Basil

20ms is not an unusual value for a consumer grade drive at all.

rackandboneman
  • if that is usual, then how can the random read be so fast, 30 MBytes/s? – Jack May 24 '12 at 12:34
  • That depends on the block size of the random read. And you will only pay the full 20 msec for something that hasn't been cached, either because it was requested before or because it was gratuitously cached since it was near the disk head anyway. – rackandboneman May 24 '12 at 12:40
  • so you mean the random seek time I get is a peak time, not an average? – Jack May 24 '12 at 12:44