2

I've seen in many places that throughput = bs * iops should be true. For example, writing at a 128 KB block size to a 15K SAS disk that can sustain 190 IOPS should give a throughput of ~23 MB/s: 23.75 (MB/s) = 128 (KB) * 190 (IOPS) / 1024.
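
As a quick sanity check of that arithmetic, the same formula typed into a shell (assuming bc is installed):

# echo "scale=2; 128*190/1024" | bc
23.75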

Now when I tested it in a VM against a monster NetApp filer, I got these results:

# dd if=/dev/zero of=/tmp/dd.out bs=4k count=2097152
8589934592 bytes (8.6 GB) copied, 61.5996 seconds, 139 MB/s

To view the IO rate of the VM I used iostat and esxtop, and they both showed around 250 IOPS.

So to my understanding the throughput was supposed to be ~1000 KB/s: 1000 (KB/s) = 4 (KB) * 250 (IOPS).

The 8 GB dd is twice the size of RAM, of course, so there should be no page caching here.
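
For reference, a variant of the test that should take the guest page cache out of the picture entirely - assuming a GNU coreutils dd and a filesystem that allows O_DIRECT; oflag=direct bypasses the page cache and conv=fdatasync forces a flush before dd reports its rate:

# dd if=/dev/zero of=/tmp/dd.out bs=4k count=2097152 oflag=direct conv=fdatasync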

What am I missing?

Thanks!

BlackBeret

3 Answers

5

What you are missing is the context. IOPS ratings assume FULLY RANDOM access. A copy is not random but sequential. Hard discs get slow when the head has to move - an IOPS figure, properly measured, assumes IO that is randomly distributed over the complete disc platter (or at least a large part of it).

Yes, you are a lot faster when copying sequentially. SADLY that is totally irrelevant unless your normal usage is nothing but copying, by ONLY ONE USER AT A TIME.

That is like measuring the top speed of a Formula 1 car and then assuming that this is the average speed during a race - bad news, Formula 1 tracks have corners, so the cars mostly go a lot slower.

So, unless you run totally degenerate patterns (in the technical sense), i.e. only one copy operation at a time, the IO will be random (especially with virtual machines - one VM may be sequential, but 20 hitting the same disc is random) and the head spends most of the time moving, not doing IO operations.
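
A rough way to see that difference for yourself - a minimal sketch assuming fio is installed in the guest and /tmp/fio.test is a throwaway path. The first run does fully random 4k writes (roughly what an IOPS rating describes), the second does the same amount of 4k IO sequentially (roughly what your dd does), and --direct=1 keeps the guest page cache out of it:

# fio --name=rand4k --filename=/tmp/fio.test --size=2G --bs=4k --rw=randwrite --direct=1 --runtime=60 --time_based
# fio --name=seq4k --filename=/tmp/fio.test --size=2G --bs=4k --rw=write --direct=1 --runtime=60 --time_based

On a plain disc the random run lands near the rated IOPS and the sequential run far above it; behind a big caching filer both numbers may still be inflated.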

dd of 8GB is twice the size of RAM

It still is pathetic, is it not? How large is the disc? 8 GB is only a small part of it, so the "random" part would involve very little head movement (measured in seek distance) compared to a real world scenario ;) Actually there is no random movement at all, as you copy from a zero source, so it is only writing sequentially, never moving the head. BAD ;)

ON TOP:

against a monster NetApp filer

ANY idea how much those large SAN boxes are able to optimize your IO? How much cache does it have? A "monster" filer would be one of the top models, which has 16+ gigabytes of memory for its own cache use. If it really is a monster, your file is pathetic - Wikipedia lists the 2010 (!) top-of-the-line model with 192 GB of memory ;) It does not even notice buffering 8 GB. And deduplication (does it happen in real time?) may eliminate pretty much all the write operations. Are you sure you even measured disc based IOPS?
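
One way to at least take the all-zero data pattern out of the equation - a sketch, assuming /dev/urandom is available (it is slow, so build the source file up front rather than piping it straight into the timed run). The first command creates 1 GB of incompressible, non-dedupable data; the second times writing that data back out with the guest cache bypassed:

# dd if=/dev/urandom of=/tmp/rand.src bs=1M count=1024
# dd if=/tmp/rand.src of=/tmp/dd.out bs=4k oflag=direct conv=fdatasync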

TomTom
  • Why does it matter if the filer cached it to its RAM first, or whether it was a sequential write or not? The bottom line is that the VM managed to "generate" ~250 IO operations per second at a block size of 4k, and throughput should be derived from that(?). – BlackBeret Apr 15 '12 at 14:52
  • Because if he wants to measure disc performance, then a RAM cached answer does not give him anything. This is like measuring car power by "it reaches 100km/h top speed" and then saying what does it matter what power the car has when it reaches that speed. When you want to measure how fast the disc is, the test MUST be true - if not, you basically measured nothing but illusions. Then you may make a decision based on that - and the decision is bad. – TomTom Apr 15 '12 at 15:25
  • So what you say is that (throughput = bs * iops) is true only for hard drives and not for, let's say, a ramdisk? I don't think so.. – BlackBeret Apr 15 '12 at 16:03
  • 2
    No. But if you measure the wrong thing, you measure the wrong thing. As in: for discs, IOPS are very dependent on usage pattern (how much time is wasted moving the head), which is constant for a ram disc - and stuff like RAM caches, plenty of them in a large SAN, mean that you have to make sure that those are not falsifying results. Your test is simply not measuring the IOPS budget of a disc under normal random load circumstances but under laboratory scenarios with near zero seek time. – TomTom Apr 15 '12 at 20:00
  • 1
    TomTom's dead on. You tested under unrealistically perfect conditions and got an unrealistically perfect result. – David Schwartz Apr 15 '12 at 20:32
  • I still don't get it.. To my understanding the difference would be in terms of IOPS only, because IOPS is based on these factors: spindle speed (if any), average latency and average seek time. So disk vs. RAM is just a question of who can handle more I/O operations per second. – BlackBeret Apr 16 '12 at 10:31
  • Pretty much. IOPS assumes that the distribution of the sectors written is random over the platter. This means a lot of movement. If you write a sequential file, there is no movement, so a lot more IOPS are done. Here is the point though - NO ONE CARES. IOPS measurement is done for random workloads, and there it is a "worst case" number. Not "someone writing a file" but "30 virtual machines writing files so the head moves along all the way". It is what the disc gets under these circumstances. – TomTom Apr 16 '12 at 12:54
  • Regarding who can handle more - let's just say that is comparing a Fiat Punto vs. a Formula 1 car. A decent high end SAS disc gets you around 450 IOPS. Not bad for a disc. My SSDs (Vertex 3) have 40,000 to 60,000 IOPS - totally different game. SSDs always beat discs - they have ZERO seek time, random access is identical to sequential. The problem is when you do not need the IOPS but the storage capacity... SSDs are kind of crazy expensive, still. But when you need random workload performance (IOPS)... SSDs are cheap compared to discs. – TomTom Apr 16 '12 at 12:56
  • Still don't get it, sorry, and thanks for bearing with me :) Let's say that I do sequential writing to an SSD/RAM device - no moving parts. Still, each I/O transaction is of size X (the block size) and takes Y time to complete, and the device can perform only Z I/O transactions per second. So throughput per sec. = X * Z, whether Z is 70 or 60k, and IOPS as a measurement is valid here. This should be true for random writes as well (and Z would be lower). Maybe esxtop is bluffing? :) – BlackBeret Apr 16 '12 at 13:27
  • 1
    No, this is true - the main problem is that the IOPS number for an SSD is constant regardless of where it accesses, while the IOPS for a HD goes down for random access. Also, IOPS always includes a specific size of IO. SSDs for example often give it as IOPS at 4kb - because larger blocks take more time. But yes, that is the idea. – TomTom Apr 16 '12 at 15:00
  • In your above example, though, you run into the IOPS measurement trap. The SAS disc you have supports 190 IOPS as you say - that is FULLY RANDOM, where most time is spent seeking. Your test is not random, so you get a LOT higher IOPS. Unless the numbers are faked (buffers somewhere that you forgot to turn off) you got about 1110 IOPS - good. Point is: that is like "top speed" (sequential access, no time used moving the heads) while IOPS gives "real use speed". Take it like this. Car, speed in a city. – TomTom Apr 16 '12 at 15:02
  • You did not measure speed in the city (with braking, traffic, corners) but an optimal case where no time is spent on that - and no regulations hit - you measured the top speed of the car. That is something different. So, your disc got a lot more than the 190 IOPS that you say it has - but that was under different conditions than those under which IOPS are measured. You measured the case where the head does barely move. Sequential read / write. IOPS "numbers" for discs are fully random. Over the FULL disc, not a small part of it. – TomTom Apr 16 '12 at 15:03
  • Thanks, but my question wasn't about getting lower or higher IOPS than expected. It was about the throughput I got. How come the throughput in the test made this equation false [Throughput = BS * IOPS]? If this equation was true (and it is true) then I was supposed to get 1000 KB/s throughput, not 139 MB/s. – BlackBeret Apr 16 '12 at 17:27
  • But then basically the higher throughput is BECAUSE you got higher IOPS than the worst case, which is what is normally measured. Don't run in circles - "the car goes faster because of higher engine output" is not answered by "but I did not ask about engine output, I asked why the car goes faster than expected". You get higher throughput BECAUSE under the lab scenario you set up, you get more IOPS. – TomTom Apr 16 '12 at 17:51
  • According to what you say, Throughput = BS * IOPS is only true under certain circumstances, for example loaded SAS storage + random I/Os. In other conditions, other variables that can affect the throughput may exist. That's something that I find difficult to agree with (because I don't understand it). Interesting discussion though.. :) – BlackBeret Apr 16 '12 at 18:25
  • 1
    No. Throughput = BS * IOPS is always true, it is just that IOPS varies depending on usage pattern. The IOPS numbers published assume either "random load" or are best case ("up to"). Point is that stuff like average seek time etc. plays into the game - and never forget WHAT IOPS NUMBERS ARE FOR - estimating server workloads, and server workloads are NEVER sequential. File copy - random. Because 200 people hit the same platter array for various files at the same time. – TomTom Apr 16 '12 at 19:34
  • 1
    You are confusing two different statements. One is "BS * IOPS from specification = measured throughput". The other is "BS * IOPS actually done = measured throughput". The first one is only true under the conditions under which the specification's IOPS is valid. The second is always true. – David Schwartz Apr 17 '12 at 08:54
  • OK, I got it, but I was talking about what I measured in my test: 139 MB/s (measured throughput) = 4 KB (block size) * 250 (IOPS actually done), which makes the equation false. Where is the confusion? – BlackBeret Apr 17 '12 at 11:22
  • Again, due to the setup you used, you simply got more IOPS than the 250 you assume. That simple. – TomTom Apr 17 '12 at 12:19
  • That's the point, I didn't assume anything. 250 IOPS is what both esxtop and iostat reported. – BlackBeret Apr 17 '12 at 14:21
  • Well, but those are different tests - i.e. they measure "real world IOPS", and your throughput test uses a different scenario where the real world IOPS number is too low - that is what I have been telling you for days. – TomTom Apr 17 '12 at 19:33
  • @TomTom Maybe you should update your answer with some of the info from these comments - they're getting rather long-winded – voretaq7 Apr 17 '12 at 19:38
0

There's an app called SQLIO - don't worry about the name, it actually has nothing to do with SQL, it was just written by the SQL Server team at Microsoft - which will let you test your disk with random IO (read or write) and see just how much load the disks can handle. You can download it from Microsoft's site.
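
A typical invocation looks something like the line below - the flag meanings are from memory (-kW writes, -frandom random pattern, -b8 8 KB blocks, -o8 eight outstanding I/Os, -t4 four threads, -s120 run for two minutes, -LS capture latency) and the test file path is just a placeholder, so treat this as a sketch and check the readme that ships with the tool:

sqlio -kW -frandom -b8 -o8 -t4 -s120 -LS C:\testfile.dat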

mrdenny
-1

If you're going to use throughput = block size * IOPS, you have to use the block size of the I/O operations you are counting, not the block size of the file system, not the block size of the block device.

The 139 MB/s is probably a bit higher than what you really got because I/O was likely continuing when the measurement stopped. The block cache was likely still flushing. So it seems like the most logical explanation is that the size of the underlying I/O operations you are counting is 512 KB.

The block size of the I/O operations has to be some multiple of the block device's block size. I believe you say that's 128 KB. So 512 KB (4-block) operations are certainly possible.

512 KB * 250 = 128 MB/s
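
One way to check that from inside the guest (a sketch, assuming a Linux guest with the sysstat package installed) is to watch the average request size iostat reports for the device backing /tmp while dd runs - the avgrq-sz column is given in 512-byte sectors, so 512 KB requests would show up as roughly 1024:

# iostat -x 1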

David Schwartz