
When I look at disk (block device) storage options from various cloud hosting providers, I usually see numbers such as:

  • Google Cloud (Zonal SSD): 15.000 - 100.000 read IOPS
  • OVH Cloud (High Speed / SSD): Up to 3.000 IOPS
  • AWS (io1 / SSD): Up to 64.000 IOPS

I do not know anything about the underlying technology.

Even if these cloud providers used some of the slower SSD options available (regular consumer SATA SSDs), some of those disks come with read and write IOPS specifications in the range of 90.000 and up (looking at the Samsung 860 EVO 2.5" SSD). An NVMe SSD would give far better throughput. Even if these cloud providers stacked these SSDs into some sort of storage cluster, I'd still be surprised to see IOPS fall from 90.000 to 3.000.

I have the feeling that these numbers are not comparable, even though the same metric (IOPS) is used.

How should I interpret the disk IOPS listed by cloud providers vs. the disk IOPS listed by disk manufacturers?

sbrattla
  • You are aware that the cloud providers do not give you disks, but often RAIDs that are frequently distributed over systems. Yes, they are slower - but you REALLY are comparing single-disk numbers with distributed systems, or at least RAID numbers. – TomTom Jul 02 '20 at 07:21
  • @TomTom, yes, absolutely (as mentioned). But a regular consumer NVMe disk (Samsung 970 EVO) has an IOPS (max random read) rating of 600.000. Seeing that number fall to 3.000 - which is 0.5% of the listed performance - seems wild to me. I know 3.000 may not be the number of max random reads, but still... – sbrattla Jul 02 '20 at 07:24
  • That is the quality of their services. One of the many not-so-well-known aspects. If you're not satisfied with such junk services you can always host your own infrastructure. – Overmind Jul 02 '20 at 07:31
  • No, it does not. It has pretty much a burst rating on that. Those numbers are way more stable. Ignore the marketing material, look at the stats of high-end disks. – TomTom Jul 02 '20 at 07:31
  • On a single disk array, there may be dozens of clients roaming. They cannot provide more IOPS in such a scenario. – Overmind Jul 02 '20 at 07:33
  • @TomTom I'm probably comparing apples and oranges... but the numbers do not seem very aligned. IOPS in cloud services ranging from 3.000 to 100.000, IOPS for SSD disks ranging from 100.000 to 600.000 - and then finally iostat on my own Linux box reporting IOPS in the range of 400 - 500 when writing at full speed to disk (a Samsung EVO NVMe with an IOPS rating in the hundreds of thousands). Is this just like the flashlights on eBay which are advertised with thousands of watts and millions of lumens: in other words, just marketing gibberish? – sbrattla Jul 02 '20 at 07:50

2 Answers


Google does specify 900.000 to 2.700.000 IOPS for a local SSD, which shows their hardware is perfectly capable. The "zonal SSD" has a much lower IOPS rating, but that is a disk which is accessible by all servers in the particular zone. That means it's remote from the server where your code is running, and there's software between your server and the SSD to manage concurrent access.

Yes, that costs a lot of IOPS, and it's not unexpected. Just look at the huge difference between the local NVMe SSD (2.700.000 IOPS) and the non-NVMe one (900.000 IOPS). You already lose 66% of the raw performance just by introducing a single slow bus between the flash chips and the CPU - probably a few centimeters of SATA cable and the SATA chips on both sides of that cable. Raw SSD speeds are so blisteringly high that any overhead will be huge.
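
A rough back-of-the-envelope sketch of why that happens (the latencies below are illustrative assumptions, not measured figures): achievable IOPS is roughly the number of outstanding requests divided by the per-request latency, so adding a millisecond of network and software overhead caps a single outstanding request at about a thousand operations per second, and you only claw that back by queueing many requests in parallel:

    # Assumed latencies: local NVMe ~0.1 ms, network-attached volume ~1 ms
    awk 'BEGIN {
        printf "local NVMe,       queue depth 1:  %d IOPS\n",  1 / 0.0001
        printf "network-attached, queue depth 1:  %d IOPS\n",  1 / 0.001
        printf "network-attached, queue depth 32: %d IOPS\n", 32 / 0.001
    }'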

Intel even considered NVMe too slow for its Optane storage product and went for a DIMM interface, just like RAM. That makes sense; Intel's CPUs can do several billion memory transfers per second (not million; it really is three orders of magnitude more). However, Optane appears to be failing in that respect: it's stuck well below a million IOPS, and the DIMM interface seems like ludicrous overkill. But the direction is clear; even NVMe might soon become too slow for local storage. The recipe for speed is direct access without overhead. The figures you quote just show how badly performance can drop when you add overhead.

MSalters

Quotas. Multi-tenancy. Counting host IOPS after redundancy. Scalability limits of their (probably IP-based) storage stack. Selling a faster, premium SSD tier. Actually being honest and conservative about what is practical. The list of possible reasons is long.

Should one disk be too limiting, you can attach several and use them all on one host, say with LVM. It is a bit strange to have to size SSDs for IOPS rather than capacity, but perhaps those are the constraints of these disk types.
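
For example, a minimal sketch of striping two attached volumes together with LVM (the device names are hypothetical placeholders; use whatever block devices the provider actually exposes):

    # Two hypothetical attached block volumes
    pvcreate /dev/nvme1n1 /dev/nvme2n1
    vgcreate fastvg /dev/nvme1n1 /dev/nvme2n1
    # Stripe across both physical volumes (-i 2) with a 64 KiB stripe size (-I 64)
    lvcreate -i 2 -I 64 -l 100%FREE -n fastlv fastvg
    mkfs.ext4 /dev/fastvg/fastlv

The aggregate IOPS then roughly adds up across the attached volumes, up to whatever per-instance limit the provider imposes.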

If you wish to run your own storage array, do that. Of course, that means you cannot use the managed storage of say AWS or GCP.

Whatever your storage options are, you should test with something resembling your workload: realistic load if you can, synthetic I/O with fio or diskspd if you have to.

Especially so if you actually need to push 100k IOPS; that level of load is still a serious exercise for a storage stack.
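
As a starting point, a fio sketch for a small-block random-read test (the target file is a placeholder and the parameters are only assumptions about your workload; adjust block size, queue depth and job count to match what your application actually does):

    # 4k random reads, direct I/O, 32 outstanding requests per job, 4 jobs, 60 seconds
    fio --name=randread --filename=/path/to/testfile --size=4G \
        --rw=randread --bs=4k --direct=1 --ioengine=libaio \
        --iodepth=32 --numjobs=4 --runtime=60 --time_based --group_reporting

Compare the IOPS fio reports against the provider's figure for that volume type rather than against a manufacturer's datasheet.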

John Mahowald