What is the fairest unit for measuring disk usage (apart from total storage used, e.g. in GB)?

  • IOPS?
  • Bytes read / write?
  • Bytes transferred over the network (which would be roughly equal to bytes r/w)?
  • Other?

And how can they be determined under Linux?

This is about billing for virtual storage instances that are accessed by virtual servers over the network.

Hativ
  • Technically, measuring disk usage in terms of IOPS and bytes r/w is good but sometimes not enough, or it does not give you the right picture (see for example this article: http://www.mysqlperformanceblog.com/2014/06/25/why-util-number-from-iostat-is-meaningless-for-mysql-capacity-planning/). But from a commercial point of view, IOPS and throughput could be enough, as the goal is to bill those who use an excessive share of resources so that sharing these resources between VMs stays viable. – Huygens Aug 06 '14 at 10:52
  • Thank you! So you think I should measure IOPS AND bytes r/w? IOPS vary with the size of each read/write, so I think that's not really fair. Just measuring the bytes r/w should be more accurate, because more bytes = more load. What do you think? Do you have another method? – Hativ Aug 06 '14 at 14:46
  • I don't know what is "fair". But someone doing lots of small random reads could saturate the storage's IO capacity, even though he might read fewer MBytes than a user reading a big chunk of contiguous data. What you want to bill, and what you want to "limit" to keep all of your users happy, is in the end a commercial decision, on which I have no particular opinion. – Huygens Aug 06 '14 at 16:17

2 Answers


You can use the following commands for disk monitoring:

# sar -p -d 1 1

(if the sar package, part of sysstat, is installed on your system). The -d flag reports activity per block device and -p prints friendly device names. With sar you can also review your historical disk reads/writes from the daily files under /var/log/sa/ (e.g. /var/log/sa/sa01 for the first day of the month).
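
For example, to read back the statistics recorded for the first day of the month (assuming the default sysstat file layout):

# sar -d -p -f /var/log/sa/sa01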

and

# iostat -d -x 3 3

where the last two numbers are the interval in seconds and the number of repetitions, and

-d : Display the device utilization report (d == disk)
-x : Display extended statistics including disk utilization
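
Both commands report rates over a sampling interval. For billing you usually want cumulative counters instead; the kernel keeps these per device in /proc/diskstats (in the awk columns below: $4 = reads completed, $6 = sectors read, $8 = writes completed, $10 = sectors written, with sectors being 512-byte units). A minimal sketch, with vda as a placeholder device name:

# awk '$3 == "vda" { print $4 " reads, " $8 " writes, " $6*512 " bytes read, " $10*512 " bytes written" }' /proc/diskstats

These counters reset at boot, so you would sample them periodically and charge for the difference between samples.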
ibedelovski

The answer is more complex than it may seem, because this is a subtle problem. Not all storage is created equal, and different types of work require different types of storage.

The most common types of storage and their measurement characteristics are:

  • Small-block intensive random IO: typically databases, sometimes office file shares. For many workloads doing this type of work, the bottleneck is IO/s, assuming mostly-read, mostly-random (unpredictable), mostly small-block IO. This type of work is usually done on small datasets, so SSD is optimal: SSDs have no moving parts and can generally deliver thousands of IO/s each. Cache is extremely effective here because writes are typically far rarer than reads, and all writes get cached, leading to very low latency. The measures for this storage are latency and maximum IO/s (as well as how high the latency climbs when IO/s bottlenecks).
  • Streaming IO: typically backups, sometimes file shares for media feeds. Here the bottleneck is more likely the number of bytes per second you can stream. Writes also benefit from cache, but if the stream lasts longer than a short spike of activity, the cache saturates and the speed you observe is essentially the speed at which the cache can destage writes to disk. In this category the storage is much more likely to be large, and because SSDs, while still the fastest, aren't an order of magnitude faster than spinning disk for streaming IO (as they are for small random IO), you'll see regular drives here: 15k RPM, 10k RPM, 7.2k RPM. Often, when size matters more than speed, you'll see large slow drives (nearline SAS or SATA) with 1TB-6TB capacity and 7.2k RPM spindle speeds.
  • Mixed workload: typically hypervisors. They take a bunch of streaming-IO virtual machines, a bunch of small-block random-IO virtual machines, and a bunch of system tasks that enable hypervisor features like snapshots and moving machines between storage pools, and throw them all into the same IO queue. You'll have the hardest time servicing this type of IO with a single tier of disk, so best practice in shared storage (like a SAN or NAS) is to use multiple types of disk plus storage-layer intelligence that tries to serve the random-IO hotspots from SSD and the streaming IO from spindle drives. The measures for this type of storage are latency and peak IOPS, as well as bytes/s throughput; the fio sketch after this list shows one way to exercise both regimes.
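
To see how differently these profiles behave on a given device, you can characterize it with a load generator such as fio (a rough sketch; the file path and sizes are placeholders to adjust). Small-block random reads, bottlenecked by IO/s and latency:

# fio --name=randread --filename=/path/to/testfile --size=1G --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 --runtime=30 --time_based

versus streaming reads, bottlenecked by bytes per second:

# fio --name=stream --filename=/path/to/testfile --size=1G --rw=read --bs=1M --iodepth=4 --ioengine=libaio --direct=1 --runtime=30 --time_based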
Basil