
Let's say that I have gathered Disk Transfers per second data for a 2×24-hour period, i.e., instantaneous samples taken every 15 seconds. What statistical analysis can/should I apply to the samples if I want to use the data to, for instance, provision storage?

Should I simply use the peak value (which occurs less than 1% of the time)? Should I use the mean/average value? Or a formula involving the mean and the standard deviation?
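
For context, this is roughly how I would compute the candidate statistics (a minimal sketch in Python/NumPy; transfers_per_sec.csv is just a placeholder for the exported samples):

    import numpy as np

    # Hypothetical: samples holds the instantaneous transfers/sec readings,
    # one every 15 s over 2 x 24 h (11,520 samples in total).
    samples = np.loadtxt("transfers_per_sec.csv")   # placeholder data source

    peak = samples.max()                  # sized for the worst observed burst
    mean = samples.mean()                 # average load; hides bursts entirely
    p95  = np.percentile(samples, 95)     # ignores the rarest 5% of samples
    p99  = np.percentile(samples, 99)     # ignores the rarest 1% of samples
    mean_2sd = mean + 2 * samples.std()   # a "mean + k * deviation" estimate

    print(f"peak={peak:.0f} mean={mean:.0f} p95={p95:.0f} "
          f"p99={p99:.0f} mean+2sd={mean_2sd:.0f}")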

pepoluan
  • We can't tell: you need to know what you want to get out of that data, because only you (should) know what your requirements are. – Sven Sep 10 '12 at 16:56
  • IOPS is something you should measure directly. Find the max by performing load tests. You shouldn't be trying to derive a value from the transfer rate. – Zoredache Sep 10 '12 at 17:09
  • @Zoredache: well, the Transfers/s measurements were done on production servers, so I guess it is quite representative of my needs, isn't it? – pepoluan Sep 11 '12 at 01:37

2 Answers


You always size for the peaks, unless it's the kind of workload that can afford high latency when it's pushing a lot of IO. That is part of why wide striping is so popular: you can put together a bunch of workloads and size for the peak of their aggregate usage. Different parts will peak at different times, so you're able to use cheaper disks to provide the same capacity.

Wide striping assumes that this is on some sort of centralized storage. If it's local, of course you can't aggregate workload that way.
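
For illustration, a minimal sketch (Python/NumPy assumed; the per-server arrays are hypothetical and must share the same 15-second timestamps) of how aggregation changes the number you size for:

    import numpy as np

    # Hypothetical: one array of transfers/sec samples per server,
    # all aligned on the same 15-second sampling intervals.
    per_server = {
        "app01": np.loadtxt("app01.csv"),
        "db01":  np.loadtxt("db01.csv"),
    }

    aggregate = np.sum(list(per_server.values()), axis=0)

    sum_of_peaks = sum(s.max() for s in per_server.values())  # sizing each box alone
    peak_of_sum  = aggregate.max()                            # sizing the shared array

    print(f"sum of individual peaks: {sum_of_peaks:.0f}")
    print(f"peak of the aggregate:   {peak_of_sum:.0f}")

If the servers don't all burst in the same intervals, the peak of the aggregate comes out noticeably lower than the sum of the individual peaks, which is what lets the cheaper disks keep up.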

Basil
  • Thankfully, this is going to be a shared storage, so the IOPS data I've gathered comes from several servers. Hmm... maybe I should do a time correlation first... thanks for your answer! – pepoluan Sep 11 '12 at 01:34
  • What kind of shared storage? – Basil Sep 12 '12 at 02:15

Unfortunately, there is no easy answer to that question. First, consider your needs. How much money are you willing/able to spend? How much redundancy do you need? How much total storage do you need? How much latency can you tolerate? How much growth will you have over the amount of time you want the system to last (both growth in size and in IOPS)? Do you have time to maintain and prune your data to keep the size down?

The closest I can come to answering your question is to note that if you cannot handle the instantaneous IOPS at any given time, you will simply increase latency. If latency isn't important, then buying storage based on your projected growth in average IOPS is not a bad place to start.
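
As a rough illustration (all numbers hypothetical; substitute your measured mean and your own growth assumptions), that projection is just:

    # Hypothetical figures: measured mean of the 15-second samples,
    # an assumed annual growth rate, and the planned lifetime of the system.
    measured_avg_iops = 450      # mean of the collected samples
    annual_growth     = 0.30     # assumed 30% growth per year
    lifetime_years    = 3        # how long the storage should last

    projected_avg = measured_avg_iops * (1 + annual_growth) ** lifetime_years
    print(f"average IOPS to provision for: {projected_avg:.0f}")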

NOTE: Redundancy is not a backup solution, so plan for backups as well. Backups can (should) be isolated from your live data by time and space.

csfreak
  • Agree on redundancy-is-not-backup. We already have proper backups, both system (i.e., image) backups and data backups. Now we're facing a DAS crunch and, after some consideration, would very much prefer moving to SAN storage. – pepoluan Sep 11 '12 at 01:32