-3

I have a large, unsorted array or list of doubles and want to calculate the min, max, mean, median and standard deviation as efficiently as possible. I could simply use LINQ to calculate each one separately, but I think a faster approach exists. Sample code:

var list = new List<double>(){1.0, 2.5, 0.11, 0.7, 8.2, 3.4, 1.0};
var (min, max, mean, median, std) = CalculateMetrics(list);

private (double, double, double, double, double) CalculateMetrics(List<double> list) {
    // TODO
}

So what is the most efficient way? Using libraries is also fine for me.

Anno
  • when you say "larger", like how large? 500? 5k? 5M? 5B? If you have an estimate of your potential array size, you can select a better algorithm. It might not be necessary to do a lot of optimization for a primitive array of size around 1k, considering all the overhead associated. – mcy Dec 29 '21 at 12:17

3 Answers

3

All the descriptive statistics you want, except the median, can be computed in one pass through your list. The trick to getting the standard deviation is to accumulate both the sum and the sum of squares of your samples. Here's an example:

int count = 0;
double sum = 0.0;
double sumsq = 0.0;
double max = double.MinValue;
double min = double.MaxValue;

foreach (double sample in list)
{
    count++;
    sum += sample;
    sumsq += sample * sample;
    if (sample > max) max = sample;
    if (sample < min) min = sample;
}

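double mean = sum / count;
// Population standard deviation (divides by n). Rounding can push the
// difference slightly below zero, so clamp it before taking the square root.
double stdev = Math.Sqrt(Math.Max(0.0, (sumsq / count) - (mean * mean)));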

Because this makes only one pass through the list, it works with any IEnumerable collection of samples, and is compatible with LINQ.

Obviously this is quick-n-dirty example code. I leave it to you to build it into a useful function.

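Note that an empty list does not throw: count is 0, so sum / count is the floating-point division 0.0 / 0, and both mean and stdev come out as NaN. Also, if the mean is large compared with the spread of the samples, or the list is very long, the subtraction in the computation of stdev cancels most of the significant digits and can give you back a useless number.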

But it works well for most applications.
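
The precision caveat above can be avoided with Welford's online update, which tracks the running mean and the sum of squared deviations from it instead of a raw sum of squares. What follows is a minimal sketch, assuming the population (divide-by-n) standard deviation is wanted, as in the code above. RunningStats and Summarize are hypothetical names used for illustration, not part of any library.

```csharp
using System;
using System.Collections.Generic;

public static class RunningStats
{
    // Hypothetical helper: min, max, mean and population standard deviation
    // in a single pass, using Welford's update for the variance.
    public static (double Min, double Max, double Mean, double StdDev) Summarize(IEnumerable<double> samples)
    {
        long count = 0;
        double mean = 0.0;
        double m2 = 0.0; // running sum of squared deviations from the current mean
        double min = double.PositiveInfinity;
        double max = double.NegativeInfinity;

        foreach (double x in samples)
        {
            count++;
            double delta = x - mean;   // deviation from the old mean
            mean += delta / count;     // update the mean
            m2 += delta * (x - mean);  // second factor uses the updated mean
            if (x < min) min = x;
            if (x > max) max = x;
        }

        if (count == 0)
            throw new ArgumentException("Sequence contains no elements.", nameof(samples));

        // Assumption: population standard deviation (divide by n, not n - 1).
        return (min, max, mean, Math.Sqrt(m2 / count));
    }
}
```

It is still one pass over any IEnumerable<double>, at the cost of one division per sample. Unlike the snippet above, this sketch throws on an empty sequence instead of returning NaN.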

O. Jones
  • OP wants median also, for which the list needs to be sorted to do it in `O(n)` – Charlieface Dec 29 '21 at 12:32
  • Oh, you're right, I missed the median requirement. So much for the single pass **O(n)** unless the list is known *a priori* to be sorted, as you mention. LINQ itself has no `Median()` method, so you may as well use a library extension for it (MathNet.Numerics provides one, for example). – O. Jones Dec 29 '21 at 12:34
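
As a side note to the comments above: the median of an unsorted list can also be found without a full sort, using quickselect, which runs in expected O(n) time. This is a minimal sketch under stated assumptions: it works on a copy, assumes the data contains no NaN values, and MedianSketch, Median and Select are hypothetical names, not library APIs.

```csharp
using System;
using System.Collections.Generic;

public static class MedianSketch
{
    // Hypothetical helper: median via quickselect on a copy of the input.
    public static double Median(IReadOnlyList<double> values)
    {
        if (values.Count == 0)
            throw new ArgumentException("Sequence contains no elements.", nameof(values));

        // Work on a copy so the caller's data keeps its order.
        var work = new double[values.Count];
        for (int i = 0; i < work.Length; i++) work[i] = values[i];

        int mid = work.Length / 2;
        double upper = Select(work, mid);
        if (work.Length % 2 == 1) return upper;

        // After Select, every element left of index mid is <= work[mid],
        // so the lower middle value is the maximum of that left part.
        double lower = work[0];
        for (int i = 1; i < mid; i++)
            if (work[i] > lower) lower = work[i];
        return (lower + upper) / 2.0;
    }

    // Rearranges a so that a[k] holds the k-th smallest element (0-based) and returns it.
    private static double Select(double[] a, int k)
    {
        var rng = new Random();
        int lo = 0, hi = a.Length - 1;
        while (lo < hi)
        {
            // Hoare-style partition around a randomly chosen pivot value.
            double pivot = a[rng.Next(lo, hi + 1)];
            int i = lo, j = hi;
            while (i <= j)
            {
                while (a[i] < pivot) i++;
                while (a[j] > pivot) j--;
                if (i <= j)
                {
                    (a[i], a[j]) = (a[j], a[i]);
                    i++;
                    j--;
                }
            }
            if (k <= j) hi = j;       // target lies in the left part
            else if (k >= i) lo = i;  // target lies in the right part
            else break;               // j < k < i: a[k] equals the pivot and is in place
        }
        return a[k];
    }
}
```

For a few thousand elements a plain sort is likely just as fast in practice. The selection approach mainly pays off on large inputs, and its worst case is still quadratic, although the random pivot makes that unlikely.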
0

Because the median is required and the standard deviation depends on the mean, this is hard to do in a single O(n) pass.

Here's my best attempt:

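// Requires: using System; using System.Collections.Generic; using System.Linq;
private (double min, double max, double mean, double median, double std) CalculateMetrics(List<double> list)
{
    var mean = list.Average(); // throws InvalidOperationException on an empty list
    // Population standard deviation: square root of the mean squared deviation.
    var std = Math.Sqrt(list.Aggregate(0.0, (a, x) => a + (x - mean) * (x - mean)) / list.Count);
    // Sorting a copy gives the median and, as a by-product, min and max.
    var sorted = list.OrderBy(x => x).ToList();
    var mid = sorted.Count / 2;
    var median = sorted.Count % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];
    return (sorted[0], sorted[sorted.Count - 1], mean, median, std);
}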
Enigmativity
0

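Two O(n) passes cover min, max, mean and standard deviation; the median additionally needs a sorted copy, which costs O(n log n):

        // Requires: using System;
        private static (double, double, double, double, double) CalculateMetrics(double[] list)
        {
            if (list.Length < 1)
            {
                throw new ArgumentException("The array must contain at least one element.", nameof(list));
            }

            double min = list[0];
            double max = list[0];

            // First pass: min, max and the sum needed for the mean.
            double sum = 0;
            foreach (double el in list)
            {
                if (el > max)
                {
                    max = el;
                }
                if (el < min)
                {
                    min = el;
                }

                sum += el;
            }

            double mean = sum / list.Length;

            // Second pass: sum of squared deviations from the mean.
            double sumSq = 0;
            foreach (var el in list)
            {
                double diff = el - mean;
                sumSq += diff * diff;
            }
            double stdDev = Math.Sqrt(sumSq / list.Length);

            // The input is unsorted, so its middle element is not the median.
            // Sort a copy and average the two middle values for even lengths.
            double[] sorted = (double[])list.Clone();
            Array.Sort(sorted);
            int mid = sorted.Length / 2;
            double median = sorted.Length % 2 == 0 ? (sorted[mid - 1] + sorted[mid]) / 2 : sorted[mid];

            return (min, max, mean, median, stdDev);
        }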
JL0PD