
I have some results that are stored in a multidimensional array:

double[,] results;

Each column is a time series of prices for a specific variable (e.g. "house", "car", "electricity"). I would like to calculate some statistics for each variable so as to summarize the results in a more compact form. For example, I was looking at the percentile function in Math.NET Numerics.

I would like to calculate the 90th percentile of the prices for each column (so for each variable).

I am trying the following, since the function doesn't work on multidimensional arrays (so I cannot pass `results` directly as the argument to the percentile function):

for (int i = 0, i <= results.GetLength(2), i++)
{
    myList.Add(MathNet.Numerics.Statistics.Statistics.Percentile(results[,i], 90));
}

So I want to loop through the columns of `results`, calculate the 90th percentile of each one, and add the result to a list. But this doesn't compile because `results[, i]` is not valid syntax, and unfortunately there is no clearer error message.

Can you help me understand where the problem is, and whether there's a better way to calculate a percentile by column?

edited by Cœur; asked by mickG

1 Answer

`Percentile` is an extension method with the following signature:

public static double Percentile(this IEnumerable<double> data, int p)

So you can use LINQ to turn each column of your 2D array into an `IEnumerable<double>` and pass that to `Percentile`.

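However, `results.GetLength(2)` will throw an `IndexOutOfRangeException`, because the dimension argument of `GetLength()` is zero-based: for a 2D array the valid dimensions are 0 (rows) and 1 (columns). You probably meant `results.GetLength(1)`, and the loop condition should then be `i < results.GetLength(1)` rather than `<=` (the `for` header also needs semicolons, not commas). Assuming that's what you meant, you can do: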

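        // Requires: using System.Linq; using MathNet.Numerics.Statistics;
        // One 90th percentile per column: iCol walks the columns, iRow the rows.
        var query = Enumerable.Range(0, results.GetLength(1))
            .Select(iCol => Enumerable.Range(0, results.GetLength(0))
                .Select(iRow => results[iRow, iCol])
                .Percentile(90));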

You can have LINQ build the list for you:

        var myList = query.ToList();

or add it to a pre-existing list:

        myList.AddRange(query);
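For comparison, here is roughly what the original `for` loop looks like once its syntax is fixed: semicolons in the header, `GetLength(1)` with `<`, and each column copied into a `double[]`, since C# has no `results[, i]` column slice. This is a minimal, self-contained sketch under stated assumptions: so that it runs without the Math.NET package, it uses a hypothetical `NearestRankPercentile` helper (the simple nearest-rank definition) in place of `Statistics.Percentile`, and Math.NET's own interpolation may give slightly different numbers. The sample prices are made up.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for MathNet.Numerics.Statistics.Statistics.Percentile,
// using the nearest-rank definition. Math.NET may interpolate differently.
static double NearestRankPercentile(double[] data, int p)
{
    if (data.Length == 0)
        return double.NaN;
    double[] sorted = (double[])data.Clone();
    Array.Sort(sorted);
    // Nearest rank: the smallest value with at least p% of the data at or below it.
    int rank = (int)Math.Ceiling(p / 100.0 * sorted.Length);
    return sorted[Math.Max(rank, 1) - 1];
}

// Rows are time steps, columns are variables (e.g. house, car, electricity).
double[,] results =
{
    { 1.0, 10.0, 100.0 },
    { 2.0, 20.0, 200.0 },
    { 3.0, 30.0, 300.0 },
    { 4.0, 40.0, 400.0 },
};

var myList = new List<double>();
// Semicolons in the for header, dimension 1 (columns), and < rather than <=.
for (int i = 0; i < results.GetLength(1); i++)
{
    // There is no results[, i] slice, so copy column i out row by row.
    var column = new double[results.GetLength(0)];
    for (int row = 0; row < column.Length; row++)
        column[row] = results[row, i];

    myList.Add(NearestRankPercentile(column, 90));
}

Console.WriteLine(string.Join(", ", myList)); // 4, 40, 400
```

In a real project you would drop the helper and call `column.Percentile(90)` from `MathNet.Numerics.Statistics` instead; the LINQ query above does the same column extraction without the explicit copy.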

Update

To filter out NaN values, use `double.IsNaN`:

        var query = Enumerable.Range(0, results.GetLength(1))
            .Select(iCol => Enumerable.Range(0, results.GetLength(0))
                .Select(iRow => results[iRow, iCol])
                .Where(d => !double.IsNaN(d))
                .Percentile(90));
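The reason for `double.IsNaN` rather than an equality check is that NaN compares unequal to everything, itself included, so a filter such as `d != double.NaN` keeps every element. A small sketch with made-up data:

```csharp
using System;
using System.Linq;

double[] column = { 1.0, double.NaN, 3.0, double.NaN, 5.0 };

// NaN != NaN is true, so this "filter" keeps all five elements.
int keptByEquality = column.Count(d => d != double.NaN);

// double.IsNaN is the reliable test: three real prices remain.
double[] clean = column.Where(d => !double.IsNaN(d)).ToArray();

Console.WriteLine($"kept by != NaN: {keptByEquality}, kept by IsNaN: {clean.Length}");
Console.WriteLine(double.NaN == double.NaN); // False
```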

Update

If you extract a couple of array extension methods:

using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayExtensions
{
    // Yields each column of a 2D array as a sequence (dimension 1 indexes columns).
    public static IEnumerable<IEnumerable<T>> Columns<T>(this T[,] array)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        return Enumerable.Range(0, array.GetLength(1))
            .Select(iCol => Enumerable.Range(0, array.GetLength(0))
                .Select(iRow => array[iRow, iCol]));
    }

    // Yields each row of a 2D array as a sequence (dimension 0 indexes rows).
    public static IEnumerable<IEnumerable<T>> Rows<T>(this T[,] array)
    {
        if (array == null)
            throw new ArgumentNullException(nameof(array));
        return Enumerable.Range(0, array.GetLength(0))
            .Select(iRow => Enumerable.Range(0, array.GetLength(1))
                .Select(iCol => array[iRow, iCol]));
    }
}
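To see what these two helpers yield, here is a sketch that reuses the same query bodies as local functions in a small top-level program. Local functions cannot be extension methods, so in this sketch they are called as `Columns(grid)` and `Rows(grid)` rather than `grid.Columns()`, and the null checks are left out; the `grid` data is made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same query bodies as ArrayExtensions.Columns/Rows, without `this` and the null checks.
static IEnumerable<IEnumerable<T>> Columns<T>(T[,] array) =>
    Enumerable.Range(0, array.GetLength(1))
        .Select(iCol => Enumerable.Range(0, array.GetLength(0))
            .Select(iRow => array[iRow, iCol]));

static IEnumerable<IEnumerable<T>> Rows<T>(T[,] array) =>
    Enumerable.Range(0, array.GetLength(0))
        .Select(iRow => Enumerable.Range(0, array.GetLength(1))
            .Select(iCol => array[iRow, iCol]));

// 2 time steps (rows) x 3 variables (columns).
int[,] grid =
{
    { 1, 2, 3 },
    { 4, 5, 6 },
};

// Materialize each inner sequence so the shape is easy to inspect.
int[][] cols = Columns(grid).Select(c => c.ToArray()).ToArray();
int[][] rows = Rows(grid).Select(r => r.ToArray()).ToArray();

foreach (var c in cols)
    Console.WriteLine("column: " + string.Join(" ", c)); // 1 4, then 2 5, then 3 6
foreach (var r in rows)
    Console.WriteLine("row: " + string.Join(" ", r));    // 1 2 3, then 4 5 6
```

Because these are deferred LINQ queries, they read from the array each time they are enumerated, so later changes to the array are visible; call `ToArray()` on a column if you need a snapshot.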

Then the query becomes:

        var query = results.Columns()
            .Select(col => col.Where(d => !double.IsNaN(d)).Percentile(90));

which seems much clearer.
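The question also mentions summarizing each variable more compactly. As a sketch along the same lines, assuming the standard LINQ `Min`, `Average` and `Max` operators are the statistics you want, the following builds a per-column summary that skips NaN prices. The column names and prices are hypothetical; a Math.NET percentile could be added to the tuple the same way.

```csharp
using System;
using System.Linq;

// Hypothetical variable names, one per column.
string[] names = { "house", "car", "electricity" };

double[,] results =
{
    { 200.0, 20.0, double.NaN },
    { 210.0, 22.0, 0.15 },
    { 190.0, 21.0, 0.17 },
};

// One summary tuple per column, ignoring NaN prices.
var summary = Enumerable.Range(0, results.GetLength(1))
    .Select(iCol =>
    {
        double[] col = Enumerable.Range(0, results.GetLength(0))
            .Select(iRow => results[iRow, iCol])
            .Where(d => !double.IsNaN(d))
            .ToArray();
        // Min/Max/Average throw on an empty sequence, so guard all-NaN columns.
        if (col.Length == 0)
            return (Name: names[iCol], Min: double.NaN, Mean: double.NaN, Max: double.NaN);
        return (Name: names[iCol], Min: col.Min(), Mean: col.Average(), Max: col.Max());
    })
    .ToList();

foreach (var s in summary)
    Console.WriteLine($"{s.Name}: min={s.Min}, mean={s.Mean}, max={s.Max}");
```

Materializing each filtered column with `ToArray()` means it is enumerated once and then reused for all three statistics, instead of re-running the query for each one.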

answered by dbc
  • Wow, this works nicely. But I am getting some NaN values in the results. This might be because of some NaN values in the time series. How can I modify the query to avoid these NaNs? Thank you. – mickG Feb 14 '15 at 00:19
  • @mickG - Filter with `double.IsNaN()`. – dbc Feb 14 '15 at 00:23
  • Thank you for your kind help. This all works as expected. – mickG Feb 14 '15 at 00:25