1

I have a large matrix (30000 x 500), a column represents hourly data for the next 3 years, each column is a different scenario, i.e. I have 500 scenarios of prices where each cell in a row has the same timestamp.

I need to aggregate this over the time axis, so if daily i need to make a matrix of (30000/nrdays x 500), if monthly (30000/nrmonths x 500) and obviously also keep the right dates.

In matlab I created an index with a unique number for each day or each month and then looped over the columns using this:

accumarray(idx,price(:,i),[numel(unique(idx)) 1], @mean)

If I want to do this in c# what is the best way?

below is what I have so far:

public class matrixwihtdates
    {
        public DateTime dats;
        public ILArray<double> nums;
    }
    public class endres
    {
        public string year;
        public string month;
        public string day;
        public ILArray<double> nums;
    }

    public static List<endres> aggrmatrix(ILArray<double> origmatrix, DateTime std, DateTime edd)
    {
        var aggrmatr = new List<matrixwihtdates>();
        for (int i = 0; i < origmatrix.Length; i++) 
        {
            aggrmatr.Add(new matrixwihtdates
            {
                dats = std.AddHours(i),
                nums = origmatrix[i, "full"],
            });
        }

        return aggrmatr.GroupBy(a => new { yr = a.dats.Year, mt = a.dats.Month })
            .Select(g => new endres { 
             year = g.Key.yr.ToString(), 
             month = g.Key.mt.ToString(), 
             nums = ILMath.mean(g.Select(a => a.nums).ToArray(),1) }).ToList();


    }

The key problem is that I don't know how to average over each of the columns within the LINQ syntax so that a vector (1x500) is returned. Or should I not use LINQ? My last line above doesnt work.

UPDATE:

I have added a more imperative version without LINQ, this seems to work but a bit clumsy still.

        public static List<ILArray<double>> aggrmatrixImp(ILArray<double> origmatrix, DateTime std)
    {
        List<ILArray<double>> aggrmatr = new List<ILArray<double>>();
        ILArray<double> tempmatrix;
        int startindicator = 0;
        int endindicator = 0;
        int month = std.Month;
        for (int i = 0; i < origmatrix.Length; i++)
        {
            if (std.AddHours(i).Month != month)
            {
                endindicator = i - 1;
                tempmatrix = origmatrix[ILMath.r(startindicator, endindicator), ILMath.r(0, ILMath.end)];
                aggrmatr.Add(ILMath.mean(tempmatrix, 1));
                startindicator = i;
                month = std.AddHours(i).Month;
            }
        }
        return aggrmatr;
    }

I would still like to make the LINQ version work.

Update 2

I took Haymo's advise into account and here is another version that is twice as fast.

public static ILArray<double> aggrmatrixImp2(ILArray<double> origmatrix, DateTime firstdateinfile, DateTime std, DateTime edd)
    {
        int nrmonths = ((edd.Year - std.Year) * 12) + edd.Month - std.Month;
        ILArray<double> aggrmatr = ILMath.zeros(nrmonths,500);
        int startindicator = std.Date.Subtract(firstdateinfile.Date).Duration().Days*24;
        int endindicator = 0;
        DateTime tempdate = std.AddMonths(1);
        tempdate = new DateTime(tempdate.Year, tempdate.Month, 1);
        for (int i = 0; i < nrmonths; i++)
        {
                endindicator = tempdate.Date.Subtract(std.Date).Duration().Days * 24-1;
                aggrmatr[i, ILMath.full] = ILMath.mean(origmatrix[ILMath.r(startindicator, endindicator), ILMath.full], 1);
                tempdate = tempdate.AddMonths(1);
                startindicator = endindicator+1;               
        }
        return aggrmatr;
    }

I do not have a working LINQ version but I doubt it will be faster.

nik
  • 1,672
  • 2
  • 17
  • 36
  • What was the bad way you tried? – Renatas M. Sep 10 '14 at 10:08
  • I have added my attempt but it doesnt quiete do the trick yet – nik Sep 10 '14 at 11:02
  • for a simple average over columns one could use: [`ILMath.mean`](http://ilnumerics.net/apidoc/?topic=html/M_ILNumerics_ILMath_mean_3.htm) or [`ILMath.sum`](http://ilnumerics.net/apidoc/?topic=html/M_ILNumerics_ILMath_sum_3.htm) – Haymo Kutschbach Sep 10 '14 at 16:32
  • thanks Haymo but how do I combine this with the LINQ syntax? I tried (see above) but that doesnt work. Or how else could I identify partial matrix so that ILMath.mean works? – nik Sep 10 '14 at 18:16
  • It would be interesting to know if utilizing the ILNumerics function rules would generate further speedup? I would expect so, since you are working with large data (living on the LOH) and probably trigger GC Gen2 collections a lot. – Haymo Kutschbach Sep 15 '14 at 23:16

1 Answers1

2

Your update version fits the way arrays should be handled in ILNumerics much better.

For Linq and ILNumerics.ILArray<T>: The IEnumerable<T> which is used to enumerate an ILArray<T> iterates over all elements in column major order. See here: http://ilnumerics.net/blog/ilnumerics-and-linq/

ILNumerics is optimized for the imperative, array oriented version you are using in the question update. If you decide to use Linq anyway, I'd suggest to aggregate within the Linq statement manually and not to rely on ILMath.mean.

You could try to optimize your second example from the update as follows (in random order):

1) Keep the result in a matrix (ILArray<double> aggrmatr) instead of List<ILArray<double>>. You could also use ILCell if you would want to store the dates as well. But your example does only store the aggregated numbers. ILArray<double> would therefore be sufficient.

2) Preallocate the resulting ILArray<double> and loop through its rows (rather than through the rows of origmatrix) - exactly once. The number of resulting rows should be known in advance, right? At least if the dates rows represent dates/times you can compute the start and endindicator according to the current row in tempmatrix and use that as you did.

3) Use ILMath.full or ":" to indicate a full dimension in a subarray expression.

4) Use the ILNumerics function rules! Your matrices are sufficiently large so that this will most probably give some speedup due to memory pooling and more efficient parallelization.

5) Formulating your function in a class which is derived from ILMath leads to a much nicer syntax by letting you ommit the ILMath. identifier on all expression like ILMath.r(..., ILMath.mean.

Also, some rules to keep in mind when using ILArray as class attributes (as in your first example) see: http://ilnumerics.net/ClassRules.html

Haymo Kutschbach
  • 3,322
  • 1
  • 17
  • 25