
I have a rather specialized query I am trying to figure out in C#.

I have a class:

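using System;
using System.Collections.Immutable;

// One report: the reading of every sensor at a single instant.
class TimeValues
{
    public DateTime When;                  // time of the report
    public ImmutableArray<float> Values;   // one reading per sensor, in sensor order
}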

This represents a report from a number of sensors at a particular time. I use it in an ImmutableArray<TimeValues> SomeArray, which represents a series of reports, often at one-second resolution.

The problem I am trying to solve is how to group the reports into 30-second intervals and average the readings of each sensor individually.

So for example, if I have two reports:

      s1   s2   s3
1:20  10   20   30
1:21  30   50   70

and we assume that t1 (1:20) and t2 (1:21) fall within the same 30-second interval, I want the operation to result in:

      s1          s2          s3
1:00  avg(10,30)  avg(20,50)  avg(30,70)
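The expected row is just a column-wise mean; for the sample it works out to 20, 35 and 50. A minimal sketch of that per-sensor step on its own, assuming each report's readings are a plain float[] and all rows have the same length (PerSensorAverage and AverageColumns are illustrative names, not part of the question):

```csharp
using System;
using System.Linq;

// Illustrative helper (not from the question): per-sensor averaging in isolation.
static class PerSensorAverage
{
    // Averages equal-length rows column by column, so result[i] is the
    // mean of every row's i-th reading (sensor order is preserved).
    public static float[] AverageColumns(float[][] rows)
    {
        if (rows.Length == 0)
            throw new ArgumentException("Need at least one row.", nameof(rows));

        int sensorCount = rows[0].Length;
        return Enumerable.Range(0, sensorCount)
                         .Select(i => rows.Average(row => row[i]))
                         .ToArray();
    }
}
```

Because the result is indexed the same way as the input rows, the sensor order is kept without any extra bookkeeping.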

I have started with something such as:

SomeArray.GroupBy(k => k.When.Second >= 30
       ? k.When.AddSeconds(-k.When.Second + 30)
       : k.When.AddSeconds(-k.When.Second), k => k.Values)
   .Select(group => new TimeValues(group.Key, ...))

It is the last line that I can't quite figure out. One point that must be stressed is that the order of the averaged values must be maintained, as it has to correspond to the sensors reporting. This is my first time using group by in LINQ, and this is probably one of the more complicated cases.

Asked by Jeffrey Drake (edited by Peter Duniho)

2 Answers


I don't think you can write this as a neat one-liner, but you can still make it work with something like this:

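        // Assumes TimeValues exposes public fields When (DateTime) and
        // Values (ImmutableArray<float>), and every report has 3 sensors.
        var aggregateValues = timeValues
            .GroupBy(k => k.When.Second >= 30
                ? k.When.AddSeconds(-k.When.Second + 30)
                : k.When.AddSeconds(-k.When.Second))
            .Select(group =>
            {
                var tv = new TimeValues() { When = group.Key };
                var values = new List<float>(3);   // one average per sensor
                for (int index = 0; index < 3; index++)
                {
                    // Average the index-th reading across every report in the group.
                    values.Add(group.Average(t => t.Values[index]));
                }
                tv.Values = values.ToImmutableArray();
                return tv;
            });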

You should also note that it is undesirable to hard-code the array length (the number 3) in this selector, as I did. You should probably declare this constant somewhere static and add explicit checks in the constructor or property setter so that your TimeValues instances always have 3 values in their Values arrays. This will help you avoid IndexOutOfRangeExceptions.
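A rough sketch of that suggestion, assuming you are free to turn the fields into get-only properties set through a constructor. SensorCount and the constructor shape are hypothetical, not part of the question:

```csharp
using System;
using System.Collections.Immutable;

// Hypothetical variant of TimeValues that enforces a fixed sensor count.
class TimeValues
{
    public const int SensorCount = 3;   // every report must carry this many readings

    public DateTime When { get; }
    public ImmutableArray<float> Values { get; }

    public TimeValues(DateTime when, ImmutableArray<float> values)
    {
        // ImmutableArray<T> is a struct; a default instance has no backing
        // array, so check IsDefault before touching Length.
        if (values.IsDefault || values.Length != SensorCount)
            throw new ArgumentException(
                $"Expected {SensorCount} sensor values.", nameof(values));

        When = when;
        Values = values;
    }
}
```

With this variant the object-initializer style used in the answer no longer compiles; the selector would build each result with `new TimeValues(group.Key, values.ToImmutableArray())` and loop up to `TimeValues.SensorCount` instead of a literal 3.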

  • In any of the report files, the number of columns is always the same. – Jeffrey Drake Jul 29 '16 at 05:57
  • It's not the way I'd do it (see my answer), but the above is certainly a reasonable approach and IMHO this is a good, useful answer. – Peter Duniho Jul 29 '16 at 06:16
  • I have learned quite a bit about grouping by a single key. I ended up taking Peter's answer and your inner part and combining them into this: https://dotnetfiddle.net/S30uct – Jeffrey Drake Jul 31 '16 at 05:48

Arguably, your question is a duplicate of Average int Array elements with a GroupBy. However, I'm not thrilled by the answer there, in that it iterates the group results multiple times, once for each index in the values array. IMHO it's better to iterate the group once, putting the repeated iterations over the values arrays themselves. And the presentation of your question is better than the other one, so I'm putting an answer here. :)


First, I don't understand your grouping function. If you want intervals of 30 seconds, it seems to me that just dividing the seconds by 30 should give you a good grouping key. You seem to be going to a lot of trouble to accomplish basically the same thing.
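For the question's DateTime values, "dividing by 30" can be done on the Ticks property (100-nanosecond units, so 30 seconds is 300,000,000 ticks). A minimal sketch, with Buckets, BucketKey and BucketStart as illustrative names. A side benefit: the conditional key in the question subtracts whole seconds only, so any millisecond part of When survives in the key and two reports from the same interval can end up in different groups; flooring the ticks avoids that.

```csharp
using System;

// Illustrative helpers (not from the question) for a 30-second grouping key.
static class Buckets
{
    // 30 seconds expressed in ticks (1 tick = 100 ns), i.e. 300,000,000.
    const long IntervalTicks = 30 * TimeSpan.TicksPerSecond;

    // Integer key: index of the 30-second interval containing t.
    // DateTime.Ticks is never negative, so integer division floors.
    public static long BucketKey(DateTime t) => t.Ticks / IntervalTicks;

    // Start of that interval: t truncated down to a multiple of 30 seconds,
    // which lines up with :00 and :30 of each minute.
    public static DateTime BucketStart(DateTime t) =>
        new DateTime(BucketKey(t) * IntervalTicks, t.Kind);
}
```

Either `BucketKey` or `BucketStart` can be passed to `GroupBy`; the second one has the advantage that the group key is already the DateTime you want to report.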

Second, I didn't feel like installing the package with ImmutableArray<T> and that class doesn't really have anything to do with the question so my answer just uses a plain old array.

Third, I'm not convinced the answer to that question even does what you want. The one from Meleagre looks pretty good, but I would take a different approach, shown below:

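// Assumes a TimeValues type with a TimeSpan When, a float[] Values,
// and a TimeValues(TimeSpan, float[]) constructor.
var result = from g in (from d in data
                 group d by (int)(d.When.TotalSeconds / 30))   // 30-second bucket index
             let c = g.Count()                                 // number of reports in the bucket
             select new TimeValues(TimeSpan.FromSeconds(g.Key * 30),
                g.Aggregate(new float[g.First().Values.Length],
                    // Accumulate: add each report's readings to the per-sensor sums.
                    (a, tv) =>
                    {
                        for (int i = 0; i < a.Length; i++)
                        {
                            a[i] += tv.Values[i];
                        }

                        return a;
                    },
                    // Result selector: divide each sum by the count to get the average.
                    a =>
                    {
                        for (int i = 0; i < a.Length; i++)
                        {
                            a[i] /= c;
                        }

                        return a;
                    }));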

The above uses the LINQ Aggregate() method to accumulate each value at its respective index, and then computes the average at the end. A separate lambda handles each of those two steps. IMHO, the code would actually be a bit more readable if you broke those out into named methods. Either way is fine.

I prefer this approach because it minimizes object allocations (no need to build a list and then convert to an array at the end) and IMHO expresses the intent behind the code more clearly.

I trust you can adapt the array-based example to work with ImmutableArray<T>. :)
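One possible adaptation to the question's DateTime/ImmutableArray<float> types, sketched under a few assumptions: it groups on Ticks divided by 300,000,000 (the Ticks approach mentioned in the comments), it breaks the two lambdas out into named methods as suggested above, and it assumes every report in a group has the same number of sensors. Averaging, AverageBy30Seconds, Accumulate and Finish are illustrative names, not code from either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Collections.Immutable;
using System.Linq;

// The question's type, with public fields so the query below can use them.
class TimeValues
{
    public DateTime When;
    public ImmutableArray<float> Values;
}

static class Averaging
{
    const long IntervalTicks = 30 * TimeSpan.TicksPerSecond;   // 300,000,000

    // Groups reports into 30-second intervals and averages each sensor.
    // The resulting When values have DateTimeKind.Unspecified.
    public static IEnumerable<TimeValues> AverageBy30Seconds(IEnumerable<TimeValues> reports) =>
        from r in reports
        group r by r.When.Ticks / IntervalTicks into g
        let count = g.Count()
        // One pass over the group, summing into a fresh array per interval.
        let sums = g.Aggregate<TimeValues, float[]>(
            new float[g.First().Values.Length], Accumulate)
        select new TimeValues
        {
            When = new DateTime(g.Key * IntervalTicks),
            Values = Finish(sums, count)
        };

    // Adds one report's readings into the running sums, index by index.
    static float[] Accumulate(float[] sums, TimeValues report)
    {
        for (int i = 0; i < sums.Length; i++)
            sums[i] += report.Values[i];
        return sums;
    }

    // Turns the sums into averages and freezes them as an ImmutableArray.
    static ImmutableArray<float> Finish(float[] sums, int count)
    {
        for (int i = 0; i < sums.Length; i++)
            sums[i] /= count;
        return sums.ToImmutableArray();
    }
}
```

The only ImmutableArray-specific step is the final `ToImmutableArray()` copy; the accumulation itself still works on a plain float[], which keeps the per-group allocations to one array plus the final immutable copy.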

Answered by Peter Duniho (edited by Community)
  • There is a lot I like in your answer. One issue is that you are assuming When is a TimeSpan when it is a DateTime - although I could consider switching it for this purpose, or making a property to interpret it. I will definitely use the division of seconds because it avoids a conditional that I didn't consider the alternative for. – Jeffrey Drake Jul 29 '16 at 14:10
  • Yes, sorry I forgot to mention that. It doesn't really matter whether you use `TimeSpan` or `DateTime`, the basic technique is the same. If you want to use `DateTime`, you can do the division approach by using the `Ticks` property and dividing/multiplying by 300,000,000 (there are 10,000,000 ticks in a second), or just stick with the version you're using now. You can use `TimeSpan` just for the grouping/averaging operation by creating a `TimeSpan` value by subtracting a fixed `DateTime` from all your `DateTime` values for the query and then adding back at the end. – Peter Duniho Jul 29 '16 at 15:19
  • (I admit, the main reason I used `TimeSpan` here was convenience. Since you didn't provide an easy-to-copy minimal, complete, and verifiable example to start with, it was much easier for me to create a sample data set using `TimeSpan` than `DateTime`, mainly because the former takes less typing than the latter :) ) – Peter Duniho Jul 29 '16 at 15:21
  • Peter, thank you for everything you have pointed me towards. I ended up taking your code and simplifying the problem a little by extracting the custom functionality on the arrays. I made the grouping use integer increments of 10 to eliminate the dependency on time. I put up some code on https://dotnetfiddle.net/S30uct. It also seems that `from g in (... group ... by ...)` is equivalent to `group ... by ... into g`, unless there is some subtle distinction. I am starting to get more used to the query syntax; it has its cleanness. Please let me know what you think. – Jeffrey Drake Jul 31 '16 at 05:45
  • You are correct that `group...by...into g` is an alternative syntax. I'm glad everything worked out and you have something you like now. – Peter Duniho Jul 31 '16 at 05:49
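The C# specification defines a query continuation (`... into g ...`) as shorthand for `from g in (...) ...`, so the two forms compile to the same method calls. A small sketch showing both on a hypothetical integer data set grouped in increments of 10 (GroupSyntax, NestedForm and IntoForm are illustrative names):

```csharp
using System.Collections.Generic;
using System.Linq;

// Illustrative comparison of the two query shapes discussed above.
static class GroupSyntax
{
    // Nested form: the inner query ends in a group clause,
    // and the outer query ranges over the resulting groups.
    public static List<double> NestedForm(IEnumerable<int> data) =>
        (from g in (from d in data group d by d / 10)
         select g.Average()).ToList();

    // Query continuation: "into g" continues the same query with the groups.
    public static List<double> IntoForm(IEnumerable<int> data) =>
        (from d in data
         group d by d / 10 into g
         select g.Average()).ToList();
}
```

For non-negative inputs such as { 1, 5, 12, 18, 25 }, both methods produce the per-group averages 3, 15 and 25, in the order in which each group's first element appears.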