Let's say I have a List of items that looks like this:

Number Amount
1      10
2      12
5      5
6      9
9      4
10     3
11     1

I need a method that takes in any number, even a decimal, and uses that number to group the list into ranges. So let's say my number was 1; the output would be:

Ranges Total
1-2    22
5-6    14
9-11   8

That's because it groups the numbers that are at most 1 away from each other into ranges. What's the most efficient way to convert my list into this output?

  • When you say `group the list into ranges based on that number` do you mean you want to specify the size of each grouping? – Corey Apr 11 '21 at 23:50
  • Not really the size as you can see from the example above 9-11 is 3 numbers but the number is 1. They are grouped because they are 1 away from each other. Hope that makes sense. – Joe Apr 11 '21 at 23:53
  • You want to partition the input based on the difference between `Number` values in successive rows, then sum those partitions? With an input that gives the largest allowed gap? – Corey Apr 11 '21 at 23:57
  • Yes Corey exactly – Joe Apr 11 '21 at 23:57
  • What happens if you have a bunch of consecutive numbers `13 14 15 16 17` how do you group them? – Charlieface Apr 12 '21 at 00:17
  • Does this answer your question? [LINQ query — Data aggregation (Group Adjacent)](https://stackoverflow.com/questions/14879197/linq-query-data-aggregation-group-adjacent) – Self Apr 12 '21 at 09:42

1 Answer

There are a couple of approaches to this. Either you can partition the data and then sum on the partitions, or you can roll the whole thing into a single method.

Since partitioning is based on the gaps between consecutive Number values, neither approach works on an unordered list. Make sure you sort the list on the partition field before you start.
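
As a minimal sketch of that sorting step, assuming the same `(int n, int v)` tuple shape used in the examples further down (the `SortFirstSketch` class and `SortByNumber` helper are hypothetical names, not part of any library):

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SortFirstSketch
{
    // Hypothetical helper: returns the rows ordered by Number (n) so that
    // gap-based partitioning sees each value next to its nearest neighbours.
    public static List<(int n, int v)> SortByNumber(IEnumerable<(int n, int v)> rows) =>
        rows.OrderBy(r => r.n).ToList();
}
```

`OrderBy` is a stable sort, so rows that share the same Number keep their original relative order.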

Partitioning

Once the list is ordered (or if it was pre-ordered) you can partition. I use this kind of extension method fairly often for breaking up ordered sequences into useful blocks, like when I need to grab sequences of entries from a log file.

public static partial class Ext
{
    public static IEnumerable<T[]> PartitionStream<T>(this IEnumerable<T> source, Func<T, T, bool> partitioner)
    {
        var partition = new List<T>();
        T prev = default;
        foreach (var next in source)
        {
            // partitioner returned false: 'next' starts a new partition,
            // so emit the one collected so far
            if (partition.Count > 0 && !partitioner(prev, next))
            {
                yield return partition.ToArray();
                partition.Clear();
            }
            partition.Add(prev = next);
        }
        if (partition.Count > 0)
            yield return partition.ToArray();
    }
}

The partitioner parameter compares two objects and returns true if they belong in the same partition. The extension method just collects all the members of the partition together and returns them as an array once it finds something for the next partition.
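
To illustrate how the partitioner decides where blocks end, here is a small sketch on a plain ordered `int` sequence. The `PartitionSketch` class and `DescribeBlocks` helper are hypothetical; the `PartitionStream` body is a copy of the logic from this answer, repeated only so the block compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PartitionSketch
{
    // Copy of the PartitionStream logic from this answer.
    public static IEnumerable<T[]> PartitionStream<T>(
        this IEnumerable<T> source, Func<T, T, bool> partitioner)
    {
        var partition = new List<T>();
        T prev = default;
        foreach (var next in source)
        {
            // A false result from the partitioner closes the current block.
            if (partition.Count > 0 && !partitioner(prev, next))
            {
                yield return partition.ToArray();
                partition.Clear();
            }
            partition.Add(prev = next);
        }
        if (partition.Count > 0)
            yield return partition.ToArray();
    }

    // Hypothetical demo: formats each block of an ordered sequence as "first-last".
    // Two neighbours stay in the same block while their gap is <= maxGap.
    public static List<string> DescribeBlocks(IEnumerable<int> ordered, int maxGap) =>
        ordered.PartitionStream((l, r) => r - l <= maxGap)
               .Select(block => $"{block[0]}-{block[block.Length - 1]}")
               .ToList();
}
```

With the Number column from the question and a maximum gap of 1, `DescribeBlocks(new[] { 1, 2, 5, 6, 9, 10, 11 }, 1)` should yield the blocks `1-2`, `5-6` and `9-11`.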

From there you can just do simple summing on the partition arrays:

var source = new (int n, int v)[] { (1,10),(2,12),(5,5),(6,9),(9,4),(10,3),(11,1) };

// largest gap allowed between neighbours in the same range (the "number" from the question)
var maxDifference = 1;
var aggregate = 
    from part in source.PartitionStream((l, r) => (r.n - l.n) <= maxDifference)
    let low = part.Min(g => g.n)
    let high = part.Max(g => g.n)
    select new { Ranges = $"{low}-{high}", Total = part.Sum(g => g.v) };

This gives the same output as your example.

Stream Aggregation

The second option is both simpler and more efficient, since it allocates very little memory. The downside, if you can call it that, is that it's a lot less generic.

Rather than partitioning and aggregating over the partitions, this just walks through the list and aggregates as it goes, spitting out a result each time a partition boundary is reached:

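IEnumerable<(string Ranges, int Total)> GroupSum(IEnumerable<(int n, int v)> source, int maxDistance)
{
    bool started = false;   // false until the first item has been read
    int low = 0;
    int high = 0;
    int total = 0;
    foreach (var (n, v) in source)
    {
        // check partition boundary: first item, or gap to the previous Number too large
        if (!started || (n - high) > maxDistance)
        {
            // emit the finished range before starting a new one
            if (started)
                yield return ($"{low}-{high}", total);
            low = high = n;
            total = v;
            started = true;
        }
        else
        {
            high = n;
            total += v;
        }
    }
    // emit the last range, even if its total is zero or negative
    if (started)
        yield return ($"{low}-{high}", total);
}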

(Using ValueTuple so I don't have to declare types.)

Output is the same here, but with a lot less going on in the background to slow it down. No allocated arrays, etc.
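
The question also mentions that the number may be a decimal. A minimal sketch of how the same streaming idea might look with `decimal` values follows; the class name and the `(Low, High, Total)` result shape are assumptions made for illustration, and the input is still assumed to be sorted by Number.

```csharp
using System.Collections.Generic;

public static class DecimalGroupSumSketch
{
    // Hypothetical decimal variant of GroupSum; assumes `source` is already
    // sorted by Number (n) in ascending order.
    public static IEnumerable<(decimal Low, decimal High, decimal Total)> GroupSum(
        IEnumerable<(decimal n, decimal v)> source, decimal maxDistance)
    {
        bool started = false;   // false until the first item has been read
        decimal low = 0, high = 0, total = 0;
        foreach (var (n, v) in source)
        {
            if (started && n - high <= maxDistance)
            {
                // Still within the allowed gap: extend the current range.
                high = n;
                total += v;
                continue;
            }
            // First item or gap too large: emit the finished range, start a new one.
            if (started)
                yield return (low, high, total);
            low = high = n;
            total = v;
            started = true;
        }
        if (started)
            yield return (low, high, total);
    }
}
```

Returning the bounds as numbers rather than a formatted string leaves the formatting (and any culture-specific decimal separator) to the caller.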

Corey