1

I have an IEnumerable extension MaxBy to be used like

var longest = new [] {"cat", "dogs", "nit" }.MaxBy(x=>x.Count())

should be

"dogs"

with implementation*emphasized text*

public static T MaxBy<T,M>(this IEnumerable<T> This, Func<T,M> selector)
    where M : IComparable
{
    return This
        .Skip(1)
        .Aggregate
        ( new {t=This.First(), m = selector(This.First())}
        , (a, t) =>
            {
                var m = selector(t);
                if ( m.CompareTo(a.m) > 0)
                {
                    return new { t, m };
                }
                else
                {
                    return a;
                }
            }
        , a => a.t);
}

It's quite elegant and purely functional but I see a problem. I'm using anonymous objects which are reference types and require garbage collection. In the worst case when travesing an IEnumerable of length N I will make N memory allocations and N objects will need garbage collection.

I could write the code to use an external mutable accumulator but aesthetically I'd prefer to stick with the pattern I have.

However are my concerns in reality a problem? Does the .Net generational garbage collector identify that these objects are very short lived, only one at a time and optimize away what is happening? Or is it better that I create a custom value type ( struct ) to hold my accumulator instead of using an anonymous object.

** EDIT **

This is obviously the non functional way to do it.

public static T MaxBy<T,M>(this IEnumerable<T> This, Func<T,M> selector)
    where M : IComparable
{
    var t = This.First();
    var max = selector(t);
    foreach (var item in This.Skip(1))
    {
        var m = selector(item);
        if ( m.CompareTo(max) > 0)
        {
            max = m;
            t = item;
        }

    }
    return t;
}
bradgonesurfing
  • 30,949
  • 17
  • 114
  • 217
  • 2
    I think I might just scrap that code and write a simple loop, myself. – Anthony Pegram Feb 20 '13 at 06:19
  • 2
    Your realize you are enumerating a possibly lazy sequence multiple times? Also, have you seen [this](http://code.google.com/p/morelinq/source/browse/MoreLinq/MaxBy.cs?r=2da75c30d5bda7959e5113b702b4bb204a0e4a2c)? That's not exactly functional, though. – Ani Feb 20 '13 at 06:21
  • 1
    [Eric Lippert's Which is faster?](http://ericlippert.com/2012/12/17/performance-rant/) – horgh Feb 20 '13 at 06:22
  • @AnthonyPegram How is it being enumerating multiple times? – bradgonesurfing Feb 20 '13 at 06:23
  • 1
    Well, I'm not the one to make that claim. But you're invoking First more than you want to as well as enumerating for the aggregation. If getting the enumerator is expensive (database, web service, some otherwise complicated query, etc.), that's some unnecessary work. Enumerating once and holding onto items as you need them might be preferred. *If you're concerned about performance.* – Anthony Pegram Feb 20 '13 at 06:28
  • Since you seem to be going for style points in this code it looks ok. Dropping second component from the pair will make code much nicer, but you'd have to call selector multiple times... On other hand Select+Max will be much easier to read and probably somewhat close in allocation costs. – Alexei Levenkov Feb 20 '13 at 06:28
  • @AnthonyPegram sorry for attributing the wrong claim to you. I agree with regards to First then Skip. I've been tempted to write an extension method that efficiently returns then head and tail in one call. – bradgonesurfing Feb 20 '13 at 06:32
  • But to get back to my point, I find your second version immeasurably more readable and easier to understand. But even then, it could be improved. Ani's link to Jon Skeet's MoreLinq code can be helpful. Plus, you might want to think about introducing some validation. But this isn't code review, so I'll leave it that. :p – Anthony Pegram Feb 20 '13 at 06:36

1 Answers1

2

This one is better:

public static T MaxBy<T, M>(this IEnumerable<T> source, Func<T, M> selector)
  where M : IComparable
{
  return source.Aggregate((record, next) =>
     Comparer<M>.Default.Compare(selector(next), selector(record)) > 0
     ? next
     : record);
}
  • You do not have as much code (smarter use of .Aggregate<>)
  • You avoid the anonymous type
  • With Comparer<M>.Default you avoid boxing in the very, very common scenario where your M is a value type that is actually IComparable<M> (and not just IComparble). In your example where M was int, that applies! And with both value types and reference types you can bypass a type check if M really implements the generic IComparable<M>. But everything will still work if M is a "poor" type with only non-generic IComparable (the contraint where M : IComparable).

The behaviour with InvalidOperationException when the source is empty, is unchanged.

Jeppe Stig Nielsen
  • 60,409
  • 11
  • 110
  • 181
  • 1
    This applies the selector to each item twice (except for the first and last item) rather than just once. If the selector is expensive, or has side effects, or doesn't always produce the same value, then that's a big problem. – Servy May 07 '18 at 15:01
  • One easy way to avoid that, using the new value tuples (C# 7.0), is to first project `source` onto a value type consisting of the entry and its "size", and then using `.Aggregate<>` in the clever way from before. So the entire body of the method becomes: `return source.Select(x => (x, selector(x))).Aggregate((x, y) => Comparer.Default.Compare(y.Item2, x.Item2) > 0 ? y : x).Item1;` – Jeppe Stig Nielsen May 07 '18 at 15:38
  • Which is of course exactly what the OP did in their question. – Servy May 07 '18 at 15:55
  • @Servy Yes, but he was worried about garbage collection of all the objects of anonymous type, and that does not occur with value tuples (however, his question is from 2013, a long time before value tuples). He even considered substituting his anonymous type with a `struct`. So my latest comment is still on-topic, in this sense. Note: I do not know if the asker's worries about the GC overhead are justified in practice. He should measure with his actual application. – Jeppe Stig Nielsen May 07 '18 at 17:26
  • Yes, so again, you're just suggesting doing the thing that they suggested doing in their own question. That's not off topic, it's just redundant, as the question already mentions that approach. – Servy May 07 '18 at 17:29