I am trying to use LINQ to return the element that occurs the maximum number of times AND the number of times it occurs.

For example: I have an array of strings:

string[] words = { "cherry", "apple", "blueberry", "cherry", "cherry", "blueberry" };

// ...
// some LINQ statement here
// ...

In this array, the query would return "cherry" as the most frequent element and 3 as the number of times it occurs. I would also be willing to split this into two queries if necessary (i.e., a first query to get "cherry" and a second to return the count of 3).

Chris Pfohl
Brett

8 Answers

The solutions presented so far are O(n log n). Here's an O(n) solution:

var max = words.GroupBy(w => w)
               .Select(g => new { Word = g.Key, Count = g.Count() })
               .MaxBy(g => g.Count);
Console.WriteLine(
    "The most frequent word is {0}, and its frequency is {1}.",
    max.Word,
    max.Count
);

This needs a definition of MaxBy. Here is one:

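using System;
using System.Collections.Generic;

// Hypothetical class name: extension methods must live in a non-generic static class.
public static class EnumerableExtensions {
    // Returns the element whose projection is largest, in a single O(n) pass.
    // Only a strictly greater projection replaces the current max, so ties keep the first element.
    public static TSource MaxBy<TSource>(
        this IEnumerable<TSource> source,
        Func<TSource, IComparable> projectionToComparable
    ) {
        using (var e = source.GetEnumerator()) {
            if (!e.MoveNext()) {
                throw new InvalidOperationException("Sequence is empty.");
            }
            TSource max = e.Current;
            IComparable maxProjection = projectionToComparable(e.Current);
            while (e.MoveNext()) {
                IComparable currentProjection = projectionToComparable(e.Current);
                if (currentProjection.CompareTo(maxProjection) > 0) {
                    max = e.Current;
                    maxProjection = currentProjection;
                }
            }
            return max;
        }
    }
}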
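Assuming you target .NET 6 or later, `System.Linq` already ships an `Enumerable.MaxBy` operator, so the hand-written `MaxBy` above is not needed there. A minimal sketch of the same query against the built-in operator, written as a self-contained top-level program:

```csharp
// Assumes .NET 6 or later, where System.Linq provides Enumerable.MaxBy.
using System;
using System.Linq;

string[] words = { "cherry", "apple", "blueberry", "cherry", "cherry", "blueberry" };

// One pass to build the groups, then one pass over the groups to pick the largest count.
var max = words.GroupBy(w => w)
               .Select(g => new { Word = g.Key, Count = g.Count() })
               .MaxBy(g => g.Count);

// prints: The most frequent word is cherry, and its frequency is 3.
Console.WriteLine(
    "The most frequent word is {0}, and its frequency is {1}.",
    max.Word,
    max.Count
);
```

On older frameworks, the custom extension method is still the way to get the same single-pass behavior.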
jason
  • You can also substitute `Aggregate` for `MaxBy` if necessary, although `MaxBy` is *much* nicer: `var max = words.GroupBy(x => x, (k, g) => new { Word = k, Count = g.Count() }).Aggregate((a, x) => (x.Count > a.Count) ? x : a);` – LukeH Feb 03 '11 at 17:14
  • Guys! GroupBy isn't linear. There is a small penalty for hashing collisions. I am making this up but it's something like O(n * log log n) – Jakub Šturc Feb 03 '11 at 17:14
  • @Jakub Šturc: And sort can degrade to `O(n^2)` in the worst case depending on the algorithm used. But on average, it's `O(n log n)`. `GroupBy` is usually going to be `O(n)`. – jason Feb 03 '11 at 18:48

var topWordGroup = words.GroupBy(word => word)
                        .OrderByDescending(group => group.Count())
                        .FirstOrDefault();
// topWordGroup is null if words is empty!
string topWord = topWordGroup.Key;
int topWordCount = topWordGroup.Count(); // IGrouping has no Count property; use LINQ's Count()

And in case we don't like O(N log N):

// Aggregate without a seed throws InvalidOperationException if words is empty.
var topWordGroup = words.GroupBy(word => word).Aggregate((best, next) => best.Count() < next.Count() ? next : best);
Snowbear
  • This is `O(n log n)` when `O(n)` is possible: http://stackoverflow.com/questions/4888537/simple-linq-question-in-c/4888703#4888703. – jason Feb 03 '11 at 16:24

First thing that comes to mind (meaning there is probably a more efficient way):

var item = words.GroupBy(x => x).OrderByDescending(x => x.Count()).First();
//item.Key is "cherry", item.Count() is 3

EDIT: forgot the OP wanted both the name and the count.

diceguyd30
  • This is `O(n log n)` when `O(n)` is possible: http://stackoverflow.com/questions/4888537/simple-linq-question-in-c/4888703#4888703. – jason Feb 03 '11 at 16:26
  • @Jason Ha! We meet again! And once again you are correct. I had overlooked using Jon Skeet's MoreLinq – diceguyd30 Feb 03 '11 at 16:32
  • What's Jon's MoreLinq? I've seen his MiscUtil, but never MoreLinq. – jason Feb 03 '11 at 16:36
  • http://code.google.com/p/morelinq/ And for the list of the added methods: http://code.google.com/p/morelinq/wiki/OperatorsOverview MaxBy is one of them. – diceguyd30 Feb 03 '11 at 16:39
  • Heh, just read the source for his `MaxBy`. I feel pretty good in coming up with effectively the same implementation (and same name!) as Jon. I think I'm calling it quits for the day; nowhere to go but down from here. Thanks for the linq (ha, puns are not funny). – jason Feb 03 '11 at 16:45

string[] words = { "cherry", "apple", "blueberry", "cherry", "cherry", "blueberry" };

var topWordAndCount = words
    .GroupBy(w => w)
    .OrderByDescending(g => g.Count())
    .Select(g => new { Word = g.Key, Count = g.Count() })
    .FirstOrDefault();

// FirstOrDefault returns null when words is empty
if (topWordAndCount != null)
{
    Console.WriteLine("{0}: {1}", topWordAndCount.Word, topWordAndCount.Count);
}
spender
  • This is `O(n log n)` when `O(n)` is possible: http://stackoverflow.com/questions/4888537/simple-linq-question-in-c/4888703#4888703. – jason Feb 03 '11 at 16:26

Try this one:

Converting SQL containing top, count, group and order to LINQ (2 Entities)

Community
bigtlb

string[] words = { "cherry", "apple", "blueberry", "cherry", "cherry", "blueberry" };

var r = words.GroupBy (x => x)
             .OrderByDescending (g => g.Count ())
             .FirstOrDefault ();
Console.WriteLine (String.Format ("The element {0} occurs {1} times.", r.Key, r.Count ()));
Frederik Gheysels

A simpler O(n) solution:

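// ToList() runs GroupBy once; otherwise each of the two queries below would redo the grouping.
var groups = words.GroupBy(x => x).ToList();
var max = groups.Max(x => x.Count());              // highest count (3 here)
var top = groups.First(y => y.Count() == max).Key; // a word with that count ("cherry" here)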
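For comparison, the same O(n) idea can be written without LINQ. The sketch below is an alternative technique, not part of this answer: it counts with a `Dictionary<string, int>` and tracks the current leader inside the same loop, so the array is traversed only once. The variable names `counts`, `topWord` and `topCount` are illustrative. Dictionary lookups are expected O(1), so this shares the hash-collision caveat raised about `GroupBy` in the comments on the first answer.

```csharp
// Alternative sketch: one explicit pass with a hand-rolled Dictionary instead of GroupBy.
using System;
using System.Collections.Generic;

string[] words = { "cherry", "apple", "blueberry", "cherry", "cherry", "blueberry" };

// Each Dictionary lookup/update is expected O(1), so the loop is expected O(n).
var counts = new Dictionary<string, int>();
string topWord = "";
int topCount = 0;

foreach (var word in words)
{
    counts.TryGetValue(word, out int count); // count is 0 if the word hasn't been seen yet
    count++;
    counts[word] = count;

    // Update the leader as we go; on a tie, the word that reached that count first is kept.
    if (count > topCount)
    {
        topWord = word;
        topCount = count;
    }
}

Console.WriteLine("{0} occurs {1} times.", topWord, topCount); // prints: cherry occurs 3 times.
```

One design difference from the LINQ versions: an empty array leaves `topWord` as `""` and `topCount` as 0 instead of throwing or returning null.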
Eric Mickelsen

Here's a very fast O(n) solution in one line(!):

words.GroupBy(x => x).Aggregate((IGrouping<string,string>)null, (x, y) => (x != null && y != null && x.Count() >= y.Count()) || y == null ? x : y, x => x);

Or this:

words.GroupBy(x => x).Select(x => new { Key = x.Key, Count = x.Count() }).Aggregate(new { Key = "", Count = 0 }, (x, y) => x.Count >= y.Count ? x : y, x => x);
Eric Mickelsen