How to construct a having statement in linq with a max value?

Question

I have a collection of objects containing transactional data. This dataset has a 3 level parent-child relationship with the levels being: Project > Stage > Transaction. For each project and stage there will be multiple transactions.

class CostElement
{
    string ProjectNumber { get; set; }
    string Stage { get; set; }
    string DerivedCostElement { get; set; }
    DateTime? Date { get; set; }
    decimal? Amount { get; set; }
}

What I want to find is a unique list of cost elements (DerivedCostElement), with a count of the maximum number of times it is used in any given single project. I.e. Count the number of stages it appears in for each project and then select the maximum value.

What I have tried so far is:

//count all cost elements per project
var aggregateQuery = from ce in _costElements
                     group ce by new { ce.ProjectNumber, ce.Stage, ce.DerivedCostElement }
                     into g
                     select new
                     {
                         g.Key.DerivedCostElement,
                         g.Key.ProjectNumber,
                         g.Key.Stage
                     };

//select cost elements having max count
var countQuery = from g in aggregateQuery
                 group g by new { g.DerivedCostElement, g.ProjectNumber }
                 into grp
                 select new CostElementCount
                 {
                     CostElement = grp.Key.DerivedCostElement,
                     ProjectNumber = grp.Key.ProjectNumber,
                     Count = grp.Count()
                 };

return countQuery.ToList();

This for the most part works, I end up with a list of cost elements per project and the maximum number of occurrences per project. However the results are still per project, what I want is a unique list with only the maximum value.

Can someone please help me obtain the correct result? Also any suggestions on how to write this more efficiently or succinctly would be greatly appreciated.

S. Baggy · Accepted Answer · 2013-11-15T07:58:31.923

I'm not sure about the writing the first two queries you have more efficiently, but I think adding the following line before your return statement would get the results you want.

countQuery = countQuery.OrderByDescending(x => x.Count).GroupBy(x => x.CostElement).Select(g => g.First());

Or to keep the syntax more consistent

countQuery = from cq in countQuery
             orderby cq.Count descending
             group cq by cq.CostElement
                 into grp
             select grp.First();

Here is the test data I used...

_costElements = new List<CostElement>();
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "a", DerivedCostElement = "ce6" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "x", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "b", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "c", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "a", Stage = "d", DerivedCostElement = "ce3" });
_costElements.Add(new CostElement() { ProjectNumber = "b", Stage = "e", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "b", Stage = "f", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "g", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "h", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "h", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "c", Stage = "i", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "j", DerivedCostElement = "ce2" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "k", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "l", DerivedCostElement = "ce1" });
_costElements.Add(new CostElement() { ProjectNumber = "d", Stage = "m", DerivedCostElement = "ce1" });

And the results I came up with (with a bit of reordering of the countQuery results) were...

CostElement: ce1 ProjectNumber: d Count: 3

CostElement: ce2 ProjectNumber: c Count: 2

CostElement: ce3 ProjectNumber: a Count: 3

CostElement: ce6 ProjectNumber: a Count: 1

Which I think is a list of all cost elements, along with the project and count where each of them appears most. That output came from a ToString() I added to CostElementCount

public string ToString()
{
    return string.Format("CostElement: {0} ProjectNumber: {1} Count: {2}", CostElement, ProjectNumber, Count);
}

Thanks for your reply. This is sort of what I want, but perhaps I wasn't quite clear in my question. This returns the cost element with the most usage. What I want is a list of all cost elements, with the max each is used in a project. — codemonkeh, Nov 14 '13 at 22:10
However if i just regroup my query again by cost element (like in your question) I can just do a max there and it works. Although I can't but hope there is a shorter way to do this. — codemonkeh, Nov 14 '13 at 22:11
In my test solution, I came up with a list of all cost elements, along with the name of the project that each was found most in and the count. Perhaps I modelled it wrong. I've added some more info to my answer. — S. Baggy, Nov 15 '13 at 07:48
My apologies when I read the code I assumed `First` would just select the top element, this does indeed work. Thank you! — codemonkeh, Nov 21 '13 at 03:14
Great! Yes, the First() is per group, so it works to meet the requirement. — S. Baggy, Nov 21 '13 at 09:14

How to construct a having statement in linq with a max value?

1 Answers1