I have a collection of objects containing transactional data. This dataset has a 3 level parent-child relationship with the levels being: Project > Stage > Transaction. For each project and stage there will be multiple transactions.
class CostElement
{
string ProjectNumber { get; set; }
string Stage { get; set; }
string DerivedCostElement { get; set; }
DateTime? Date { get; set; }
decimal? Amount { get; set; }
}
What I want to find is a unique list of cost elements (DerivedCostElement
), with a count of the maximum number of times it is used in any given single project. I.e. Count the number of stages it appears in for each project and then select the maximum value.
What I have tried so far is:
//count all cost elements per project
var aggregateQuery = from ce in _costElements
group ce by new { ce.ProjectNumber, ce.Stage, ce.DerivedCostElement }
into g
select new
{
g.Key.DerivedCostElement,
g.Key.ProjectNumber,
g.Key.Stage
};
//select cost elements having max count
var countQuery = from g in aggregateQuery
group g by new { g.DerivedCostElement, g.ProjectNumber }
into grp
select new CostElementCount
{
CostElement = grp.Key.DerivedCostElement,
ProjectNumber = grp.Key.ProjectNumber,
Count = grp.Count()
};
return countQuery.ToList();
This for the most part works, I end up with a list of cost elements per project and the maximum number of occurrences per project. However the results are still per project, what I want is a unique list with only the maximum value.
Can someone please help me obtain the correct result? Also any suggestions on how to write this more efficiently or succinctly would be greatly appreciated.