16

I have some data that has various attributes and I want to hierarchically group that data. For example:

public class Data
{
   public string A { get; set; }
   public string B { get; set; }
   public string C { get; set; }
}

I would want this grouped as:

A1
 - B1
    - C1
    - C2
    - C3
    - ...
 - B2
    - ...
A2
 - B1
    - ...
...

Currently, I have been able to group this using LINQ such that the top group divides the data by A, then each subgroup divides by B, then each B subgroup contains subgroups by C, etc. The LINQ looks like this (assuming an IEnumerable<Data> sequence called data):

var hierarchicalGrouping =
            from x in data
            group x by x.A
                into byA
                let subgroupB = from x in byA
                                group x by x.B
                                    into byB
                                    let subgroupC = from x in byB
                                                    group x by x.C
                                    select new
                                    {
                                        B = byB.Key,
                                        SubgroupC = subgroupC
                                    }
                select new
                {
                    A = byA.Key,
                    SubgroupB = subgroupB
                };

As you can see, this gets somewhat messy the more subgrouping that's required. Is there a nicer way to perform this type of grouping? It seems like there should be and I'm just not seeing it.

Update
So far, I have found that expressing this hierarchical grouping by using the fluent LINQ APIs rather than query language arguably improves readability, but it doesn't feel very DRY.

There were two ways I did this: one using GroupBy with a result selector, the other using GroupBy followed by a Select call. Both could be formatted to be more readable than using query language, but they still don't scale well.

var withResultSelector =
    data.GroupBy(a => a.A, (aKey, aData) =>
        new
        {
            A = aKey,
            SubgroupB = aData.GroupBy(b => b.B, (bKey, bData) =>
                new
                {
                    B = bKey,
                    SubgroupC = bData.GroupBy(c => c.C, (cKey, cData) =>
                    new
                    {
                        C = cKey,
                        SubgroupD = cData.GroupBy(d => d.D)
                    })
                })
        });

var withSelectCall =
    data.GroupBy(a => a.A)
        .Select(aG =>
        new
        {
            A = aG.Key,
            SubgroupB = aG
                .GroupBy(b => b.B)
                .Select(bG =>
            new
            {
                B = bG.Key,
                SubgroupC = bG
                    .GroupBy(c => c.C)
                    .Select(cG =>
                new
                {
                    C = cG.Key,
                    SubgroupD = cG.GroupBy(d => d.D)
                })
            })
        });

What I'd like...
I can envisage a couple of ways that this could be expressed (assuming the language and framework supported it). The first would be a GroupBy extension that takes a series of function pairs for key selection and result selection, Func<TElement, TKey> and Func<TElement, TResult>. Each pair describes the next sub-group. This option falls down because each pair would potentially require TKey and TResult to be different than the others, which would mean GroupBy would need finite parameters and a complex declaration.
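
To make the "finite parameters" problem concrete, here is a minimal sketch of what a fixed-depth overload could look like. The name GroupByTwoLevels and its shape are hypothetical (nothing like it exists in LINQ), and it takes only key selectors, not the selector pairs described above. Each level needs its own key type parameter, so a third level would need another overload with a TKey3, and so on.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TwoLevelGrouping
{
    // Hypothetical helper, not part of LINQ: groups by one key, then by a second.
    // TKey1 and TKey2 are separate type parameters because the two keys may have
    // different types; a three-level version would need a new overload.
    public static IEnumerable<KeyValuePair<TKey1, ILookup<TKey2, TElement>>>
        GroupByTwoLevels<TElement, TKey1, TKey2>(
            this IEnumerable<TElement> source,
            Func<TElement, TKey1> firstKey,
            Func<TElement, TKey2> secondKey)
    {
        return source
            .GroupBy(firstKey)
            .Select(g => new KeyValuePair<TKey1, ILookup<TKey2, TElement>>(
                g.Key, g.ToLookup(secondKey)));
    }
}
```

With the Data class above, a call would look like data.GroupByTwoLevels(x => x.A, x => x.B).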

The second option would be a SubGroupBy extension method that could be chained to produce sub-groups. SubGroupBy would be the same as GroupBy but the result would be the previous grouping further partitioned. For example:

var groupings = data
    .GroupBy(x=>x.A)
    .SubGroupBy(y=>y.B)
    .SubGroupBy(z=>z.C);

// This version has a custom result type that would be the grouping data.
// The element data at each stage would be the custom data at this point
// as the original data would be lost when projected to the results type.
var groupingsWithCustomResultType = data
    .GroupBy(a=>a.A, x=>new { ... })
    .SubGroupBy(b=>b.B, y=>new { ... })
    .SubGroupBy(c=>c.C, z=>new { ... });

The difficulty with this is how to implement the methods efficiently. As I currently understand it, each level would re-create new objects in order to extend the previous objects: the first iteration would create groupings of A, the second would then create objects that have a key of A and groupings of B, and the third would redo all that and add the groupings of C. This seems terribly inefficient (though I suspect my current options actually do this anyway). It would be nice if the calls passed around a meta-description of what was required and the instances were only created on the last pass, but that sounds difficult too. Note that this is similar to what can be done with GroupBy but without the nested method calls.
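
One way the "meta-description" idea could be sketched is shown below. The types GroupNode<T> and GroupingPlan<T> and the methods ThenBy and Build are hypothetical names invented for this illustration. The sketch assumes that keys can be treated as object, which sidesteps the generic typing problem at the cost of static key types. Chained calls only record key selectors; the node objects are created once, when Build runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical node type: a key, the items under it, and any child groups.
public sealed class GroupNode<T>
{
    public object Key { get; set; }
    public List<T> Items { get; set; }
    public List<GroupNode<T>> Children { get; set; }  // empty at the deepest level
}

// Hypothetical "plan": ThenBy only records selectors; nothing is grouped until Build.
public sealed class GroupingPlan<T>
{
    private readonly IEnumerable<T> _source;
    private readonly List<Func<T, object>> _selectors = new List<Func<T, object>>();

    public GroupingPlan(IEnumerable<T> source) { _source = source; }

    public GroupingPlan<T> ThenBy(Func<T, object> keySelector)
    {
        _selectors.Add(keySelector);
        return this;
    }

    // Walks the recorded levels top-down; each node is created exactly once.
    public List<GroupNode<T>> Build() => Build(_source, 0);

    private List<GroupNode<T>> Build(IEnumerable<T> items, int level)
    {
        if (level == _selectors.Count) return new List<GroupNode<T>>();
        return items
            .GroupBy(_selectors[level])
            .Select(g => new GroupNode<T>
            {
                Key = g.Key,
                Items = g.ToList(),
                Children = Build(g, level + 1)
            })
            .ToList();
    }
}
```

Usage would be along the lines of new GroupingPlan<Data>(data).ThenBy(x => x.A).ThenBy(x => x.B).ThenBy(x => x.C).Build(). Every level still enumerates its items again, so this avoids rebuilding result objects but not the repeated GroupBy passes.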

Hopefully all that makes sense. I expect I am chasing rainbows here, but maybe not.

Update - another option
Another possibility that I think is more elegant than my previous suggestions relies on each parent group being just a key and a sequence of child items (as in the examples), much like IGrouping provides now. That means one option for constructing this grouping would be a series of key selectors and a single results selector.

If the keys were all limited to a set type, which is not unreasonable, then this could be generated as a sequence of key selectors and a results selector, or a results selector and a params of key selectors. Of course, if the keys had to be of different types at different levels, this becomes difficult again except for a finite depth of hierarchy due to the way generics parameterization works.

Here are some illustrative examples of what I mean:

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    IEnumerable<Func<TElement, TKey>> keySelectors,
    Func<TElement, TResult> resultSelector)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    new Func<Data, string>[] {
                        x => x.A,
                        y => y.B,
                        z => z.C },
                    a => new { /*custom projection here for leaf items*/ });

Or:

public static /*<grouping type>*/ SubgroupBy<TElement, TKey, TResult>(
    this IEnumerable<TElement> sequence,
    Func<TElement, TResult> resultSelector,
    params Func<TElement, TKey>[] keySelectors)
{
    ...
}

var hierarchy = data.SubgroupBy(
                    a => new { /*custom projection here for leaf items*/ },
                    x => x.A,
                    y => y.B,
                    z => z.C);

This does not solve implementation inefficiencies, but it should solve the complex nesting. However, what would the return type of this grouping be? Would I need my own interface or can I use IGrouping somehow. How much do I need to define or does the variable depth of the hierarchy still make this impossible?

My guess is that this should be the same as the return type from any IGrouping call but how does the type system infer that type if it isn't involved in any of the parameters that are passed?

This problem is stretching my understanding, which is great, but my brain hurts.

Jeff Yates
  • @Jeff: Could you post the kind of code you'd *want* to write (presumably invoking some sort of helper) and then we can see what we can do? I suspect it's one of those things which will require a different overload for every level of hierarchy (e.g. one for 2 levels, one for 3 etc) but it could still be useful. – Jon Skeet Feb 10 '10 at 14:16
  • @jon skeet: sure. I'll provide an update shortly. I feel there is a more elegant solution but I can't see it. I made an attempt to spec my call yesterday but it falls foul of generics rules as each use of Func required different generic types. – Jeff Yates Feb 10 '10 at 14:39
  • @Jon Skeet: Right, I've provided some detail on the options I've considered (outside of language or framework restrictions) and my general thinking. – Jeff Yates Feb 10 '10 at 15:57
  • Were you able to find a solution? – Robert Harvey Nov 22 '10 at 23:23
  • @Robert: No, I didn't get one that was satisfactory. It seems this is a pretty hard problem to solve. – Jeff Yates Nov 23 '10 at 13:32
  • I've done some of this kind of work before. If you need custom subgroupings, you need a recursive class definition, like Obalix's GroupResult class below. You can then populate each class instance with whatever grouping you like, one grouping at a time. – Robert Harvey Nov 23 '10 at 15:25
  • @Robert: I was wondering if that is as good as it gets. Thanks. – Jeff Yates Nov 23 '10 at 18:00

3 Answers

10

Here is a description of how you can implement a hierarchical grouping mechanism.

From this description:

Result class:

public class GroupResult
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable Items { get; set; }
    public IEnumerable<GroupResult> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

Extension method:

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult
                    {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        }
        else
            return null;
    }
}

Usage:

var result = customers.GroupByMany(c => c.Country, c => c.City);

Edit:

Here is an improved and properly typed version of the code.

public class GroupResult<TItem>
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable<TItem> Items { get; set; }
    public IEnumerable<GroupResult<TItem>> SubGroups { get; set; }
    public override string ToString() 
    { return string.Format("{0} ({1})", Key, Count); }
}

public static class MyEnumerableExtensions
{
    public static IEnumerable<GroupResult<TElement>> GroupByMany<TElement>(
        this IEnumerable<TElement> elements,
        params Func<TElement, object>[] groupSelectors)
    {
        if (groupSelectors.Length > 0)
        {
            var selector = groupSelectors.First();

            //reduce the list recursively until zero
            var nextSelectors = groupSelectors.Skip(1).ToArray();
            return
                elements.GroupBy(selector).Select(
                    g => new GroupResult<TElement> {
                        Key = g.Key,
                        Count = g.Count(),
                        Items = g,
                        SubGroups = g.GroupByMany(nextSelectors)
                    });
        } else {
            return null;
        }
    }
}
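
As a rough illustration of consuming the result, the sketch below walks the hierarchy and renders it as indented "Key (Count)" lines. It repeats a compact copy of the typed GroupResult and GroupByMany from above so that it compiles on its own; the class name GroupResultSketch and the Dump helper are hypothetical additions. Note that the deepest level has SubGroups set to null, so the walker has to check for that.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Compact copy of the typed result class from above.
public class GroupResult<TItem>
{
    public object Key { get; set; }
    public int Count { get; set; }
    public IEnumerable<TItem> Items { get; set; }
    public IEnumerable<GroupResult<TItem>> SubGroups { get; set; }
}

public static class GroupResultSketch
{
    // Same recursion as GroupByMany above, repeated so the sketch is self-contained.
    public static IEnumerable<GroupResult<T>> GroupByMany<T>(
        this IEnumerable<T> elements, params Func<T, object>[] groupSelectors)
    {
        if (groupSelectors.Length == 0) return null;   // same leaf behaviour as above
        var rest = groupSelectors.Skip(1).ToArray();
        return elements.GroupBy(groupSelectors[0]).Select(g => new GroupResult<T>
        {
            Key = g.Key,
            Count = g.Count(),
            Items = g,
            SubGroups = g.GroupByMany(rest)
        });
    }

    // Hypothetical helper: one "Key (Count)" line per group, indented by depth.
    public static string Dump<T>(IEnumerable<GroupResult<T>> groups, int depth = 0)
    {
        if (groups == null) return string.Empty;       // deepest level has null SubGroups
        var sb = new StringBuilder();
        foreach (var g in groups)
        {
            sb.Append(' ', depth * 2).Append(g.Key).Append(" (").Append(g.Count).AppendLine(")");
            sb.Append(Dump(g.SubGroups, depth + 1));
        }
        return sb.ToString();
    }
}
```

For the customers example, Console.Write(GroupResultSketch.Dump(customers.GroupByMany(c => c.Country, c => c.City))) would print each country followed by its cities, indented by two spaces.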
TheSoftwareJedi
AxelEckenberger
4

You need a recursive function. The recursive function calls itself for each node in the tree.

To do this in LINQ, you can use a Y-combinator.

Robert Harvey
  • How would that work when the property I am grouping by changes at each level? – Jeff Yates Feb 09 '10 at 16:15
  • It doesn't. You're better off setting up a self-referential association by adding a ParentID to each node (so that you're always referring to ParentID at each level), unless of course the number of tree levels (nested depth) is limited by your application's design. – Robert Harvey Feb 09 '10 at 16:20
  • As said, the issue is not identical to recursive expansion of a tree. Also, this is almost a link-only answer, as in, when the link dies it is reduced to a comment. – Gert Arnold Jun 26 '21 at 18:10
0

Here is my attempt to create nested grouping. Maybe someone will find it useful.

// extension method
public static IEnumerable<TResult> GroupMany<TElement, TResult>(this IEnumerable<TElement> seq, Func<GroupingBuilder<TElement>, IGroupingStage<TElement, TResult>> configure)
{
    var builder = new GroupingBuilder<TElement>();
    return configure(builder).ApplyTo(seq);
}

// builder classes

public class GroupingBuilder<TElement>
{
    public GroupingBuilder<TKeyNext, Group<TKeyNext, TElement>, TElement, TElement> By<TKeyNext>(Func<TElement, TKeyNext> keySelector)
        => By(keySelector, (k, s, nested) => Group.Of(k, nested(s)));

    public GroupingBuilder<TKeyNext, TElementNext, TElement, TElement> By<TKeyNext, TElementNext>(
        Func<TElement, TKeyNext> keySelector,
        Func<TKeyNext, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElement>>, TElementNext> elementSelector)
        => new GroupingBuilder<TKeyNext, TElementNext, TElement, TElement>(keySelector, elementSelector, new IdentityStage());


    // GroupingBuilder<TElement> deliberately does not implement IGroupingStage, so
    // GroupMany(g => g) will not compile; this private identity stage ends the chain instead
    private class IdentityStage : IGroupingStage<TElement, TElement>
    {
        public IEnumerable<TElement> ApplyTo(IEnumerable<TElement> seq) => seq;
    }
}

public class GroupingBuilder<TKeyCurrent, TElementCurrent, TElementPrev, TElement> : IGroupingStage<TElement, TElementCurrent>
{
    private Func<TElement, TKeyCurrent> _keySelector;
    private IGroupingStage<TElement, TElementPrev> _prevStage;
    private Func<TKeyCurrent, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementPrev>>, TElementCurrent> _elementSelector;

    public GroupingBuilder(
        Func<TElement, TKeyCurrent> keySelector,
        Func<TKeyCurrent, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementPrev>>, TElementCurrent> elementSelector,
        IGroupingStage<TElement, TElementPrev> prevStage)
    {
        _keySelector = keySelector;
        _prevStage = prevStage;
        _elementSelector = elementSelector;
    }

    public GroupingBuilder<TKeyNext, Group<TKeyNext, TElementCurrent>, TElementCurrent, TElement> By<TKeyNext>(
        Func<TElement, TKeyNext> keySelector)
        => By(keySelector, (k, s, nested) => Group.Of(k, nested(s)));

    public GroupingBuilder<TKeyNext, TElementNext, TElementCurrent, TElement> By<TKeyNext, TElementNext>(
        Func<TElement, TKeyNext> keySelector,
        Func<TKeyNext, IEnumerable<TElement>, Func<IEnumerable<TElement>, IEnumerable<TElementCurrent>>, TElementNext> elementSelector)
        => new GroupingBuilder<TKeyNext, TElementNext, TElementCurrent, TElement>(keySelector, elementSelector, this);

    IEnumerable<TElementCurrent> IGroupingStage<TElement, TElementCurrent>.ApplyTo(IEnumerable<TElement> seq)
        => seq.GroupBy(_keySelector, (k, s) => _elementSelector(k, s, _prevStage.ApplyTo));
}

public interface IGroupingStage<TElement, TResultElement>
{
    IEnumerable<TResultElement> ApplyTo(IEnumerable<TElement> seq);
}

// Group data structure
public class Group<TKey, TElement>
{
    public TKey Key { get; set; }
    public ICollection<TElement> Items { get; set; }
}

public static class Group
{
    public static Group<TKey, TElement> Of<TKey, TElement>(TKey key, IEnumerable<TElement> elements)
        => new Group<TKey, TElement> { Key = key, Items = elements.ToList() };
}

Basic usage:

var items = new[]{
    new SomeEntity{NonUniqueId = 1, Name = "John", Surname = "Doe", DoB = new DateTime(1900, 01, 03)},
    new SomeEntity{NonUniqueId = 1, Name = "John", Surname = "Doe", DoB = new DateTime(1980, 01, 03)},
    new SomeEntity{NonUniqueId = 2, Name = "Jane", Surname = "Doe", DoB = new DateTime(1902, 01, 03)},
    new SomeEntity{NonUniqueId = 1, Name = "Jane", Surname = "Smith", DoB = new DateTime(1999, 01, 03)},
};

IEnumerable<Group<int, Group<DateTime, Group<string, SomeEntity>>>> result = items
    .GroupMany(c => c
        .By(x => x.Surname)
        .By(x => x.DoB)
        .By(x => x.NonUniqueId));

Note that the grouped properties must be specified in reverse order, innermost level first: the last By call becomes the outermost group. This is caused by a restriction of generics: GroupingBuilder<TKeyCurrent, TElementCurrent, TElementPrev, TElement> wraps the previous grouping type in a new one, so the nesting can only be built from the inside out.
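
To show which level ends up where, here is a hand-written nesting with the same shape as the By(Surname).By(DoB).By(NonUniqueId) chain, using only standard LINQ. It is a sketch for comparison, not part of the builder: it uses nested dictionaries instead of the Group type, and the SomeEntity definition and the ReverseOrderSketch class are assumptions based on the usage example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical entity, inferred from the usage example above.
public class SomeEntity
{
    public int NonUniqueId { get; set; }
    public string Name { get; set; }
    public string Surname { get; set; }
    public DateTime DoB { get; set; }
}

public static class ReverseOrderSketch
{
    // The LAST key given to the builder (NonUniqueId) is the outermost level here,
    // and the FIRST key (Surname) is the innermost.
    public static Dictionary<int, Dictionary<DateTime, Dictionary<string, List<SomeEntity>>>>
        Nest(IEnumerable<SomeEntity> items)
    {
        return items
            .GroupBy(x => x.NonUniqueId)
            .ToDictionary(byId => byId.Key, byId => byId
                .GroupBy(x => x.DoB)
                .ToDictionary(byDoB => byDoB.Key, byDoB => byDoB
                    .GroupBy(x => x.Surname)
                    .ToDictionary(bySurname => bySurname.Key, bySurname => bySurname.ToList())));
    }
}
```

Dictionaries reject null keys, so this comparison assumes every Surname is non-null, as in the sample data.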

Usage with custom result selectors:

var result = items
    .GroupMany(c => c
        .By(x => x.Surname, (key, seq, nested) => new { Surname = key, ChildItems = nested(seq).ToList() })
        .By(x => x.DoB, (key, seq, nested) => new { DoB = key, Children = nested(seq).ToList() })
        .By(x => x.NonUniqueId));