Group by array contents

Question

I have a List<Tuple<string,long,byte[]>> and I want to group by the contents of the byte array.

Is there a simple way to do this with GroupBy and a lambda?

Ideally, I want to do this without creating an intermediate data structure (like a string to hold the elements of the array).

What are you hoping to end up with? If you get the individual byte items as the key, what would the rest of your result be? — itsme86, Apr 05 '13 at 18:41
I'm hoping to end up with the items in my list grouped by the contents of that array. I.e. if the arrays are equal, then they are in the same group, otherwise they are in different groups. — soandos, Apr 05 '13 at 18:42
Are you defining equality for the arrays as being a reference to the same array, or having the same bytes in different arrays? If the latter, you need to define a custom equality comparer for `byte[]`. — Servy, Apr 05 '13 at 18:43
@soandos You can't. You need to create a new class that implements `IEqualityComparer`, implement both methods, create an instance of it, and pass that to `GroupBy`. — Servy, Apr 05 '13 at 18:44
@soandos You could use `IEnumerable.SequenceEqual()` (http://msdn.microsoft.com/en-us/library/bb348567.aspx) — itsme86, Apr 05 '13 at 18:44
@itsme86 how can I use that in the context of the lambda though? I only have one argument. Servy, the hashcode for arrays is just as good as the reference equality compare. — soandos, Apr 05 '13 at 18:46
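
To illustrate the distinction Servy raises above: arrays in .NET compare by reference by default, so `GroupBy` with no comparer puts two separate arrays holding identical bytes into different groups, while `SequenceEqual` compares element by element. A minimal sketch, assuming made-up sample data; the `ReferenceEqualityDemo` class and its method names are hypothetical and only there for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, not part of the original question.
static class ReferenceEqualityDemo
{
    // Counts the groups produced when no comparer is supplied.
    public static int CountDefaultGroups()
    {
        var source = new List<Tuple<string, long, byte[]>>
        {
            Tuple.Create("a", 1L, new byte[] { 1, 2, 3 }),
            Tuple.Create("b", 2L, new byte[] { 1, 2, 3 }), // same bytes, different array instance
        };

        // Default comparer: byte[] keys are compared by reference, not by content.
        return source.GroupBy(t => t.Item3).Count();
    }

    // SequenceEqual walks both arrays and compares them element by element.
    public static bool SameContents(byte[] x, byte[] y)
    {
        return x.SequenceEqual(y);
    }
}
```

Here `CountDefaultGroups()` yields two groups even though both arrays contain the same bytes, which is why a content-based comparer has to be passed to `GroupBy`.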

score 4 · Accepted Answer · answered Apr 05 '13 at 18:47

You can achieve that with a custom IEqualityComparer<byte[]> implementation (or, even better, a generic one: IEqualityComparer<T[]>):

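using System.Collections.Generic;
using System.Linq;

class ArrayComparer<T> : IEqualityComparer<T[]>
{
    // Two arrays are equal when they hold the same elements in the same order.
    public bool Equals(T[] x, T[] y)
    {
        return x.SequenceEqual(y);
    }

    // Equal arrays must produce equal hash codes. This version concatenates the
    // element hash codes into a string and hashes that string.
    public int GetHashCode(T[] obj)
    {
        return obj.Aggregate(string.Empty, (s, i) => s + i.GetHashCode(), s => s.GetHashCode());
    }
}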

I'm pretty sure GetHashCode could be implemented much better, but it's just an example!

Usage:

var grouped = source.GroupBy(i => i.Item3, new ArrayComparer<byte>());
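
For completeness, here is a sketch of how the grouped result might be consumed. The sample data and the `GroupingDemo` class are invented for illustration, and the comparer is repeated only so the snippet compiles on its own:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same comparer as in the answer, repeated to keep this snippet self-contained.
class ArrayComparer<T> : IEqualityComparer<T[]>
{
    public bool Equals(T[] x, T[] y)
    {
        return x.SequenceEqual(y);
    }

    public int GetHashCode(T[] obj)
    {
        return obj.Aggregate(string.Empty, (s, i) => s + i.GetHashCode(), s => s.GetHashCode());
    }
}

// Hypothetical demo class with made-up sample data.
static class GroupingDemo
{
    // For each distinct array content, returns the Item1 values of the tuples in that group.
    public static List<List<string>> GroupNames()
    {
        var source = new List<Tuple<string, long, byte[]>>
        {
            Tuple.Create("a", 1L, new byte[] { 1, 2, 3 }),
            Tuple.Create("b", 2L, new byte[] { 9 }),
            Tuple.Create("c", 3L, new byte[] { 1, 2, 3 }),
        };

        return source
            .GroupBy(t => t.Item3, new ArrayComparer<byte>())
            .Select(g => g.Select(t => t.Item1).ToList())
            .ToList();
    }
}
```

With this data, "a" and "c" land in one group and "b" in another, because grouping is decided by array contents rather than by array instance.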

answered Apr 05 '13 at 18:47

MarcinJuraszek


That's not a very efficient hash code generation method, but it will work. – Servy Apr 05 '13 at 18:48
As I said - it's just an example. – MarcinJuraszek Apr 05 '13 at 18:48
Well odds are the OP isn't going to know that your GetHashCode method isn't very good, or how to fix it; he's just going to test it once or twice, see the right output, and never touch it again. – Servy Apr 05 '13 at 18:49
What is the GetHashCode method used for (as opposed to equals)? – soandos Apr 05 '13 at 18:50
@soandos It's well described on MSDN: [`Object.GetHashCode` Method](http://msdn.microsoft.com/en-us/library/system.object.gethashcode.aspx) – MarcinJuraszek Apr 05 '13 at 18:51
How is it handled if the hashcodes are equal, but the objects are not? (I will be using this in a case where there are at least 600,000 objects or so) – soandos Apr 05 '13 at 18:53
@soandos It uses `Equals` to determine if they are *actually* equal when the hashes collide. – Servy Apr 05 '13 at 18:53
The `Equals` method is used then. – MarcinJuraszek Apr 05 '13 at 18:53
So in essence, the hash code does not have to be very good if the arrays are short (the time it takes to create the strings would be much higher if the array is only of size 4 or so)? – soandos Apr 05 '13 at 18:55
@soandos The hash code never *has* to be good. How good it is determines how efficient the operation is. In this case, the hash code is time and memory consuming to create, and has a much higher rate of collisions than it could have. These are all bad properties for a hash code. None of them will result in incorrect output, just slow execution as the size of the data as well as the number of items increases. – Servy Apr 05 '13 at 18:58
@Servy Makes sense. I think I'm going to profile the results of this hashing function against just `return obj[0]`, since in my case those are random, and the equals check is short. – soandos Apr 05 '13 at 19:01
@soandos It could be a good idea. You'll then iterate inside the `Equals` method (for arrays with the same first element), which takes about as much time as calculating the hash code from the whole array content. – MarcinJuraszek Apr 05 '13 at 19:06
@Servy Can you propose a better `GetHashCode` implementation? – julealgon Feb 27 '15 at 12:50
@julealgon It's a pretty well solved problem; a bit of basic research on the subject will result in a number of simple well suited algorithms for different situations. – Servy Feb 27 '15 at 14:48
@Servy But it would've saved us all time and helped complete this answer to just suggest what you feel is the best solution for this situation in the first place. – xr280xr Apr 24 '21 at 04:13
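
Since the thread never settles on a concrete alternative, here is one possible sketch of a cheaper `GetHashCode` for `byte[]` keys. It uses the 32-bit FNV-1a hash, which is a substitute chosen here for illustration and not something proposed by the commenters; the `ByteArrayComparer` name and the null handling are likewise assumptions. It avoids the string allocations of the example above, and `Equals` still decides the final answer whenever two hashes collide:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer; FNV-1a is an illustrative choice, not the answer's method.
sealed class ByteArrayComparer : IEqualityComparer<byte[]>
{
    public bool Equals(byte[] x, byte[] y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        // Cheap length check first, then element-by-element comparison.
        return x.Length == y.Length && x.SequenceEqual(y);
    }

    public int GetHashCode(byte[] obj)
    {
        if (obj == null) return 0;
        unchecked
        {
            // 32-bit FNV-1a: XOR each byte in, then multiply by the FNV prime.
            uint hash = 2166136261;
            foreach (byte b in obj)
            {
                hash ^= b;
                hash *= 16777619;
            }
            return (int)hash;
        }
    }
}
```

It would be passed to `GroupBy` the same way as the comparer in the answer, for example `source.GroupBy(i => i.Item3, new ByteArrayComparer())`. Whether it is actually faster for a given data set is something to confirm by profiling.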
