
Possible Duplicate:
Test whether two IEnumerable<T> have the same values with the same frequencies

I wrote the following (UPDATED with a correction):

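using System.Collections.Generic;
using System.Linq;

static class EnumerableExtensions
{
    // Set comparison: order and duplicate counts are both ignored.
    public static bool HaveSameItems<T>(this IEnumerable<T> self, IEnumerable<T> other)
    {
        return !
        (
            other.Except(self).Any() ||
            self.Except(other).Any()
        );
    }
}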

Isn't there a shorter way? I know there is SequenceEqual but the order doesn't matter for me.

Jader Dias
  • Note that there is a bug in your own code: you need to use `Except` in both directions, as you actually want to check that the [exclusive disjunction](http://en.wikipedia.org/wiki/Exclusive_disjunction) is empty. – Wim Coenen Feb 03 '11 at 13:44
  • This has a bug. It returns true for `{1, 1, 2}` and `{1, 2, 2}`. – jason Feb 03 '11 at 15:21
  • @Jason true, but the solutions below will be used instead. – Jader Dias Feb 03 '11 at 15:30
  • @Gabe: that question is similar but it is about comparing [multisets](http://en.wikipedia.org/wiki/Multiset) where one wants to check that each item occurs the same number of times in both enumerables. That's not the case for this question, though the answer for multi-sets could still be applicable here if you first do `.Distinct()` on your enumerables. – Wim Coenen Feb 03 '11 at 15:53
  • @Wim: This question is ambiguous. While it implies that the items don't have to appear the same number of times, the currently chosen answer (http://stackoverflow.com/questions/4886830/what-is-the-shortest-way-to-compare-if-two-ienumerablet-have-the-same-items-in/4887041#4887041) requires it, making me think that it is actually one of this OP's requirements. – Gabe Feb 03 '11 at 16:31
  • @Gabe: I think considering @Jader's agreement with @Jason's comment, it must be a requirement, otherwise the "bug" Jason spotted wouldn't be a bug. I agree that this isn't clear from the question itself, though (until reading those comments I would've assumed it was *not* a requirement). – Dan Tao Feb 03 '11 at 16:46
  • @Dan: So you agree it is a duplicate? – Gabe Feb 03 '11 at 17:38
  • @Gabe: I guess I do, now that you ask! – Dan Tao Feb 03 '11 at 18:04
  • In your code I think you should replace "this" with "self" in the return statement. – kaptan Oct 10 '14 at 01:16
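The comments above disagree about whether duplicate counts matter. Under the set reading (duplicates ignored), which is what the two-way `Except` check in the question computes, a shorter equivalent is `HashSet<T>.SetEquals`. This is a minimal sketch assuming default equality for `T`; the helper name `HaveSameSet` is hypothetical, and it deliberately keeps the behavior jason flags for `{1, 1, 2}` and `{1, 2, 2}`.

```csharp
using System;
using System.Collections.Generic;

static class SetComparison
{
    // Hypothetical helper name. Set semantics, like the two-way Except check
    // in the question: order and duplicate counts are both ignored.
    public static bool HaveSameSet<T>(this IEnumerable<T> self, IEnumerable<T> other)
    {
        // SetEquals compares only the distinct elements of both sides,
        // using the default equality comparer for T.
        return new HashSet<T>(self).SetEquals(other);
    }
}

static class Program
{
    static void Main()
    {
        // The pair from jason's comment: equal as sets, not as multisets.
        Console.WriteLine(new[] { 1, 1, 2 }.HaveSameSet(new[] { 1, 2, 2 })); // True
        Console.WriteLine(new[] { 1, 2 }.HaveSameSet(new[] { 2, 1 }));       // True
        Console.WriteLine(new[] { 1, 2 }.HaveSameSet(new[] { 1, 3 }));       // False
    }
}
```

If duplicate counts must match (the multiset reading), this is not sufficient; a counting approach like the dictionary-based answer is needed instead.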

2 Answers


Even if the order doesn't matter to you, it doesn't rule out SequenceEqual as a viable option.

using System.Linq;

var lst1 = new [] { 2,2,2,2 };
var lst2 = new [] { 2,3,4,5 };
var lst3 = new [] { 5,4,3,2 };

//the question's original (pre-correction) check, which returns true
//when you compare lst1 and lst2, even though lst1 only contains
//items from lst2 and is not actually equal to it,
//as mentioned by Wim Coenen
bool countThenExcept = lst1.Count() == lst2.Count() &&
        !lst1.Except(lst2).Any(); //incorrectly returns true

//this doesn't test equality either (also mentioned by Wim Coenen):
//it only checks whether the two lists have any item in common
bool anyShared = lst1.Intersect(lst2).Any(); //incorrectly returns true

//So even if order doesn't matter, you can make it matter just for
//the equality check like so:
bool sorted12 = lst1.OrderBy(x => x).SequenceEqual(lst2.OrderBy(x => x)); //correctly returns false
bool sorted32 = lst3.OrderBy(x => x).SequenceEqual(lst2.OrderBy(x => x)); //correctly returns true
diceguyd30
  • This is `O(n log n)`. An `O(n)` solution exists. – jason Feb 03 '11 at 14:11
  • See http://stackoverflow.com/questions/4576723/c-and-linq-want-1-1-2-3-1-2-3-1-returns-true-but-1-1-2-3-1-2-3-re/4576854#4576854 for an O(n) LINQ-based solution. – Gabe Feb 03 '11 at 14:28
  • I corrected the sample in the question – Jader Dias Feb 03 '11 at 15:01
  • Not to pick on you, but as this question is very much on my mind I thought of another reason to want to avoid any solution that uses `OrderBy`. Namely, not all sequences have a meaningful total ordering. – jason Feb 03 '11 at 15:35
  • ^_^ no problem! For those cases you can always use the overload of OrderBy that accepts a custom IComparer, but I see your point. The above answer was thrown together pretty quickly since this question had received several answers before mine, none of which actually worked. This was just the first thing that came to mind. – diceguyd30 Feb 03 '11 at 16:09
  • @diceguyd30: Oh, your solution is nice because it works and is easily understandable. I'm just pointing out some considerations, that is all. Anyway, my point is that not all `IEnumerable<T>` sequences are going to be defined over types `T` for which there is a natural `IComparer`. Sometimes we can only define a meaningful partial order. – jason Feb 03 '11 at 19:20
  • @Jason Indeed, you are correct. – diceguyd30 Feb 03 '11 at 19:21
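Following diceguyd30's note about passing a custom comparer to `OrderBy`, here is a minimal sketch of the sort-then-compare approach as a reusable helper. It assumes the supplied `IComparer<T>` is a total order (jason's partial-order caveat still applies), and it compares the sorted elements with that same comparer instead of `SequenceEqual`'s default equality, so a comparer such as a case-insensitive one behaves consistently. The name `HaveSameItemsSorted` is hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SortedComparison
{
    // Hypothetical helper: O(n log n), assumes comparer is a total order on T.
    public static bool HaveSameItemsSorted<T>(IEnumerable<T> a, IEnumerable<T> b, IComparer<T> comparer)
    {
        List<T> sortedA = a.OrderBy(x => x, comparer).ToList();
        List<T> sortedB = b.OrderBy(x => x, comparer).ToList();
        if (sortedA.Count != sortedB.Count)
            return false;

        // Compare position by position with the comparer used for sorting,
        // so "equal" means Compare(...) == 0 rather than T's default Equals.
        for (int i = 0; i < sortedA.Count; i++)
        {
            if (comparer.Compare(sortedA[i], sortedB[i]) != 0)
                return false;
        }
        return true;
    }
}

static class Program
{
    static void Main()
    {
        var lst1 = new[] { 2, 2, 2, 2 };
        var lst2 = new[] { 2, 3, 4, 5 };
        var lst3 = new[] { 5, 4, 3, 2 };
        Console.WriteLine(SortedComparison.HaveSameItemsSorted(lst1, lst2, Comparer<int>.Default)); // False
        Console.WriteLine(SortedComparison.HaveSameItemsSorted(lst3, lst2, Comparer<int>.Default)); // True

        // Case-insensitive comparison of strings.
        Console.WriteLine(SortedComparison.HaveSameItemsSorted(
            new[] { "a", "B" }, new[] { "b", "A" }, StringComparer.OrdinalIgnoreCase)); // True
    }
}
```

Sorting both sides with the same comparer means that, for a total order, two sequences hold the same items (with the same counts) exactly when their sorted forms match position by position.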

Here's an O(n) solution that only walks each sequence once (in fact, it might not even walk the second sequence completely, because it can terminate early):

public static bool HaveSameItems<T>(IEnumerable<T> a, IEnumerable<T> b) {
    // Count how many times each item occurs in a.
    var dictionary = a.GroupBy(x => x).ToDictionary(g => g.Key, g => g.Count());
    foreach(var item in b) {
        int value;
        // item never occurs in a.
        if (!dictionary.TryGetValue(item, out value)) {
            return false;
        }
        // item occurs more often in b than in a.
        if (value == 0) {
            return false;
        }
        dictionary[item] -= 1;
    }
    // Any non-zero count left means a has items that b lacks.
    return dictionary.All(x => x.Value == 0);
}

One downside to this solution is that it won't interoperate nicely with LINQ to SQL, EF, NHibernate, etc., since the dictionary-based logic cannot be translated into a database query.

jason
  • I would wrap that `e` in a `using` statement or just go with `foreach` (any reason for the `while e.MoveNext()` approach?). – Dan Tao Feb 03 '11 at 14:16
  • @Dan Tao: Good point. No reason, it's just what came to me naturally. – jason Feb 03 '11 at 14:18
  • @Jason: It's funny, +1 because this is clearly O(n) as you've said; and yet I keep staring at it and thinking "But there *must* be a better way..." I don't even know why. – Dan Tao Feb 03 '11 at 14:23
  • Is this any better than http://stackoverflow.com/questions/4576723/c-and-linq-want-1-1-2-3-1-2-3-1-returns-true-but-1-1-2-3-1-2-3-re/4576732#4576732? – Gabe Feb 03 '11 at 14:23
  • @Dan Tao: I know exactly what you mean. I stared at my solution for a long time before posting to see if I could think of something better. That's not to say there isn't, but I'm not coming up with something right now. It will definitely be on my mind all day. – jason Feb 03 '11 at 14:24
  • @Gabe: The solution you link to sorts, which is `O(n log n)`. The accepted answer in that thread is effectively identical to mine except that I have early termination possibilities that can avoid completely walking the second sequence. – jason Feb 03 '11 at 14:26
  • @Gabe: It seems practically the same, with the exception of using removal from the dictionary instead of checking for a count of 0, right? – Dan Tao Feb 03 '11 at 14:28
  • @Jason: Gabe specifically linked to cdhowie's answer (note his second suggestion), which looks to me almost identical to yours, including the early termination. It only removes from the dictionary on reaching a count of zero and then checks the *dictionary's* count at the end. – Dan Tao Feb 03 '11 at 14:30
  • Jason: Search the page for `UnsortedSequencesEqual` to see what I was talking about. – Gabe Feb 03 '11 at 14:30
  • @Dan Tao, Gabe: Ah, I see. Yeah, it's effectively the same solution. – jason Feb 03 '11 at 14:32
  • I'd expect this answer and cdhowie's to have very similar performance characteristics. cdhowie has the edge when building the dictionary by doing it manually rather than using `GroupBy`/`ToDictionary`/`Count` -- still O(n) but *slightly* smaller constant factor -- but Jason has the edge when testing values by breaking out early when the count reaches zero rather than just calling `Remove` and continuing. – LukeH Feb 03 '11 at 14:34
  • @LukeH: But cdhowie's solution still effectively breaks out at the same time by returning `false` on the next `TryGetValue` that returns `false` (which would happen after the `Remove` call). – Dan Tao Feb 03 '11 at 14:51
  • @Dan: True, I missed that completely. Although I'm not sure whether the `Remove` call itself is expensive enough to have any performance impact (I suspect not). – LukeH Feb 03 '11 at 15:11
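The comments describe cdhowie's `UnsortedSequencesEqual` (in the linked thread) as building the dictionary by hand, removing a key once its count reaches zero, and checking the dictionary's `Count` at the end. The sketch below is reconstructed from that description only, not copied from cdhowie's answer; the name `HaveSameItemsByRemoval` is hypothetical, and `T` is constrained to `notnull` because `Dictionary` keys cannot be null.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class MultisetComparison
{
    // Hypothetical reconstruction of the "remove at zero" variant discussed above.
    public static bool HaveSameItemsByRemoval<T>(IEnumerable<T> a, IEnumerable<T> b) where T : notnull
    {
        // Build the per-item counts for a by hand instead of GroupBy/ToDictionary.
        var counts = new Dictionary<T, int>();
        foreach (var item in a)
        {
            counts.TryGetValue(item, out int n); // n is 0 when item is new
            counts[item] = n + 1;
        }

        foreach (var item in b)
        {
            // Missing key: item is not in a, or b has more copies of it than a.
            if (!counts.TryGetValue(item, out int n))
                return false;
            if (n == 1)
                counts.Remove(item); // last matching copy consumed
            else
                counts[item] = n - 1;
        }

        // Any key left over appears in a more often than in b.
        return counts.Count == 0;
    }
}

static class Program
{
    static void Main()
    {
        Console.WriteLine(MultisetComparison.HaveSameItemsByRemoval(new[] { 1, 1, 2 }, new[] { 1, 2, 2 }));       // False
        Console.WriteLine(MultisetComparison.HaveSameItemsByRemoval(new[] { 5, 4, 3, 2 }, new[] { 2, 3, 4, 5 })); // True

        // Early termination: the second 1 is not found, so the method returns
        // after reading only two items of this very long lazy sequence.
        Console.WriteLine(MultisetComparison.HaveSameItemsByRemoval(new[] { 1, 2 }, Enumerable.Repeat(1, int.MaxValue))); // False
    }
}
```

As LukeH and Dan Tao note, this should behave much like the `GroupBy`/`ToDictionary` version: both are O(n), and both stop reading the second sequence as soon as an item has no remaining count.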