14

I thought that I understood Intersect, but it turns out I was wrong.

 List<int> list1 = new List<int>() { 1, 2, 3, 2, 3};
 List<int> list2 = new List<int>() { 2, 3, 4, 3, 4};

 // list1.Intersect(list2) => 2,3

 //But what I want is:
 // =>  2,3,2,3,2,3,3

I can come up with a way like this:

 var intersected = list1.Intersect(list2);
 var list3 = new List<int>();
 list3.AddRange(list1.Where(I => intersected.Contains(I)));
 list3.AddRange(list2.Where(I => intersected.Contains(I)));

Is there an easier way in LINQ to achieve this?

I do need to state that I do not care in which order the results are given.

2,2,2,3,3,3,3 would also be perfectly OK.

The problem is that I am using this on a very large collection, so I need efficiency.

We are talking about Objects, not ints. The ints were just for the easy example, but I realize this can make a difference.

BartoszKP
Peterdk
  • Given your updates, there may be even more efficient ways to solve your problem. Tell us more about the data. Specifically, I am interested in the question of whether your very large collection has mostly unique elements, or mostly duplicates. I am also interested to know if the elements really are integers, or if this is a stand-in for some more complex type; specifically, is there a *total ordering* defined on your data? That is, given a set of this data, is there a unique, well-defined smallest-to-biggest ordering? – Eric Lippert Feb 02 '10 at 18:06

4 Answers

20

Let's see if we can precisely characterize what you want. Correct me if I am wrong. You want: all elements of list 1, in order, that also appear in list 2, followed by all elements of list 2, in order, that also appear in list 1. Yes?

Seems straightforward.

return list1.Where(x=>list2.Contains(x))
     .Concat(list2.Where(y=>list1.Contains(y)))
     .ToList();

Note that this is not efficient for large lists. If the lists have a thousand items each then this does a couple million comparisons. If you're in that situation then you want to use a more efficient data structure for testing membership:

var list1set = new HashSet<int>(list1);
var list2set = new HashSet<int>(list2);

return list1.Where(x=>list2set.Contains(x))
     .Concat(list2.Where(y=>list1set.Contains(y)))
     .ToList();

which only does a couple thousand comparisons, but potentially uses more memory.
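Since the question mentions that the real elements are objects rather than ints, here is a minimal sketch of the same HashSet approach written generically. The class name `DuplicatePreservingIntersect`, the method name `IntersectKeepingDuplicates`, and the optional `comparer` parameter are hypothetical names chosen for illustration; they are not part of LINQ. It assumes the element type has a sensible `Equals`/`GetHashCode`, or that you pass an `IEqualityComparer<T>`.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class DuplicatePreservingIntersect
{
    // Hypothetical helper (not a built-in LINQ method).
    // Returns every element of 'first' that also occurs in 'second',
    // followed by every element of 'second' that also occurs in 'first'.
    // Duplicates and the original order within each list are preserved.
    public static List<T> IntersectKeepingDuplicates<T>(
        IReadOnlyCollection<T> first,
        IReadOnlyCollection<T> second,
        IEqualityComparer<T> comparer = null)
    {
        // Building the sets costs O(n + m) time and extra memory;
        // each Contains call is then O(1) on average.
        // A null comparer makes HashSet<T> use the default equality comparer.
        var firstSet = new HashSet<T>(first, comparer);
        var secondSet = new HashSet<T>(second, comparer);

        return first.Where(x => secondSet.Contains(x))
                    .Concat(second.Where(y => firstSet.Contains(y)))
                    .ToList();
    }
}
```

With the lists from the question, `DuplicatePreservingIntersect.IntersectKeepingDuplicates(list1, list2)` yields 2,3,2,3,2,3,3. For a custom class, equality is whatever the comparer (or the type's own `Equals`/`GetHashCode`) defines, so two distinct instances only match if that definition says they do.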

AGuyCalledGerald
Eric Lippert
  • 5
    Your LINQ queries do not give the same results as your other two queries - if element e occurs n times in list1 and m in list2, they contain it n*m times, which isn't the desired behavior. – kvb Feb 01 '10 at 21:40
  • 2
    *Excellent catch* @kvb. I totally missed that because in the given example, they happen to look confusingly similar. I'll remove the incorrect code. Thanks! – Eric Lippert Feb 01 '10 at 21:44
  • Interesting about the HashSet. I didn't know that it was more efficient. Will look into it! – Peterdk Feb 02 '10 at 17:24
  • @Peterdk: Lists are O(n) to test for membership of an element, but in exchange give you the ability to (1) maintain an ordering, and (2) have duplicates. HashSets are O(1) to test for membership of an element but do not keep the elements in order and never contain duplicates. If you're willing to double your memory and use both a HashSet *and* a List, you can get the best of both worlds. – Eric Lippert Feb 02 '10 at 18:01
1
var set = new HashSet<int>(list1.Intersect(list2));
return list1.Concat(list2).Where(i=>set.Contains(i));
George Polevoy
0

Maybe this could help: https://gist.github.com/mladenb/b76bcbc4063f138289243fb06d099dda

The original Except/Intersect return a collection of unique items, even though their return type doesn't suggest so (the return value of those methods isn't a HashSet/Set, but rather IEnumerable), which is probably the result of a poor design decision. Instead, we can use a more intuitive implementation, which returns as many copies of an element as the first enumeration contains, not just a single one (using Set.Contains).

Furthermore, a mapping function was added to help intersect/except collections of different types.

If you don't need to intersect/except collections of different types, just inspect the source code of the Intersect/Except and change the part which iterates through the first enumeration to use Set.Contains instead of Set.Add/Set.Remove.
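The change described above can be sketched roughly as follows. This is not the code from the linked gist and not the framework's source; it is a minimal illustration, and the names `DuplicateKeepingExtensions`, `IntersectAll`, and `ExceptAll` are hypothetical. Note that it keeps every occurrence from the first sequence only, so to get the output asked for in the question you would call it once in each direction and concatenate the results.

```csharp
using System.Collections.Generic;

public static class DuplicateKeepingExtensions
{
    // Hypothetical variant of Intersect: yields each element of 'first'
    // that occurs in 'second', keeping duplicates and order from 'first'.
    public static IEnumerable<T> IntersectAll<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(second, comparer);
        foreach (var item in first)
        {
            // Contains leaves the set unchanged, so repeated items keep matching
            // (a check that removed the item would match it only once).
            if (set.Contains(item))
            {
                yield return item;
            }
        }
    }

    // Hypothetical variant of Except: yields each element of 'first'
    // that does not occur in 'second', again keeping duplicates.
    public static IEnumerable<T> ExceptAll<T>(
        this IEnumerable<T> first,
        IEnumerable<T> second,
        IEqualityComparer<T> comparer = null)
    {
        var set = new HashSet<T>(second, comparer);
        foreach (var item in first)
        {
            if (!set.Contains(item))
            {
                yield return item;
            }
        }
    }
}
```

For the lists in the question, `list1.IntersectAll(list2)` gives 2,3,2,3 and `list2.IntersectAll(list1)` gives 2,3,3. Both methods are lazy iterators, so the set is only built when enumeration starts.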

Community
Mladen B.
-1

I don't believe this is possible with the built-in APIs. But you could use the following to get the result you're looking for.

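// Must be declared inside a static class to work as an extension method.
static IEnumerable<T> Intersect2<T>(this IEnumerable<T> left, IEnumerable<T> right) {
  // Mark every distinct element of left as "not yet seen in right".
  // The indexer tolerates duplicates; ToDictionary would throw on them.
  var map = new Dictionary<T, bool>();
  foreach ( var item in left ) {
    map[item] = false;
  }
  foreach ( var item in right ) {
    if ( map.ContainsKey(item) ) {
      map[item] = true;
    }
  }
  // Yield only the elements that were found in both sequences.
  foreach ( var cur in left.Concat(right) ) {
    if ( map.TryGetValue(cur, out var inBoth) && inBoth ) {
      yield return cur;
    }
  }
}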
JaredPar