
What's the most efficient way to write a method that compares n lists and returns all the values that do not appear in every list? For example, given

var lists = new List<List<int>> {
                                  new List<int> { 1, 2, 3, 4 },
                                  new List<int> { 2, 3, 4, 5, 8 },
                                  new List<int> { 2, 3, 4, 5, 9, 9 },
                                  new List<int> { 2, 3, 3, 4, 9, 10 }
                                };


public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
  //...fast algorithm here
}

so that

lists.GetNonShared();

returns 1, 5, 8, 9, 10

I had

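public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
  // Flatten every list, then drop the values that appear in all of them
  // (the running intersection computed by Aggregate).
  return lists.SelectMany(item => item)
              .Except(lists.Aggregate((a, b) => a.Intersect(b)));
}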

But I wasn't sure if that was efficient. Order does not matter. Thanks!
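To check what each stage of the approach above produces on the sample data, here is a minimal self-contained sketch; the class name `NonSharedCheck` is illustrative only. Note that `lists` is declared as `IEnumerable<IEnumerable<int>>` so that `Aggregate` can return the `IEnumerable<int>` produced by `Intersect`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class NonSharedCheck
{
    // The question's approach: flatten everything, then remove the values
    // that appear in every list (the running intersection).
    public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
    {
        return lists.SelectMany(item => item)
                    .Except(lists.Aggregate((a, b) => a.Intersect(b)));
    }

    public static void Main()
    {
        IEnumerable<IEnumerable<int>> lists = new List<List<int>> {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 4, 5, 8 },
            new List<int> { 2, 3, 4, 5, 9, 9 },
            new List<int> { 2, 3, 3, 4, 9, 10 }
        };

        // Intermediate step: the values present in every list.
        var shared = lists.Aggregate((a, b) => a.Intersect(b));
        Console.WriteLine("Shared:     " + string.Join(", ", shared));     // 2, 3, 4

        // Final result; Except already removes duplicates.
        var result = lists.GetNonShared().OrderBy(x => x);
        Console.WriteLine("Non-shared: " + string.Join(", ", result));     // 1, 5, 8, 9, 10
    }
}
```

Because `Except` returns each element only once, the 9 that appears three times in the input is reported once.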

Brad Urani
    You're not sure if it's "efficient"? That's not the issue. The issue is: are the semantics correct, and does it meet your performance requirements? The semantics of your implementation are correct. Only you can know if it meets your performance requirements. – jason Sep 15 '11 at 19:34

4 Answers

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
{
    int listCount = list.Count(); // count the lists once, not once per group
    return list.SelectMany(x => x.Distinct())
               .GroupBy(x => x)
               .Where(g => g.Count() < listCount)
               .Select(group => group.Key);
}
mironych

EDIT: I'd think of it like this...

You want the union of all the lists, minus the intersection of all the lists. That's effectively what your original does, leaving Except to do the "set" operation of Union despite getting duplicate inputs. In this case I suspect you could do this more efficiently by just building up two HashSets and doing all the work in place:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    using (var iterator = lists.GetEnumerator())
    {
        if (!iterator.MoveNext())
        {
            return new T[0]; // No lists at all: nothing to return
        }

        HashSet<T> union = new HashSet<T>(iterator.Current);
        HashSet<T> intersection = new HashSet<T>(union);
        while (iterator.MoveNext())
        {
            // This avoids iterating over it twice; it may not be necessary,
            // it depends on how you use it.
            List<T> list = iterator.Current.ToList();
            union.UnionWith(list);
            intersection.IntersectWith(list); // modifies the set in place
        }
        union.ExceptWith(intersection);
        return union;
    }
}

Note that this is now eager, not deferred.
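If deferred execution matters, one option (a sketch, not part of the answer above) is to wrap the eager HashSet computation in an iterator method: any C# method containing `yield return` is compiled into a lazy iterator, so none of the set work runs until the caller starts enumerating. The names `DeferredNonShared`, `ComputeNonShared` and `GetNonSharedDeferred` are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DeferredNonShared
{
    // Eager core: union of all lists minus the intersection of all lists.
    private static HashSet<T> ComputeNonShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> union = null;
        HashSet<T> intersection = null;
        foreach (var sequence in lists)
        {
            // Materialize once so each input is enumerated only one time.
            var items = sequence.ToList();
            if (union == null)
            {
                union = new HashSet<T>(items);
                intersection = new HashSet<T>(items);
            }
            else
            {
                union.UnionWith(items);
                intersection.IntersectWith(items);
            }
        }
        if (union == null)
        {
            return new HashSet<T>(); // no lists at all
        }
        union.ExceptWith(intersection);
        return union;
    }

    // Because this method contains 'yield return', its body (and therefore
    // ComputeNonShared) does not run until the first MoveNext call.
    public static IEnumerable<T> GetNonSharedDeferred<T>(this IEnumerable<IEnumerable<T>> lists)
    {
        foreach (var item in ComputeNonShared(lists))
        {
            yield return item;
        }
    }

    public static void Main()
    {
        var lists = new List<List<int>> { new List<int> { 1, 2 }, new List<int> { 2, 3 } };
        var query = lists.GetNonSharedDeferred(); // nothing computed yet
        lists.Add(new List<int> { 2, 4 });        // still seen by the query
        Console.WriteLine(string.Join(", ", query.OrderBy(x => x))); // 1, 3, 4
    }
}
```

The eager version called before the `Add` would have returned only 1 and 3; the deferred wrapper sees the third list because the computation happens at enumeration time.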


Here's an alternative option:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    return lists.SelectMany(list => list)
                .GroupBy(x => x)
                .Where(group => group.Count() < lists.Count())
                .Select(group => group.Key);
}

If it's possible for a list to contain the same item more than once, you'd want a Distinct call in there:

public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> lists)
{
    return lists.SelectMany(list => list.Distinct())
                .GroupBy(x => x)
                .Where(group => group.Count() < lists.Count())
                .Select(group => group.Key);
}
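To see why the Distinct call matters, take a hypothetical input where a value is repeated in one list but missing from another: counting raw occurrences lets the duplicate make up for the missing list. This sketch runs both versions side by side; `DistinctDemo`, `WithoutDistinct` and `WithDistinct` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DistinctDemo
{
    // Counts raw occurrences: a duplicate inside one list is counted twice.
    public static List<T> WithoutDistinct<T>(IEnumerable<IEnumerable<T>> lists)
    {
        int listCount = lists.Count();
        return lists.SelectMany(list => list)
                    .GroupBy(x => x)
                    .Where(group => group.Count() < listCount)
                    .Select(group => group.Key)
                    .ToList();
    }

    // Counts how many lists contain each value.
    public static List<T> WithDistinct<T>(IEnumerable<IEnumerable<T>> lists)
    {
        int listCount = lists.Count();
        return lists.SelectMany(list => list.Distinct())
                    .GroupBy(x => x)
                    .Where(group => group.Count() < listCount)
                    .Select(group => group.Key)
                    .ToList();
    }

    public static void Main()
    {
        // 1 appears twice in the first list but not at all in the second.
        var lists = new List<List<int>> { new List<int> { 1, 1 }, new List<int> { 2 } };
        Console.WriteLine(string.Join(", ", WithoutDistinct(lists))); // 2
        Console.WriteLine(string.Join(", ", WithDistinct(lists)));    // 1, 2
    }
}
```

Without Distinct, 1 reaches a count of 2 (the number of lists) and is wrongly treated as shared.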

EDIT: Now I've corrected this, I understand your original code... and I suspect I can find something better... thinking...

Jon Skeet
    This excludes 5 & 9. He only wants values common to all lists excluded. – Austin Salonen Sep 15 '11 at 19:19
  • He flattens all of the lists, and then removes the items that are in all of the lists (that's computed by the aggregate operation). – jason Sep 15 '11 at 19:32
  • @Jason: Yes - once I'd reread the question, it made a lot more sense :) I've edited with an "in place" idea now. – Jon Skeet Sep 15 '11 at 19:37
public static IEnumerable<T> GetNonShared<T>(this IEnumerable<IEnumerable<T>> list)
{
    var lstCnt = list.Count(); // total number of lists (not items)
    return list.SelectMany(l => l.Distinct())           // each value once per list
               .GroupBy(l => l)
               .Select(l => new { n = l.Key, c = l.Count() }) // c = lists containing n
               .Where(l => l.c < lstCnt)
               .Select(l => l.n)
               .OrderBy(l => l); // can be removed if order doesn't matter
}

// For exactly two lists, HashSet<T>.SymmetricExceptWith (available since .NET 3.5) gives the same result directly.
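`HashSet<T>.SymmetricExceptWith` keeps the elements that are in exactly one of two sets, so for two lists it matches union minus intersection. Chaining it over more lists instead keeps the values found in an odd number of lists, which is a different result. A sketch comparing the two cases on the question's data (the helper `SymmetricChain` is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SymmetricDemo
{
    // Applies SymmetricExceptWith with each list in turn.
    public static HashSet<int> SymmetricChain(IEnumerable<IEnumerable<int>> lists)
    {
        var result = new HashSet<int>();
        foreach (var list in lists)
        {
            result.SymmetricExceptWith(list);
        }
        return result;
    }

    public static void Main()
    {
        IEnumerable<IEnumerable<int>> two = new List<List<int>> {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 4, 5, 8 }
        };
        // Two lists: same as union minus intersection.
        Console.WriteLine(string.Join(", ", SymmetricChain(two).OrderBy(x => x)));  // 1, 5, 8

        IEnumerable<IEnumerable<int>> four = new List<List<int>> {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 4, 5, 8 },
            new List<int> { 2, 3, 4, 5, 9, 9 },
            new List<int> { 2, 3, 3, 4, 9, 10 }
        };
        // Four lists: 5 and 9 (each in two lists) drop out, unlike the expected 1, 5, 8, 9, 10.
        Console.WriteLine(string.Join(", ", SymmetricChain(four).OrderBy(x => x))); // 1, 8, 10
    }
}
```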

SKG
  • It would be more useful to explain your answer. – Nima Derakhshanjan Jun 20 '15 at 13:10
    It basically takes the distinct items from each individual list and flattens them into one sequence (SelectMany), then groups the items by value to count how many times (l.c) each item (l.n) occurs in that flattened sequence. If l.c for an item is less than the total number of individual lists (lstCnt), we can say for sure that the item did not exist in at least one list. – SKG Jun 21 '15 at 14:21
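Applied to the question's data, that counting step gives the per-value list counts below. This is a small sketch (the class `CountDemo` and method `ListCounts` are illustrative) that prints each value, the number of lists containing it, and whether it survives the `l.c < lstCnt` filter.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class CountDemo
{
    // For each value, how many lists contain it (duplicates within a list count once).
    public static Dictionary<int, int> ListCounts(IEnumerable<IEnumerable<int>> lists)
    {
        return lists.SelectMany(l => l.Distinct())
                    .GroupBy(l => l)
                    .ToDictionary(g => g.Key, g => g.Count());
    }

    public static void Main()
    {
        IEnumerable<IEnumerable<int>> lists = new List<List<int>> {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 4, 5, 8 },
            new List<int> { 2, 3, 4, 5, 9, 9 },
            new List<int> { 2, 3, 3, 4, 9, 10 }
        };
        int lstCnt = lists.Count(); // 4 lists

        foreach (var pair in ListCounts(lists).OrderBy(p => p.Key))
        {
            string verdict = pair.Value < lstCnt ? "kept" : "dropped (in every list)";
            Console.WriteLine(pair.Key + ": " + pair.Value + " -> " + verdict);
        }
    }
}
```

Values 2, 3 and 4 reach a count of 4 and are dropped; 1, 5, 8, 9 and 10 are kept. The 9 that appears twice in the third list still counts as one list, thanks to Distinct.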

I think you need to create an intermediate step: finding all the items that are common to all lists. This is easy to do with set logic: it's just the set of items in the first list intersected with the set of items in each succeeding list. LINQ can express that step with Aggregate and Intersect (as the question does), but here it's written as an explicit loop.

using System;
using System.Collections.Generic;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        IEnumerable<IEnumerable<int>> lists = new List<IEnumerable<int>> {
                              new List<int> { 1, 2, 3, 4 },
                              new List<int> { 2, 3, 4, 5, 8 },
                              new List<int> { 2, 3, 4, 5, 9, 9 },
                              new List<int> { 2, 3, 3, 4, 9, 10 }
                            };

        Console.WriteLine(string.Join(", ", GetNonShared(lists)
            .Distinct()
            .OrderBy(x => x)
            .Select(x => x.ToString())
            .ToArray()));
        Console.ReadKey();
    }

    public static HashSet<T> GetShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> result = null;
        foreach (IEnumerable<T> list in lists)
        {
            result = (result == null)
                         ? new HashSet<T>(list)
                         : new HashSet<T>(result.Intersect(list));
        }
        return result;
    }

    public static IEnumerable<T> GetNonShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> shared = GetShared(lists);
        return lists.SelectMany(x => x).Where(x => !shared.Contains(x));
    }
}
Robert Rossney
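`GetShared` above allocates a new HashSet on every pass. A variant (a sketch, not the answerer's code) can narrow a single set in place with `HashSet<T>.IntersectWith`, and can apply Distinct inside `GetNonShared` so callers don't have to; the class name `InPlaceShared` is illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class InPlaceShared
{
    // Values present in every list; an empty set when there are no lists.
    public static HashSet<T> GetShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> result = null;
        foreach (IEnumerable<T> list in lists)
        {
            if (result == null)
                result = new HashSet<T>(list);
            else
                result.IntersectWith(list); // narrows the same set, no new allocation
        }
        return result ?? new HashSet<T>();
    }

    // Each value not shared by all lists, reported once.
    public static IEnumerable<T> GetNonShared<T>(IEnumerable<IEnumerable<T>> lists)
    {
        HashSet<T> shared = GetShared(lists);
        return lists.SelectMany(x => x).Where(x => !shared.Contains(x)).Distinct();
    }

    public static void Main()
    {
        IEnumerable<IEnumerable<int>> lists = new List<List<int>> {
            new List<int> { 1, 2, 3, 4 },
            new List<int> { 2, 3, 4, 5, 8 },
            new List<int> { 2, 3, 4, 5, 9, 9 },
            new List<int> { 2, 3, 3, 4, 9, 10 }
        };
        Console.WriteLine(string.Join(", ", GetNonShared(lists).OrderBy(x => x))); // 1, 5, 8, 9, 10
    }
}
```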