1

I wrote a custom extension that returns the intersection between two collection and both the unique elements in a an b.

public static void CompareTo<T>(
    this IEnumerable<T> sourceA,
    IEnumerable<T> sourceB,
    IEqualityComparer<T> equalityComparer,
    out ICollection<T> commonElements,
    out ICollection<T> elementsOnlyInSourceA,
    out ICollection<T> elementsOnlyInSourceB)
    where T : class
{
    if (sourceA == null)
    {
        throw new ArgumentNullException(nameof(sourceA));
    }

    commonElements = new List<T>();
    elementsOnlyInSourceA = new List<T>();
    elementsOnlyInSourceB = new List<T>();

 
    foreach (var elementA in sourceA)
    {
        var equalElementInSourceB = sourceB.FirstOrDefault(elementB => equalityComparer.Equals(elementB, elementA));
        if (equalElementInSourceB != null)
        {
            commonElements.Add(elementA);
        }
        else
        {
            elementsOnlyInSourceA.Add(elementA);
        }
    }

    elementsOnlyInSourceB = sourceB.Except(commonElements, equalityComparer).ToList();
}

I know I could just go ahead using Intersect and Except but this implementation should be a bit faster because I actually can determine both common and unique elements in sourceA in a single iteration of sourceA. But thats not my point. What kind of smells to me is that the caller cannot know which objects in commonElements are actually beeing returned. Are they coming from sourceA or sourceB? Even in Intersect extension this is not made clear to me. So what is it then? How can I express that - in my case - I would always return common elements of sourceA?

UPDATE: After considering comments/answers I refactored to this:

  public static ICollection<T> Intersect<T>(
            this IEnumerable<T> primarySource,
            IEnumerable<T> secondarySource,
            IEqualityComparer<T> equalityComparer,
            out ICollection<T> uniqueInPrimarySource,
            out ICollection<T> uniqueInSecondarySource)
            where T : class
        {
            if (primarySource == null)
            {
                throw new ArgumentNullException(nameof(primarySource));
            }

            if (secondarySource == null)
            {
                throw new ArgumentNullException(nameof(secondarySource));
            }

            var commonElements = new List<T>();
            uniqueInPrimarySource = new List<T>();
            uniqueInSecondarySource = new List<T>();

            var secondarySourceCopy = secondarySource.ToList();

            foreach (var elementA in primarySource)
            {
                var equalElementInSourceB = secondarySourceCopy.FirstOrDefault(elementB => equalityComparer.Equals(elementB, elementA));
                if (equalElementInSourceB != null)
                {
                    commonElements.Add(elementA);
                    secondarySourceCopy.Remove(equalElementInSourceB);
                }
                else
                {
                    uniqueInPrimarySource.Add(elementA);
                }
            }

            uniqueInPrimarySource = secondarySourceCopy;

            return commonElements;
        }

I managed to get rid of a foreach by removing ab(intersect) elements from b.

UPDATE 2: Of couse this line

        uniqueInPrimarySource = secondarySourceCopy;

should be

    uniqueInSecondarySource = secondarySourceCopy;
yBother
  • 648
  • 6
  • 25
  • 3
    Your code is currently very inefficient - you're iterating over `sourceB` for every element of `sourceA`. I would suggest that you could probably do this rather more effectively by creating a `HashSet` based on one of `sourceA` or `sourceB`, then iterating over the other sequence, and taking appropriate action based on that. – Jon Skeet Sep 27 '20 at 17:31
  • I know that it is not very efficient. I am curios, how would you do this by creating a HashSet? I can assume that sourceA and sourceB are already distinct collections. My problem is that I cannot answer all 3 questions in a single iteration. Is creating a Hashset of a combined collection of sourceA and sourceB faster than creating it via Intersect? – yBother Sep 28 '20 at 07:55
  • No, I'm suggesting creating a hash set from *one* of `sourceA` or `sourceB`, then iterating over the other, checking each member to see whether it's also in the set. If the collections are already distinct, you can use `Remove` to remove the item from the hash set if there's a match, which means anything left in the set at the end was only in that source. – Jon Skeet Sep 28 '20 at 08:58

2 Answers2

2

Maybe you could not name them sourceA and sourceB but primarySource and secundarySource?

Also you could consider returning commonElements instead of making it an out parameter since the method is called on sourceA one could assume that that is the primary source for elements (especially when that's the name of the parameter). If you do this then also consider changing the name as CompareTo might not fully cover it. I think Intersect would be more appropriate as that would then be the result of this function (and elementsOnlyInSourceA + elementsOnlyInSourceB would be by-products)

Thomas
  • 258
  • 2
  • 5
1

If you implement your method like this, it will more efficient. This implementation makes use of the HashSet<T>, which will minimize the cost of determining whether an element is contained in the collection (given that the IEqualityComparer<T> is implemented correctly and does not return random hashcodes for objects)

public static IEnumerable<T> Intersect<T>(
    this IEnumerable<T> source,
    IEnumerable<T> secondarySource,
    out IEnumerable<T> uniqueInSource,
    out IEnumerable<T> uniqueInSecondarySource,
    IEqualityComparer<T> comparer)
{
    if (source == null) throw new ArgumentNullException(nameof(source));
    if (secondarySource == null) throw new ArgumentNullException(nameof(secondarySource));

    var hashset = new HashSet<T>(secondarySource, comparer);
    var intersection = new List<T>();
    var uniqueInPrimary = new List<T>();

    foreach (var item in source)
    {
        if (hashset.Remove(item))
        {
            intersection.Add(item);
        }
        else
        {
            uniqueInPrimary.Add(item);
        }

    }

    uniqueInSource = uniqueInPrimary;
    uniqueInSecondarySource = hashset.ToArray();
    return intersection;
}
Ehssan
  • 739
  • 6
  • 15
  • yes okay I see. so basically var equalElementInSourceB = secondarySourceCopy.FirstOrDefault(elementB => equalityComparer.Equals(elementB, elementA)) is beeing replaced by directly creating a new hashset. – yBother Sep 29 '20 at 07:18