I have a list of string arrays. I want to remove duplicates and entries with an empty string, checking only the first element of each string array. I have seen some SO posts that use IEqualityComparer to remove duplicates by comparing whole string arrays, which I think looks more elegant and is potentially more efficient. However, I could not get it to check only the first element of each array, because IEqualityComparer confuses me. How can I achieve this more elegantly? My current inelegant and inefficient, but working, code:

void method(List<string[]> contactAndNumber)
{
    List<string[]> contactAndNumberSanitized = new List<string[]>();
    contactAndNumberSanitized.Clear();
    bool rem = false;
    List<int> remList = new List<int>();
    for (int i = 0; i < contactAndNumber.Count; i++)
    {
        contactAndNumberSanitized.Add(new string[] { contactAndNumber[i][0], contactAndNumber[i][1] });
        for (int j = 0; j < contactAndNumberSanitized.Count; j++)
            if (i != j)
                if (contactAndNumber[i][0] == contactAndNumberSanitized[j][0])
                {
                    rem = true;
                    break;
                }
        if (rem || string.IsNullOrEmpty(contactAndNumber[i][0]))
            remList.Add(i);
        rem = false;
    }
    for (int i = remList.Count - 1; i >= 0; i--)
        contactAndNumberSanitized.RemoveAt(remList[i]);
}

And this is the non-working code I wrote, which is meant to check only the first item of each string array:

sealed class EqualityComparer: IEqualityComparer<string[]>
{
    public bool Equals(string[] x, string[] y)
    {
        if (ReferenceEquals(x[0], y[0]))
            return true;

        if (x == null || y == null)
            return false;

        return x[0].SequenceEqual(y[0]);
    }

    public int GetHashCode(string[] obj)
    {
        if (obj == null)
            return 0;

        int hash = 17;

        unchecked
        {
            foreach (string s in obj)
                hash = hash*23 + ((s == null) ? 0 : s.GetHashCode());
        }

        return hash;
    }
}

I call it like this from inside a method:

var result = list.Distinct(new EqualityComparer());
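One probable reason the comparer above has no visible effect: GetHashCode hashes every element of the array, so two arrays that share a first element but differ later usually get different hash codes, and Distinct then never calls Equals on them. Equals also dereferences x[0] before its null check. A minimal sketch of a comparer keyed on the first element only follows; the class name FirstElementComparer and the helper Key are hypothetical, and ordinal (case-sensitive) comparison is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name: compares string arrays by their first element only.
sealed class FirstElementComparer : IEqualityComparer<string[]>
{
    // Returns the first element, or null for a null or empty array.
    private static string Key(string[] a) =>
        (a == null || a.Length == 0) ? null : a[0];

    // Two arrays are equal when their first elements are equal (ordinal comparison assumed).
    public bool Equals(string[] x, string[] y) =>
        string.Equals(Key(x), Key(y), StringComparison.Ordinal);

    // Hash only what Equals compares; hashing later elements would let
    // arrays with the same first element land in different buckets.
    public int GetHashCode(string[] obj)
    {
        string key = Key(obj);
        return key == null ? 0 : StringComparer.Ordinal.GetHashCode(key);
    }
}
```

With this in place, list.Distinct(new FirstElementComparer()) treats arrays with the same first element as duplicates. Arrays whose first element is null or empty still need a separate filter.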
Baz Guvenkaya

2 Answers

Your code can be vastly simplified:

var input = new List<string[]> { new[] { "a", "b" }, new[] { "a", "c" }, new[] { "c", "d" }};
var result = input.GroupBy(l => l.FirstOrDefault()).Select(g => g.First());

This will give you the unique arrays, using the first element of each array to determine uniqueness.

However, since you're using the first element of the array to determine uniqueness, there is an edge case: an empty array is treated as equivalent to { null }, because FirstOrDefault() returns null for both. Depending on how you want to treat empty arrays, you'll need to filter the input or change the GroupBy key selector.
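As a rough sketch of the filtering option, the query can be wrapped in a helper that drops unusable arrays before grouping. The names ContactSanitizer and Sanitize are hypothetical, and treating null arrays as removable is an assumption about what you want.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; the class and method names are illustrative only.
static class ContactSanitizer
{
    public static List<string[]> Sanitize(IEnumerable<string[]> input)
    {
        return input
            // Drop null arrays, empty arrays, and arrays whose first element is null or "".
            .Where(a => a != null && !string.IsNullOrEmpty(a.FirstOrDefault()))
            // Group the remaining arrays by their first element.
            .GroupBy(a => a[0])
            // Keep the first array of each group, in source order.
            .Select(g => g.First())
            .ToList();
    }
}
```

For the three-array input shown earlier, ContactSanitizer.Sanitize(input) returns { "a", "b" } and { "c", "d" }.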

Rob
  • I'm accepting this one as the answer since it's a one line solution. Mate - top bloke! =) Can removing the arrays with empty string key be implemented within this LINQ query as well? – Baz Guvenkaya Jun 17 '16 at 04:32
  • @BarryGuvenkaya Sure, you'd add a filter before the group by. For example: `input.Where(a => !string.IsNullOrEmpty(a.FirstOrDefault())).GroupBy(...`. That would remove all empty arrays, and arrays with `null` or an empty string as the first (and/or only) element – Rob Jun 17 '16 at 04:34

Since you're working with a List<T>, you can use the RemoveAll method.

Edit: original answer may not work. Revised below.

Edit 2: Actually, if you want to remove all duplicates (without leaving the original), use this:

var duplicates = data.Where(x => x == null || string.IsNullOrEmpty(x[0]) || data.Where(y => y != null).Count(y => y[0] == x[0]) > 1).ToList();
data.RemoveAll(x => duplicates.Contains(x));

But if you want to leave the last in a set of duplicates (e.g. the last "A" in a set of three "A"s), then you could use my original answer:

data.RemoveAll(x => x == null || string.IsNullOrEmpty(x[0]) || data.Where(y => y != null).Count(y => y[0] == x[0]) > 1);
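Note that the predicate in the one-line RemoveAll call reads data while the list is being modified, so which duplicate survives may depend on how RemoveAll is implemented. A sketch that keeps the last occurrence without relying on that is to walk the list backwards and track the keys already seen; the names DuplicateRemover and KeepLast are hypothetical.

```csharp
using System.Collections.Generic;

// Hypothetical helper; the class and method names are illustrative only.
static class DuplicateRemover
{
    // Removes null arrays, arrays with a null/empty first element,
    // and all but the last array for each distinct first element.
    public static void KeepLast(List<string[]> data)
    {
        var seen = new HashSet<string>();

        // Walk backwards: the last occurrence of each key is visited first,
        // and RemoveAt never shifts an element that has not been visited yet.
        for (int i = data.Count - 1; i >= 0; i--)
        {
            string[] x = data[i];
            bool invalid = x == null || x.Length == 0 || string.IsNullOrEmpty(x[0]);

            // HashSet.Add returns false when the key is already present.
            if (invalid || !seen.Add(x[0]))
                data.RemoveAt(i);
        }
    }
}
```

Each RemoveAt call shifts the elements after it, so for very large lists it may be cheaper to build a new list instead.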
wablab