
I have a bunch of string data and I can loop through it one by one. What's a good collection (and how to implement it) so that I only get the distinct strings?

The client I am doing this for doesn't even use .NET 3.5, so .Distinct is out. They use .NET Framework 2.0.

And I am reading the list one at a time and don't know how many records it will have until I'm done.

cdub
  • [HashSet](http://msdn.microsoft.com/en-us/library/bb359438.aspx), but you will lose the order of the collection. – Sjoerd Dec 20 '11 at 08:37
  • I have a bunch of random data with duplicates, and I want to store only the unique values. What should I use? – cdub Dec 20 '11 at 08:39
  • In PHP I would just load everything into an array and then call array_unique(). – cdub Dec 20 '11 at 08:41
  • What's a .NET equivalent? – cdub Dec 20 '11 at 08:41
  • The .NET equivalent of array_unique in PHP would be the LINQ Distinct() method, as I show in my answer below. At least roughly; it gets the job done. – rfmodulator Dec 20 '11 at 08:47

3 Answers


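One way is to use LINQ's Distinct (requires .NET 3.5 or later) to make your strings unique:

using System.Collections.Generic;
using System.Linq; // Distinct() is a LINQ extension method

List<string> a = new List<string>();
a.AddRange(new string[] { "a", "b", "a", "c", "d", "b" });
List<string> b = new List<string>();
b.AddRange(a.Distinct());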

Another resource on LINQ's Distinct: http://blogs.msdn.com/b/charlie/archive/2006/11/19/linq-farm-group-and-distinct.aspx

Another way is to use a HashSet, as others suggested (HashSet<T> was also introduced in .NET 3.5):

HashSet<string> hash = new HashSet<string>(inputStrings);

Have a look at this link to see how to implement it in .NET 2.0: https://stackoverflow.com/a/687042/284240

If you're not on 3.5, you can also do it manually:

List<string> newList = new List<string>();

foreach (string s in list)
{
   if (!newList.Contains(s))
      newList.Add(s);
}
// newList contains the unique values

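Another solution (faster for large inputs, because a Dictionary lookup is hashed while List.Contains scans the whole list; note that the order of dic.Keys is not guaranteed):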

Dictionary<string,bool> dic = new Dictionary<string,bool>();

foreach (string s in list)
{
   dic[s] = true;
}

List<string> newList = new List<string>(dic.Keys);
// newList contains the unique values

https://stackoverflow.com/a/1205813/284240
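
The two manual approaches above can be combined so that strings arriving one at a time are de-duplicated with a hashed lookup while their first-seen order is kept. The following is only a minimal sketch under stated assumptions: the class name DistinctCollector and its members are hypothetical (they are not from either linked answer), the input strings are assumed to be non-null, and the code sticks to features available in .NET 2.0 (no LINQ, no HashSet<T>, no var).

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper, not part of the original answers: collects strings
// one at a time and keeps only the first occurrence of each, in arrival order.
public class DistinctCollector
{
    // Dictionary provides the hashed membership check; the bool value is unused.
    private readonly Dictionary<string, bool> seen = new Dictionary<string, bool>();

    // List remembers the order in which new strings first appeared.
    private readonly List<string> items = new List<string>();

    // Returns true if the string was new and has been stored.
    // Assumes s is not null (Dictionary does not accept null keys).
    public bool Add(string s)
    {
        if (seen.ContainsKey(s))
            return false;

        seen.Add(s, true);
        items.Add(s);
        return true;
    }

    public int Count
    {
        get { return items.Count; }
    }

    // Copies the distinct strings, in first-seen order, into a new array.
    public string[] ToArray()
    {
        return items.ToArray();
    }
}

public static class Program
{
    public static void Main()
    {
        DistinctCollector collector = new DistinctCollector();

        // Stand-in for reading records one by one from the real source.
        foreach (string s in new string[] { "a", "b", "a", "c", "d", "b" })
        {
            collector.Add(s);
        }

        // Prints: a,b,c,d
        Console.WriteLine(string.Join(",", collector.ToArray()));
    }
}
```

Because Add works on a single string, the total number of records does not need to be known in advance. The cost is that each distinct string is referenced from two collections; if ordering does not matter, the Dictionary alone is enough.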

Tim Schmelter

If you're using .NET 3.5 or above, put the strings in a List<> and use the LINQ method Distinct().

using System.Linq;

IEnumerable<string> strs = new List<string>(new[] { "one", "two", "three", "one" });

var distinct = strs.Distinct();

In .NET 2.0 you have no choice but to do it manually.

rfmodulator

Perhaps I'm being dense and not fully understanding the question, but can't you just use a regular List and call its .Contains method to check whether each string already exists in the list before adding it in the loop? You might need to keep an eye on performance if you have a lot of strings, since .Contains scans the list linearly on every call.

Simon