
I have a Customer[] array, and I want to use it to create a Dictionary<Customer, string>. What is the easiest way to check the array for duplicates before I load the Dictionary? I want to avoid "ArgumentException: An item with the same key has already been added". Thanks.

  • I would like to know up front if duplicates were passed, rather than checking on the fly as I add items. I can use Dictionary.ContainsKey, but then I have to spin through the array twice, which I guess is not terrible. I was just wondering if there was an easier way I am not aware of. Thanks. – Jack T. Colton Oct 26 '09 at 20:48

6 Answers

6

Just call Dictionary.ContainsKey(key) before you add your Customers.

Jan Bannister
5

You could use LINQ to do both:

Customer[] customers; // initialized somehow...
// Key on the customer itself so the result is a Dictionary<Customer, string>.
var customerDictionary = customers.Distinct().ToDictionary( cust => cust, cust => cust.SomeKey );

If you build the dictionary in a less straightforward way, you can use the Distinct() extension method on its own to get an array of unique items:

Customer[] uniqueCustomers = customers.Distinct().ToArray();

If you need to be aware of potential duplicates, you could use GroupBy( c => c ) first to identify which items have duplicates.
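
The GroupBy idea can be sketched as follows. This is only an illustration: the Customer class and its Name property are hypothetical stand-ins for the asker's type, and DuplicateFinder is a made-up helper name. It relies on the default equality for Customer, which is reference equality unless Equals and GetHashCode are overridden.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for the asker's Customer type.
public sealed class Customer
{
    public string Name { get; set; }
}

public static class DuplicateFinder
{
    // Returns one entry for every customer that occurs more than once.
    public static List<Customer> FindDuplicates(IEnumerable<Customer> customers)
    {
        return customers
            .GroupBy(c => c)            // group equal customers together
            .Where(g => g.Count() > 1)  // keep only groups with repeats
            .Select(g => g.Key)         // one representative per group
            .ToList();
    }
}
```

If the returned list is empty, the array can be loaded into the dictionary without hitting the ArgumentException.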

Finally, if you don't want to use LINQ, you can build the dictionary on the fly and use a precondition check when adding each item:

var customerDictionary = new Dictionary<Customer,string>();
foreach( var cust in customers )
{
    if( !customerDictionary.ContainsKey(cust) )
        customerDictionary.Add( cust, cust.SomeKey ); 
}
LBushkin
  • I like the look of this approach, and I'm reluctant to talk about performance in this case but wouldn't calling distinct on the array involve doing a lot of comparisons? Dictionary.ContainsKey is relatively O(1). – Josh Smeaton Oct 26 '09 at 20:54
  • @Josh: My understanding is that LINQ's Distinct() operator internally builds a hashset structure to optimize its performance. So it should perform better than just iteratively searching through a list for duplicates. Read this SO question for more: http://stackoverflow.com/questions/146358/efficiently-merge-string-arrays-in-net-keeping-distinct-values – LBushkin Oct 27 '09 at 01:07
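
The hash-set idea from the comment above can also be applied directly to answer "are there any duplicates?" in a single pass. This is a rough sketch, not a statement about how Distinct() is implemented; Customer and DuplicateCheck are hypothetical names introduced for illustration.

```csharp
using System.Collections.Generic;

// Hypothetical stand-in for the asker's Customer type.
public sealed class Customer
{
    public string Name { get; set; }
}

public static class DuplicateCheck
{
    // Returns true as soon as a customer is seen a second time.
    public static bool HasDuplicates(IEnumerable<Customer> customers)
    {
        var seen = new HashSet<Customer>();
        foreach (var customer in customers)
        {
            // HashSet<T>.Add returns false when the item is already present.
            if (!seen.Add(customer))
                return true;
        }
        return false;
    }
}
```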
2

How big is the array, and how likely is it to contain duplicates?

Checking each element of the array against all the others is quite an expensive operation, since the number of comparisons grows quadratically with the size of the array.

It would be quicker to call Dictionary.ContainsKey(key) before adding each item.

NOTE: If duplicates are rare then you could use exception handling, but that's bad programming practice.

ChrisF
  • Small arrays. This was the direction I was headed, but I would like to know up front if dupes were passed in before I begin processing. The string in Dictionary is some response XML from a web service associated with the Customer object. – Jack T. Colton Oct 26 '09 at 20:53
  • Using exception handling for flow control is not a desirable practice. – LBushkin Oct 26 '09 at 20:53
  • Bad practice to use exceptions for process flow! – Tor Haugen Oct 26 '09 at 20:55
2

The most efficient way of doing that, from BOTH PERFORMANCE and CODE points of view, is this:

dict[key] = value

This way the exception you mention is never thrown, and the key lookup does not happen twice.
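
A small sketch of what the indexer does with a repeated key, using plain string keys to keep it self-contained (IndexerDemo and Load are hypothetical names). One consequence worth knowing: a later duplicate silently replaces the earlier value, so this approach hides duplicates rather than reporting them.

```csharp
using System.Collections.Generic;

public static class IndexerDemo
{
    // Loads key/value pairs without ever throwing on a repeated key.
    public static Dictionary<string, string> Load(
        IEnumerable<KeyValuePair<string, string>> pairs)
    {
        var dict = new Dictionary<string, string>();
        foreach (var pair in pairs)
        {
            // The indexer adds a new key or overwrites an existing one;
            // unlike Add, it does not throw for a duplicate key.
            dict[pair.Key] = pair.Value;
        }
        return dict;
    }
}
```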

Alexander
1

What is your definition of duplicate in this case?

If it's simply the same object instance (the same reference), then that's simple: you can use any of the methods in the other answers given here.

Sometimes, though, the concept of equality is not so straightforward: is a different object instance with the same data equal? In that case you probably want an implementation of IEqualityComparer to help you.

Tim Jarvis
  • Let's say two Customer objects are dupes if they have the same SSN. – Jack T. Colton Oct 26 '09 at 20:56
  • Ah, well in this case you will need to specify the Equality, as a simple comparison of the pointer is not going to tell you that. – Tim Jarvis Oct 26 '09 at 21:20
  • So, you can use the SSN as a key in a dictionary, or for a more complete solution you can implement an IEqualityComparer that you can use in a bunch of LINQ extension methods. – Tim Jarvis Oct 26 '09 at 21:26
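
A minimal sketch of such a comparer, assuming Customer exposes the SSN as a string property. The Customer class, its Ssn and Name properties, and the SsnComparer and SsnDemo names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical Customer; the real type presumably has more members.
public sealed class Customer
{
    public string Name { get; set; }
    public string Ssn { get; set; }
}

// Treats two customers as equal when their SSNs match.
public sealed class SsnComparer : IEqualityComparer<Customer>
{
    public bool Equals(Customer x, Customer y)
    {
        if (ReferenceEquals(x, y)) return true;
        if (x == null || y == null) return false;
        return string.Equals(x.Ssn, y.Ssn, StringComparison.Ordinal);
    }

    public int GetHashCode(Customer obj)
    {
        // Equal customers must produce equal hash codes, so hash only the SSN.
        return (obj == null || obj.Ssn == null) ? 0 : obj.Ssn.GetHashCode();
    }
}

public static class SsnDemo
{
    // Counts customers after collapsing those that share an SSN.
    public static int CountDistinctBySsn(Customer[] customers)
    {
        return customers.Distinct(new SsnComparer()).Count();
    }
}
```

The same comparer instance can be passed to GroupBy, or to the Dictionary<Customer, string> constructor so that ContainsKey and Add follow the SSN rule as well.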
0

Why not this??

customers.Distinct().ToDictionary(o => o, o => GenerateString(o));
Restore the Data Dumps