
I have a huge list of company names (more than 100,000) and an equally large list of the zipcodes associated with those names.

I have to output similar names, but only if their corresponding zipcodes match too. For example, AJAX INC and AJAX are the same company; I have chosen an edit-distance threshold of 4 characters.

I can put all these company names in a dictionary and associate a list of zipcodes and other characteristics with each key. The trouble is that I then have to compare every pair of names, and at O(n^2) that takes forever. Is there a faster way to do it?

user1773010

1 Answer


Create a dictionary keyed by zipcode, with lists of company names as the values. Now you only have to match company names per zipcode, a much smaller search space.
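A minimal sketch of that idea, assuming the data arrives as `(name, zipcode)` pairs. The names `edit_distance`, `similar_names_by_zipcode`, and the sample `records` are made up for illustration, and the pure-Python Levenshtein function is only a stand-in for whatever distance function you already use:

```python
from collections import defaultdict
from itertools import combinations


def edit_distance(a, b):
    """Levenshtein distance, computed row by row with dynamic programming."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similar_names_by_zipcode(records, max_distance=4):
    """Return (zipcode, name, name) triples for similar names sharing a zipcode."""
    # Group the distinct company names under each zipcode.
    names_by_zip = defaultdict(set)
    for name, zipcode in records:
        names_by_zip[zipcode].add(name)

    matches = []
    for zipcode, names in names_by_zip.items():
        # Only names within the same zipcode are compared with each other.
        for first, second in combinations(sorted(names), 2):
            if edit_distance(first, second) <= max_distance:
                matches.append((zipcode, first, second))
    return matches


# Hypothetical sample data, not taken from the question.
records = [
    ("AJAX INC", "90210"),
    ("AJAX", "90210"),
    ("AJAX", "10001"),
    ("ZENITH HOLDINGS", "90210"),
]
print(similar_names_by_zipcode(records))
# [('90210', 'AJAX', 'AJAX INC')]
```

The comparison is still quadratic within each zipcode, so the gain depends on how evenly the names are spread: many small groups are cheap, while one zipcode holding most of the names would be nearly as slow as before.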

Martijn Pieters