I have a huge list of company names (more than 100,000) and an equally large list of the zipcodes associated with those names.
I need to output names that are similar (for example, AJAX INC and AJAX are the same company; I have chosen an edit-distance threshold of 4 characters), but only if their corresponding zipcodes also match.
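The matching rule can be written as a small predicate. This is a minimal sketch that assumes Python and treats each record as a plain (name, zipcode) pair. `edit_distance` and `is_match` are hypothetical helper names. The distance is standard Levenshtein with an early exit. Because AJAX INC and AJAX differ by exactly 4 edits (deleting " INC"), the threshold has to be inclusive.

```python
def edit_distance(a: str, b: str, max_dist: int = 4) -> int:
    """Return the Levenshtein distance between a and b, or max_dist + 1
    if the distance is known to exceed max_dist."""
    # The distance is at least the length difference, so skip hopeless pairs.
    if abs(len(a) - len(b)) > max_dist:
        return max_dist + 1
    # Two-row dynamic programming table: prev holds row i - 1, curr holds row i.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr[j] = min(prev[j] + 1,          # delete ca
                          curr[j - 1] + 1,      # insert cb
                          prev[j - 1] + cost)   # substitute (or keep) ca
        # The final distance can never be smaller than the minimum of any row,
        # so stop as soon as every entry in the row is over the threshold.
        if min(curr) > max_dist:
            return max_dist + 1
        prev = curr
    return min(prev[-1], max_dist + 1)


def is_match(name_a: str, zip_a: str, name_b: str, zip_b: str,
             max_dist: int = 4) -> bool:
    # Compare zipcodes first because that check is cheap.
    return zip_a == zip_b and edit_distance(name_a, name_b, max_dist) <= max_dist
```

For example, `is_match("AJAX INC", "10001", "AJAX", "10001")` is true, while the same names with different zipcodes are not a match.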
I can put all these company names in a dictionary and associate a list of zipcodes and other attributes with each key. The trouble is that I then have to compare every pair of names, which is O(n^2) and takes forever. Is there a faster way to do it?
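One common way to avoid comparing every pair is blocking. Two names can only match when their zipcodes are equal, so the dictionary can be keyed by zipcode instead of by name, and names are compared only within each zipcode bucket. The following is a minimal Python sketch that assumes the input is an iterable of (name, zipcode) tuples. `levenshtein` and `similar_pairs` are hypothetical helper names, not library functions.

```python
from collections import defaultdict
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Plain two-row dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]


def similar_pairs(records, max_dist=4):
    """Yield (zipcode, name_a, name_b) for names within max_dist edits
    that share a zipcode. records is an iterable of (name, zipcode)."""
    # Block on zipcode: names in different zipcodes are never compared.
    by_zip = defaultdict(list)
    for name, zipcode in records:
        by_zip[zipcode].append(name)
    for zipcode, names in by_zip.items():
        # Pairwise comparison happens only inside one zipcode bucket.
        for a, b in combinations(names, 2):
            # Cheap length filter before the full distance computation.
            if abs(len(a) - len(b)) <= max_dist and levenshtein(a, b) <= max_dist:
                yield zipcode, a, b


records = [("AJAX INC", "10001"), ("AJAX", "10001"),
           ("AJAX", "94105"), ("ZENITH CORP", "10001")]
print(list(similar_pairs(records)))  # [('10001', 'AJAX INC', 'AJAX')]
```

With blocking, the number of comparisons drops from about n^2/2 over all names to the sum of k^2/2 over buckets of size k. This is usually far smaller when the names are spread across many zipcodes.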