I have an SQL database of unique codes (consisting of numbers and letters).

I want to generate a large number of new codes for this database. The new codes also need to take the existing codes into account, and those were not generated with any kind of seed or similar.

So, what is the best way to do this? Currently, for each code that is generated, the application loops through all the codes in the database and makes sure the Levenshtein distance is not too small. But this takes forever now that the database has grown large.

Any help appreciated. This is used in an ASP.NET C# application.

Infinite Recursion
JuhaKangas
  • Is there a pattern for creating these codes? For instance, are the numbers together and sequential? Are the letters together? ... – Don Barry Mar 27 '14 at 18:18
  • What I have is a serial ID that is coupled with each code. But the codes are completely random, no pattern there. – JuhaKangas Mar 28 '14 at 08:21

1 Answer

If the total number of issued codes, the length of your codes and acceptable Levenshtein distance are small enough, you could build a tree of existing codes and their 'Levenshtein neighbors' in memory to reduce the time to generate a new code to O(ln N). If there are too many codes, you could try creating an additional SQL table containing just the codes and neighbors and rely on SQL for O(ln N) search. When you insert a new code, insert its neighbors together with it.
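As a rough illustration of the neighbor-index idea, here is a minimal sketch in Python (the same structure should port to C# with a `HashSet<string>`). It assumes a minimum acceptable distance of 2, i.e. no new code may be within one edit of an existing one. The alphabet, the `CodeIndex` class and all function names are hypothetical, and a hash set stands in for the tree, so lookups are O(1) on average rather than O(ln N).

```python
import secrets

# Hypothetical alphabet for NEW codes: 32 symbols, no I, O, 0 or 1.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"


def neighbors(code, alphabet=ALPHABET):
    """Return every string within Levenshtein distance 1 of `code`,
    restricted to edits that use characters from `alphabet`."""
    result = {code}
    for i in range(len(code) + 1):
        for ch in alphabet:
            result.add(code[:i] + ch + code[i:])          # insertion
    for i in range(len(code)):
        result.add(code[:i] + code[i + 1:])               # deletion
        for ch in alphabet:
            result.add(code[:i] + ch + code[i + 1:])      # substitution
    return result


class CodeIndex:
    """In-memory set of issued codes plus their distance-1 neighbors."""

    def __init__(self, existing=()):
        self._blocked = set()
        for code in existing:
            self.add(code)

    def add(self, code):
        # Store the code together with its neighbors, as the answer suggests.
        self._blocked |= neighbors(code)

    def is_far_enough(self, candidate):
        # A single lookup replaces the loop over all existing codes.
        return candidate not in self._blocked

    def generate(self, length=8):
        # Rejection sampling: draw random candidates until one is accepted.
        while True:
            candidate = "".join(secrets.choice(ALPHABET) for _ in range(length))
            if self.is_far_enough(candidate):
                self.add(candidate)
                return candidate
```

Because candidates are drawn only from `ALPHABET`, it is enough to store neighbors built from that alphabet, even if old codes contain other characters. The price is memory: each code of length L contributes on the order of (2L + 1) times the alphabet size entries, which is why this only works while the code count, code length and distance stay small.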

If you have the flexibility, i.e. you can increase code length by 1 or add a new character to the acceptable character set, or if there is an unused character in some position, the best solution would be to separate the 'old' and 'new' code spaces and generate new ones algorithmically to meet your requirements. This approach was adopted for UUID/GUID when they decided not to use the computer's MAC address in it.
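One way to picture a separate, algorithmic code space is to derive each new code from its serial ID with a keyed, reversible permutation and to prefix it with a character that old codes never contain. The sketch below is written in Python under those assumptions: the marker `"0"`, the 32-symbol alphabet, the 40-bit ID range and every function name are hypothetical, and the small Feistel network is an illustration, not a vetted cipher. It guarantees that new codes are unique and never equal an old code. It does not by itself enforce a minimum Levenshtein distance.

```python
import hashlib
import hmac

ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"  # hypothetical, 32 symbols
NEW_SPACE_MARKER = "0"   # assumption: never appears in any old code
HALF_BITS = 20           # 2 * 20 bits = 40 bits = 8 symbols of 5 bits
MASK = (1 << HALF_BITS) - 1
ROUNDS = 4
CODE_SYMBOLS = 8


def _round(key, rnd, value):
    """Keyed round function: HMAC-SHA256 truncated to HALF_BITS bits."""
    msg = bytes([rnd]) + value.to_bytes(3, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest[:3], "big") & MASK


def encrypt_id(key, serial):
    """Map a serial ID to another 40-bit number, one-to-one."""
    if not 0 <= serial < (1 << (2 * HALF_BITS)):
        raise ValueError("serial out of range")
    left, right = serial >> HALF_BITS, serial & MASK
    for rnd in range(ROUNDS):
        left, right = right, left ^ _round(key, rnd, right)
    return (left << HALF_BITS) | right


def decrypt_id(key, value):
    """Invert encrypt_id by running the rounds backwards."""
    left, right = value >> HALF_BITS, value & MASK
    for rnd in reversed(range(ROUNDS)):
        left, right = right ^ _round(key, rnd, left), left
    return (left << HALF_BITS) | right


def code_for_serial(key, serial):
    """Marker character followed by 8 symbols encoding the encrypted ID."""
    value = encrypt_id(key, serial)
    symbols = []
    for _ in range(CODE_SYMBOLS):
        value, index = divmod(value, len(ALPHABET))
        symbols.append(ALPHABET[index])
    return NEW_SPACE_MARKER + "".join(reversed(symbols))


def serial_for_code(key, code):
    """Recover the serial ID from a code produced by code_for_serial."""
    value = 0
    for ch in code[1:]:
        value = value * len(ALPHABET) + ALPHABET.index(ch)
    return decrypt_id(key, value)
```

Since the mapping is a bijection, generating a batch is just a matter of taking the next unused serial IDs, with no lookups against previously issued codes.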

Anton Tykhyy
  • Yeah, except I guess I would have to do that every time I need new codes and it would just keep growing? Your second solution would indeed be the best one. Do you happen to have any suggestions on algorithms or libraries for that? – JuhaKangas Apr 02 '14 at 09:31
  • Nothing particular, I'm afraid. The requirement for codes to be separated by L-distance is quite difficult to fulfil in a straightforward way. Possibly that's why the most common approach seems to be to add one or two check characters, which can help detect and/or correct the most common classes of errors: transpositions, incorrect single characters etc. Credit card and bank account numbers use such schemes, so you can search from there. As for converting between codes and sequential numbers, the usual approach is symmetric-key encryption. – Anton Tykhyy Apr 02 '14 at 14:33
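To make the check-character suggestion concrete, here is a minimal Python sketch of the "Luhn mod N" scheme, the generalization of the credit card check digit to an arbitrary alphabet. The alphabet and function names are hypothetical, and the scheme assumes an even alphabet size. It detects every single wrong character and most adjacent transpositions, which is a weaker property than a guaranteed Levenshtein distance between codes.

```python
# Hypothetical alphabet; Luhn mod N needs an even number of symbols.
ALPHABET = "ABCDEFGHJKLMNPQRSTUVWXYZ23456789"
N = len(ALPHABET)


def _luhn_sum(text, first_factor):
    """Weighted sum over `text`, scanning right to left and alternating
    the factors 1 and 2, starting with `first_factor`."""
    factor = first_factor
    total = 0
    for ch in reversed(text):
        addend = factor * ALPHABET.index(ch)
        # Add the base-N "digits" of the product together.
        total += addend // N + addend % N
        factor = 1 if factor == 2 else 2
    return total


def check_character(payload):
    """Character that makes the full code's weighted sum divisible by N."""
    remainder = _luhn_sum(payload, 2) % N
    return ALPHABET[(N - remainder) % N]


def with_check(payload):
    """Append the check character to a payload."""
    return payload + check_character(payload)


def is_valid(code):
    """True if `code` uses only known symbols and its check sum is 0 mod N."""
    if len(code) < 2 or any(ch not in ALPHABET for ch in code):
        return False
    return _luhn_sum(code, 1) % N == 0
```

With this in place, a mistyped code can be rejected on entry without consulting the database, because changing any one character of a valid code always produces an invalid one.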