
I am desperately searching for an algorithm that produces a checksum of at most two characters and can detect transposed (swapped) characters in the input sequence. When I tested different algorithms, such as Luhn, CRC24 or CRC32, the checksums were always longer than two characters. If I truncate the checksum to two or even one character, not all transpositions are detected anymore.

Does anyone know an algorithm that meets my needs? Even just a name would help me continue my search. I would be very grateful for your help.

Patrick Vogt
  • What exactly is the class of errors that you want to detect? Swapping neighboring characters only? – Rafał Dowgird Jan 12 '17 at 15:54
  • I have not specified a class of errors but the more errors can be detected the better it would be. – Patrick Vogt Jan 12 '17 at 15:57
    One of the CRC16 variants? – Paul Hankin Jan 12 '17 at 16:25
  • At the moment it's hard to answer the question as written. It's essentially "name a 16 bit checksum", which doesn't match the stackoverflow ideal of a question with a single right answer. – Paul Hankin Jan 12 '17 at 16:29
    Your specification is too broad at the moment. What kind of input data do you have? Numeric? Alphanumeric? Unicode? Human-readable text? How long are your messages on average? There are lots of error-detection algorithms (which I guess is what you are looking for), but you have to choose between them based on the type of data you have. E.g. you mentioned `Luhn`, which is a "check digit" algorithm normally applied to numeric data; if that is the case, you may also want to look into `Damm` and `Verhoeff`. – zeppelin Jan 12 '17 at 18:43
  • @zeppelin My input data is alphanumeric, with a length of 18 or 19 characters. Exactly, I am looking for an algorithm to detect errors in the input. Yes, the problem with Luhn was that it always produced only a single digit, so transposition errors during input were not detected. – Patrick Vogt Jan 12 '17 at 18:48
    @PatrickVogt Could you please also provide an example of a typical "commutation error" you would like to detect? Luhn (as well as Damm and Verhoeff) is based on a statistical model of human-made errors (typos), like mistyping a single character or swapping (transposing) two adjacent characters, and is tailored towards detecting those types of errors. – zeppelin Jan 12 '17 at 19:16
  • @zeppelin It would be best to detect all possible permutations of an input word, for example 'Hello22334'. This is probably not achievable, but at least all swaps of neighboring characters should be detected, and ideally also swaps of characters one position further apart. For example, it would be desirable for the following error to be detected: Hello22334 -> Hloel24233 – Patrick Vogt Jan 12 '17 at 19:27
    @PatrickVogt Ok, I see. One more question: do you wish to encode the checksum as two alphanumeric characters or as two full bytes (i.e. binary)? – zeppelin Jan 12 '17 at 20:21
  • @zeppelin I would prefer alphanumeric characters but bytes are also ok. – Patrick Vogt Jan 12 '17 at 20:27
    Ok, taking that your data is alphanumeric, and you want to detect all the permutations (in the perfect case), and you can afford to use the binary checksum (i.e. full 16 bits), my guess is that you should probably go with CRC16 (as already suggested by @Paul Hankin), as it is more information-dense compared to check-digit algorithms like Luhn or Damm, and is more "generic" when it comes to possible types of errors. Maybe something like CRC-CCITT (CRC-16-CCITT), you can give it a try [here](https://www.lammertbies.nl/comm/info/crc-calculation.html), to see how it works for you. – zeppelin Jan 13 '17 at 20:22
  • @zeppelin Thank you! You saved my day :) Could you post your solution as an answer so I can accept it? – Patrick Vogt Jan 14 '17 at 08:33
  • @PatrickVogt Sure, good to hear that this was helpful for you. – zeppelin Jan 14 '17 at 10:00

1 Answer


Given that your data is alphanumeric, that you want to detect all permutations (in the perfect case), and that you can afford a binary checksum (i.e. the full 16 bits), my guess is that you should probably go with CRC-16 (as already suggested by @Paul Hankin in the comments): it is more information-dense than check-digit algorithms like Luhn or Damm, and more "generic" when it comes to the types of errors it can detect.
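A 16-bit checksum only fits into two characters if those characters are raw bytes: two characters from a 62-symbol alphanumeric alphabet can represent just 62 * 62 = 3844 values, far fewer than the 65536 values a 16-bit checksum can take. The minimal Python sketch below shows both options. The `0-9A-Za-z` alphabet and the function names `to_two_bytes` and `to_two_alnum` are hypothetical, and reducing the value modulo 3844 is my own assumption: it discards information, so it weakens error detection accordingly.

```python
import string

# Hypothetical 62-symbol alphabet for illustration: 0-9, A-Z, a-z
ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase


def to_two_bytes(checksum: int) -> bytes:
    """Lossless: store all 16 bits of the checksum in two raw bytes (big-endian)."""
    return checksum.to_bytes(2, "big")


def to_two_alnum(checksum: int) -> str:
    """Lossy (assumption): two base-62 characters hold only 62 * 62 = 3844
    distinct values, so the 16-bit checksum is reduced modulo 3844 first."""
    reduced = checksum % (len(ALPHABET) ** 2)
    high, low = divmod(reduced, len(ALPHABET))
    return ALPHABET[high] + ALPHABET[low]


if __name__ == "__main__":
    value = 0x29B1                # an example 16-bit checksum value
    print(to_two_bytes(value))    # b')\xb1'
    print(to_two_alnum(value))    # m9
```

If the full detection strength of the 16-bit checksum matters, the two-byte form (or three base-62 characters, since 62 ** 3 > 65536) avoids the reduction; the two-character alphanumeric form trades detection strength for readability.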

Maybe something like CRC-CCITT (CRC-16-CCITT); you can give it a try [here](https://www.lammertbies.nl/comm/info/crc-calculation.html) to see how it works for you.
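As a rough sketch (Python, bitwise rather than table-driven), here is one common CRC-16-CCITT parameterisation: polynomial 0x1021, initial value 0xFFFF, no input/output reflection and no final XOR, often labelled "CRC-CCITT (0xFFFF)" or CRC-16/CCITT-FALSE. Passing `crc=0x0000` instead gives the XMODEM variant. The function name `crc16_ccitt` is hypothetical. Because a CRC-16 with this polynomial detects every burst error up to 16 bits long, swapping any two adjacent 8-bit characters is always caught. Arbitrary permutations are caught only with high probability (roughly a 1 in 65536 chance of a collision for random changes), and no 16-bit checksum can distinguish all permutations of an 18 to 19 character string, since there are far more than 65536 of them.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021 (x^16 + x^12 + x^5 + 1),
    initial value `crc` (0xFFFF by default), no reflection, no final XOR."""
    for byte in data:
        crc ^= byte << 8                      # feed the next byte into the high 8 bits
        for _ in range(8):
            if crc & 0x8000:                  # top bit set: shift and apply polynomial
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:                             # top bit clear: just shift
                crc = (crc << 1) & 0xFFFF
    return crc


if __name__ == "__main__":
    # Standard check value for this variant over the ASCII string "123456789"
    print(f"{crc16_ccitt(b'123456789'):04X}")   # 29B1

    # The error from the question: compare the checksums of both strings
    original, typo = "Hello22334", "Hloel24233"
    a = crc16_ccitt(original.encode("ascii"))
    b = crc16_ccitt(typo.encode("ascii"))
    print(f"{original}: {a:04X}  {typo}: {b:04X}  detected: {a != b}")
```

The bitwise loop keeps the sketch short; a 256-entry lookup table is the usual speed-up, but for 18 to 19 character inputs the difference is unlikely to matter.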

zeppelin