
I want to read a serial number of about 16-20 characters (A-Z, 0-9) using OCR. Since not every character will be recognized correctly every time, I want to add one check character to the serial number. So far I have found the simple Luhn mod N algorithm (Wikipedia). However, this algorithm does not catch every transposition error (in decimal Luhn, for example, 09 => 90).

Implementation from Wikipedia:

char GenerateCheckCharacter(string input) {

    int factor = 2;
    int sum = 0;
    int n = NumberOfValidInputCharacters();

    // Starting from the right and working leftwards is easier since 
    // the initial "factor" will always be "2" 
    // ** int index = 0; **
    for (int i = input.Length - 1; i >= 0; i--) {
        int codePoint = CodePointFromCharacter(input[i]);
        int addend = factor * codePoint;

        // Alternate the "factor" that each "codePoint" is multiplied by
        factor = (factor == 2) ? 1 : 2;
        // ** factor = index; **

        // Sum the digits of the "addend" as expressed in base "n"
        addend = (addend / n) + (addend % n);
        sum += addend;
        // ** index++; **
    }

    // Calculate the number that must be added to the "sum" 
    // to make it divisible by "n"
    int remainder = sum % n;
    int checkCodePoint = (n - remainder) % n;

    return CharacterFromCodePoint(checkCodePoint);
}

In my case NumberOfValidInputCharacters() would return 36 (A-Z, 0-9).
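For experimenting, here is a minimal Python sketch of the routine above with the three helper functions filled in. The helper bodies and the alphabet ordering (digits first, then A-Z, so '0' maps to 0 and 'Z' to 35) are my own assumptions and are not given in the Wikipedia code; `is_valid` is a hypothetical convenience wrapper.

```python
import string

# Assumed ordering (not specified in the source): '0'-'9' -> 0..9, 'A'-'Z' -> 10..35.
ALPHABET = string.digits + string.ascii_uppercase
N = len(ALPHABET)  # 36 valid input characters


def code_point_from_character(ch: str) -> int:
    # Raises ValueError for characters outside the alphabet.
    return ALPHABET.index(ch)


def character_from_code_point(code_point: int) -> str:
    return ALPHABET[code_point]


def generate_check_character(text: str) -> str:
    factor = 2
    total = 0
    # Work from the right so the first factor is always 2.
    for ch in reversed(text):
        addend = factor * code_point_from_character(ch)
        # Alternate the factor between 2 and 1.
        factor = 1 if factor == 2 else 2
        # Sum the digits of the addend as expressed in base N.
        total += addend // N + addend % N
    # Value that makes the total divisible by N.
    return character_from_code_point((N - total % N) % N)


def is_valid(text_with_check: str) -> bool:
    # Hypothetical helper: the last character is taken to be the check character.
    return generate_check_character(text_with_check[:-1]) == text_with_check[-1]


print(generate_check_character("09"), generate_check_character("90"))  # I R
print(generate_check_character("0Z"), generate_check_character("Z0"))  # 1 1
```

With this assumed ordering, swapping 09 and 90 changes the check character in base 36, but swapping 0Z and Z0 does not, so the decimal blind spot seems to move to the code points 0 and 35 rather than disappear.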

But if I set the "factor" variable to the actual index of the character within the serial number, is the algorithm then safer than before? (See the ** ** lines in the code.)
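One way to compare the two variants is to brute-force every single-character substitution and every adjacent transposition and count how many leave the check character unchanged. The sketch below is in Python and rests on several assumptions of mine: the alphabet ordering ('0'-'9', then 'A'-'Z'), a length of 16, and a literal reading of the marked lines, under which the factors from the right become 2, 0, 1, 2, 3, ... Read that way, the second character from the right gets factor 0 and does not contribute to the sum at all. All function names are hypothetical.

```python
import string

# Assumed ordering (hypothetical): '0'-'9' -> 0..9, 'A'-'Z' -> 10..35.
ALPHABET = string.digits + string.ascii_uppercase
N = len(ALPHABET)  # 36


def check_character(text, factors):
    """Luhn-mod-N style check; factors[k] weights the k-th character from the right."""
    total = 0
    for k, ch in enumerate(reversed(text)):
        addend = factors[k] * ALPHABET.index(ch)
        total += addend // N + addend % N  # digit sum in base N
    return ALPHABET[(N - total % N) % N]


def alternating_factors(length):
    # Original scheme: 2, 1, 2, 1, ... counted from the right.
    return [2 if k % 2 == 0 else 1 for k in range(length)]


def index_factors(length):
    # Literal reading of the marked lines: the first factor stays 2,
    # every later factor is the running index 0, 1, 2, ...
    return ([2] + list(range(length - 1)))[:length]


def count_undetected(factors):
    """Return (missed substitutions, missed adjacent transpositions) as ordered pairs."""
    length = len(factors)
    filler = ALPHABET[0]
    missed_subs = 0
    missed_swaps = 0
    # The checksum is a sum of per-position terms, so the untouched characters
    # do not influence whether a change is detected; a constant filler suffices.
    for pos in range(length):
        left = filler * pos
        for a in ALPHABET:
            for b in ALPHABET:
                if a == b:
                    continue
                right = filler * (length - pos - 1)
                if check_character(left + a + right, factors) == \
                        check_character(left + b + right, factors):
                    missed_subs += 1
                if pos + 1 < length:
                    right2 = filler * (length - pos - 2)
                    if check_character(left + a + b + right2, factors) == \
                            check_character(left + b + a + right2, factors):
                        missed_swaps += 1
    return missed_subs, missed_swaps


LENGTH = 16
alt_subs, alt_swaps = count_undetected(alternating_factors(LENGTH))
idx_subs, idx_swaps = count_undetected(index_factors(LENGTH))
print("alternating factors:", alt_subs, "missed substitutions,", alt_swaps, "missed swaps")
print("index factors:      ", idx_subs, "missed substitutions,", idx_swaps, "missed swaps")
```

Under these assumptions the original scheme misses no single substitution, while the index variant misses at least every substitution at the factor-0 position, so the change would trade one weakness for another unless the index starts at a nonzero value.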

Mr.Sheep
  • Does this even matter? Human input is exceptionally prone to transposition errors — they seem to happen all teh time when I'm typing :-) — but I'd say the chance of a transposition error happening with OCR is actually very small. – r3mainer Apr 27 '15 at 12:34
  • My concern is that two or more serial numbers share the same check character; in other words, the computer reads the characters incorrectly, but the check character still matches. – Mr.Sheep Apr 27 '15 at 12:38
  • Yes, but I'm claiming that the probability of a transposition error occurring with OCR is very small. Suppose your OCR software can recognize characters with 90% accuracy. The chance of two consecutive OCR errors is 1%, and the chance of two adjacent characters being swapped around as a result of consecutive errors is about 0.0008%, assuming a 36-character alphabet and uniform distributions of OCR errors, etc. So why do you think it's so important to guard against transposition errors? – r3mainer Apr 27 '15 at 12:46
  • Ok, maybe I should have asked my question more generally rather than only about transposition errors: is this a good algorithm for checking whether the serial number was read correctly? There are also the common OCR problems, like misreading the zero as the letter O, or vice versa. But you are right that the probability of a transposition error isn't high (I think; I haven't checked the actual percentage yet). I just wanted to know if there is a way to guard against that error, and whether my modification would improve the algorithm or is irrelevant. – Mr.Sheep Apr 27 '15 at 12:54
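
The estimate in r3mainer's second comment can be reproduced with a few lines of Python. This is only a sketch: it assumes that character errors are independent and that a misread character is equally likely to become any of the other 35 characters, which is one possible reading of "uniform distributions of OCR errors". The variable names are hypothetical.

```python
# Assumptions taken from the comment: 90% per-character accuracy, 36-character alphabet.
p_error = 0.10
alphabet_size = 36

# Two adjacent characters are both misread (errors assumed independent).
p_two_errors = p_error ** 2

# Each misread lands on one specific wrong character (its neighbour's value),
# assuming a uniform choice among the other 35 characters.
p_specific_wrong = 1 / (alphabet_size - 1)

# Both misreads happen to produce exactly the swapped pair.
p_swap = p_two_errors * p_specific_wrong ** 2

print(f"two consecutive errors: {p_two_errors:.2%}")  # 1.00%
print(f"apparent transposition: {p_swap:.4%}")        # 0.0008%
```

Using 1/36 instead of 1/35 for the per-character chance gives roughly the same figure after rounding.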

0 Answers