
I need to generate a PDF417 barcode from some text. I have an API (that I didn't create) that generates a PDF417 barcode given the data, number of rows and number of columns (among other parameters irrelevant to the question).

My PDF417 barcode uses text encoding. This means 1 codeword can hold up to 2 characters. Now, the number of columns HAS to be fixed because I'm printing this barcode in a very constrained space.

The following is what I have inferred from this document (see page 38, "Sizing a barcode"):

  1. Let number of codewords per row, CWPerRow = 7.
  2. Number of codewords required for some given text, ReqCW = strlen(text) / 2.
  3. Number of rows required = ReqCW / CWPerRow

When I test the above algorithm, nothing displays. When I call the same API with very little data and the number of rows set to 25, the barcode prints just fine (verified by various barcode scanners).

So, how do I calculate the number of rows required for some given text when the number of columns is known?

Anish Ramaswamy
  • Could you give more information about the test cases? When the number of rows = 25, is this the computed number or just a large number? From strlen, I think you are using C. Is there an integer division in step 3? That could cause a wrong result. –  Jun 03 '13 at 17:26
  • @gkovacs90, 25 is just some number I came up with after trial and error. – Anish Ramaswamy Jun 04 '13 at 05:03
  • @gkovacs90, The API I'm using accepts only `int` parameters. Also, that document I referred to in my question talks about that division being integer not float/double. And yes, I'm coding this in C. But I think that the language doesn't matter for this particular question. – Anish Ramaswamy Jun 04 '13 at 05:04
  • Then in the case where `ReqCW=26` and `CWPerRow=7`, the result will be 3, which is incorrect: 26 codewords don't fit into 3 rows. Try incrementing the result by one. –  Jun 04 '13 at 05:15
  • @gkovacs90, Wow, that's a pretty dumb mistake, haha. Thanks for that. But anyway, according to Markus' answer (see below), there's no way to correctly determine the number of codewords required without actually encoding the text, which sucks. – Anish Ramaswamy Jun 04 '13 at 07:09
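
The off-by-one discussed in the comments above is the usual ceiling-division problem: adding one unconditionally overcounts when the codeword count is an exact multiple of the column count. A minimal sketch, assuming the codeword count is already known and already includes any overhead; `rows_needed`, `MIN_ROWS` and `MAX_ROWS` are hypothetical names, and the 3 to 90 row limits are my reading of the PDF417 specification, so check them against the spec or the API in use:

```python
MIN_ROWS = 3    # assumed PDF417 lower limit on rows
MAX_ROWS = 90   # assumed PDF417 upper limit on rows


def rows_needed(codewords, columns):
    """Smallest row count such that rows * columns can hold `codewords`."""
    # Ceiling division using integers only, so no float rounding is involved.
    rows = (codewords + columns - 1) // columns
    # Very small inputs still need the (assumed) minimum number of rows.
    rows = max(rows, MIN_ROWS)
    if rows > MAX_ROWS:
        raise ValueError("too many codewords for %d columns" % columns)
    return rows


print(rows_needed(26, 7))   # 26 codewords at 7 per row
print(rows_needed(21, 7))   # exact multiple: no extra row is added
```

If the API silently draws nothing for an out-of-range row count, a lower bound like this may also be worth testing, although that is a guess about the API's behaviour.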

2 Answers

You could look at the source-code of some PDF417 implementation, such as ZXing.

The text encoding isn't just two characters per code-word. If you use any other character than uppercase letters and space, the encoder will add extra characters to switch character-sets etc. You really have to encode the text to see how many code-words it will become.
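
To see why a mode switch costs space, here is a rough sketch of how text-compaction values are paired into codewords: each character or switch becomes a value from 0 to 29, and two values are packed as 30 * first + second. The function name `pack_text_values`, the padding value 29 and the latch value 27 are assumptions taken from my reading of the spec tables, not from any particular library:

```python
PAD = 29  # assumed filler value used when the number of values is odd


def pack_text_values(values):
    """Pack text-compaction values (each 0-29) two per codeword."""
    if len(values) % 2:
        values = values + [PAD]
    # Pair up neighbours: the first value is the high part, the second the low part.
    return [30 * high + low for high, low in zip(values[0::2], values[1::2])]


# "AB": two values, so a single codeword.
print(pack_text_values([0, 1]))
# "Hi": H, latch-to-lower (assumed value 27), i -> three values, so two codewords.
print(pack_text_values([7, 27, 8]))
```

Both strings are two characters long, but the second one needs twice as many codewords because of the switch to lowercase.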

public class Test
{
    public static void main(String[] args)
    {
        String msg = "Hello, world!";
        int columns = 7;
        int sourceCodeWords = calculateSourceCodeWords(msg);
        int errorCorrectionCodeWords = getErrorCorrectionCodewordCount(0);
        int rows = calculateNumberOfRows(sourceCodeWords, errorCorrectionCodeWords, columns);
        System.out.printf("\"%s\" requires %d code-words, and %d error correction code-words. This becomes %d rows.%n",
                msg, sourceCodeWords, errorCorrectionCodeWords, rows);
    }


    public static int calculateNumberOfRows(int sourceCodeWords, int errorCorrectionCodeWords, int columns) {
        int rows = ((sourceCodeWords + 1 + errorCorrectionCodeWords) / columns) + 1;
        if (columns * rows >= (sourceCodeWords + 1 + errorCorrectionCodeWords + columns)) {
            rows--;
        }
        return rows;
    }

    public static int getErrorCorrectionCodewordCount(int errorCorrectionLevel) {
        if (errorCorrectionLevel < 0 || errorCorrectionLevel > 8) {
            throw new IllegalArgumentException("Error correction level must be between 0 and 8!");
        }
        return 1 << (errorCorrectionLevel + 1);
    }

    private static boolean isAlphaUpper(char ch) {
        return ch == ' ' || (ch >= 'A' && ch <= 'Z');
    }

    private static boolean isAlphaLower(char ch) {
        return ch == ' ' || (ch >= 'a' && ch <= 'z');
    }

    private static boolean isMixed(char ch) {
        return "\t\r #$%&*+,-./0123456789:=^".indexOf(ch) > -1;
    }

    private static boolean isPunctuation(char ch) {
        return "\t\n\r!\"$'()*,-./:;<>?@[\\]_`{|}~".indexOf(ch) > -1;
    }

    private static final int SUBMODE_ALPHA = 0;
    private static final int SUBMODE_LOWER = 1;
    private static final int SUBMODE_MIXED = 2;
    private static final int SUBMODE_PUNCTUATION = 3;

    public static int calculateSourceCodeWords(String msg)
    {
        int len = 0;
        int submode = SUBMODE_ALPHA;
        int msgLength = msg.length();
        for (int idx = 0; idx < msgLength;)
        {
            char ch = msg.charAt(idx);
            switch (submode)
            {
                case SUBMODE_ALPHA:
                    if (isAlphaUpper(ch))
                    {
                        len++;
                    }
                    else
                    {
                        if (isAlphaLower(ch))
                        {
                            submode = SUBMODE_LOWER;
                            len++;
                            continue;
                        }
                        else if (isMixed(ch))
                        {
                            submode = SUBMODE_MIXED;
                            len++;
                            continue;
                        }
                        else
                        {
                            len += 2;
                            break;
                        }
                    }
                    break;
                case SUBMODE_LOWER:
                    if (isAlphaLower(ch))
                    {
                        len++;
                    }
                    else
                    {
                        if (isAlphaUpper(ch))
                        {
                            len += 2;
                            break;
                        }
                        else if (isMixed(ch))
                        {
                            submode = SUBMODE_MIXED;
                            len++;
                            continue;
                        }
                        else
                        {
                            len += 2;
                            break;
                        }
                    }
                    break;
                case SUBMODE_MIXED:
                    if (isMixed(ch))
                    {
                        len++;
                    }
                    else
                    {
                        if (isAlphaUpper(ch))
                        {
                            submode = SUBMODE_ALPHA;
                            len++;
                            continue;
                        }
                        else if (isAlphaLower(ch))
                        {
                            submode = SUBMODE_LOWER;
                            len++;
                            continue;
                        }
                        else
                        {
                            if (idx + 1 < msgLength)
                            {
                                char next = msg.charAt(idx + 1);
                                if (isPunctuation(next))
                                {
                                    submode = SUBMODE_PUNCTUATION;
                                    len++;
                                    continue;
                                }
                            }
                            len += 2;
                        }
                    }
                    break;
                default:
                    if (isPunctuation(ch))
                    {
                        len++;
                    }
                    else
                    {
                        submode = SUBMODE_ALPHA;
                        len++;
                        continue;
                    }
                    break;
            }
            idx++; // Don't increment if 'continue' was used.
        }
        return (len + 1) / 2;
    }
}

Output:

"Hello, world!" requires 9 code-words, and 2 error correction code-words. This becomes 2 rows.

Markus Jarderot
  • So there's no way to determine a relation between text and number of rows required without encoding it? Also, wouldn't this introduce significant overhead? I have a GUI with some fields that have to be printed, and the number of pages required is calculated from them. I feel this barcode encoding might slow the whole thing down quite a bit, because that calculation happens whenever a field is changed. – Anish Ramaswamy Jun 04 '13 at 04:57
  • It's not that slow. Instead of a few microseconds, you have a few milliseconds, and there will only be at most about 1000 characters. If you are concerned, you could update it to only *count* the code-words needed to encode the text. Another tweak would be to wait a short period (0.2 - 1 second) after the last character is typed before updating the row count. – Markus Jarderot Jun 04 '13 at 08:28
  • I've added code to calculate the number of code-words required. – Markus Jarderot Jun 04 '13 at 11:13
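
The "wait a short period after the last character" idea from the comments is usually called debouncing. A minimal sketch using the standard library's `threading.Timer`; the `Debouncer` class and its method names are hypothetical, and a real GUI toolkit would more likely offer its own timer that runs on the UI thread:

```python
import threading


class Debouncer:
    """Run `callback` only once `delay` seconds pass without a new trigger."""

    def __init__(self, delay, callback):
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self, *args):
        with self._lock:
            # Drop the pending call, if any, and start the wait again.
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback, args)
            self._timer.daemon = True
            self._timer.start()

    def wait(self):
        """Block until the most recently scheduled call has finished."""
        with self._lock:
            timer = self._timer
        if timer is not None:
            timer.join()


if __name__ == "__main__":
    # Simulate fast typing: only the final text reaches the callback.
    seen = []
    debouncer = Debouncer(0.2, seen.append)
    for text in ("H", "He", "Hel"):
        debouncer.trigger(text)
    debouncer.wait()
    print(seen)
```

Note that the callback runs on the timer's thread here, so any GUI update would have to be handed back to the UI thread.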

I've made a Python port of Markus Jarderot's answer. The calculation remains the same.

import string

SUBMODE_ALPHA = string.ascii_uppercase + ' '
SUBMODE_LOWER = string.ascii_lowercase + ' '
SUBMODE_MIXED = "\t\r #$%&*+,-./0123456789:=^"
SUBMODE_PUNCTUATION = "\t\n\r!\"$'()*,-./:;<>?@[\\]_`{|}~"


def calculateNumberOfRows(sourceCodeWords, errorCorrectionCodeWords, columns):
    # Floor division keeps the result an int on both Python 2 and Python 3.
    rows = ((sourceCodeWords + 1 + errorCorrectionCodeWords) // columns) + 1
    if columns * rows >= sourceCodeWords + 1 + errorCorrectionCodeWords + columns:
        rows -= 1
    return rows

def getErrorCorrectionCodewordCount(errorCorrectionLevel):
    if not 0 <= errorCorrectionLevel <= 8:
        raise ValueError("Error correction level must be between 0 and 8!")
    return 1 << (errorCorrectionLevel + 1)


def calculateSourceCodeWords(msg):
    length = 0
    submode = SUBMODE_ALPHA
    msgLength = len(msg)
    idx = 0
    while(idx < msgLength):
        ch = msg[idx]
        length += 1

        if not ch in submode:
            old_submode = submode
            if submode == SUBMODE_ALPHA:
                for mode in (SUBMODE_LOWER, SUBMODE_MIXED):
                    if ch in mode:
                        submode = mode

            elif submode == SUBMODE_LOWER:
                if ch in SUBMODE_MIXED:
                    submode = SUBMODE_MIXED

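            elif submode == SUBMODE_MIXED:
                if ch in SUBMODE_ALPHA:
                    submode = SUBMODE_ALPHA
                elif ch in SUBMODE_LOWER:
                    submode = SUBMODE_LOWER
                elif idx + 1 < msgLength and msg[idx + 1] in SUBMODE_PUNCTUATION:
                    # As in the Java version, latch to punctuation only when
                    # ch is not a letter and the next character is punctuation.
                    submode = SUBMODE_PUNCTUATION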


            elif submode == SUBMODE_PUNCTUATION:
                submode = SUBMODE_ALPHA

            if old_submode != submode:
                # submode changed
                continue

            length += 1

        idx += 1 # Don't increment if 'continue' was used.
    # Two values per code-word, rounded up; // keeps the result an int.
    return (length + 1) // 2


def main():
    msg = "Hello, world!"
    columns = 7
    sourceCodeWords = calculateSourceCodeWords(msg)
    errorCorrectionCodeWords = getErrorCorrectionCodewordCount(0)
    rows = calculateNumberOfRows(sourceCodeWords, errorCorrectionCodeWords, columns)
    print("\"%s\" requires %d code-words, and %d error correction code-words. This becomes %d rows.\n"
           %( msg, sourceCodeWords, errorCorrectionCodeWords, rows))



if __name__ == '__main__':
    main()
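
Since the column count is fixed, the error correction level ends up driving the row count for short messages. A small sketch that tabulates this, reusing the same formulas as above (2^(level + 1) error correction code-words, plus one extra code-word, rounded up to whole rows); `rows_by_level` is a hypothetical helper, and the 40 source code-words in the demo are just an example value:

```python
def ec_codewords(level):
    """Error correction code-words for levels 0-8: 2 ** (level + 1)."""
    if not 0 <= level <= 8:
        raise ValueError("Error correction level must be between 0 and 8!")
    return 1 << (level + 1)


def rows_by_level(source_codewords, columns):
    """Map each error correction level to the rows it would need."""
    table = {}
    for level in range(9):
        total = source_codewords + 1 + ec_codewords(level)
        table[level] = (total + columns - 1) // columns  # round up to whole rows
    return table


if __name__ == "__main__":
    for level, rows in sorted(rows_by_level(40, 7).items()):
        print("level %d: %d rows" % (level, rows))
```

With a tight row budget, this kind of table makes it easy to see the highest level that still fits.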
c0ff3m4kr