
I'm working with poppler in C++, and I have some PDFs that contain barcodes. Most of the PDF printers I have to work with print the barcode and its number separately, so I don't have to deal with barcode reading. But those that express the barcode and its number in a single character give me strange characters that I don't know how to translate.

For instance, one document has a barcode that encodes the following number: 3065894901901000368529198928291201901066

But if I copy and paste it, I get this (I get the same result with poppler's pdftotext): (NÏça1è:0TãMCçLM<1è:Ð)

Is there a way to translate these strange characters back to the numbers they are meant to be?

Coyoteazul
  • The barcode is probably in a different font where the characters that you see from a cut and paste in your normal font map to the graphical bars you see on the page in the special barcode font. Perhaps in the pdf file it will name the font. From the font name, you may be able to find the translation from numbers to font codes and reverse this translation. – Mike Wodarczyk Mar 17 '19 at 00:34
  • If the barcode is an image you'll need to use OCR like tesseract – Gillespie Mar 17 '19 at 03:39
  • Is there a programming question? This looks like a barcode/pdf/font question. – JaMiT Mar 17 '19 at 04:41
  • @MikeWodarczyk Thanks, I managed to find out that the font is interleaved-2of5. I also managed to find a conversion table, but it seems to be different from mine. However, thanks to that read I understood how the weird characters that I'm seeing (they are called ITF14 btw) work, so I can make my own conversion table (and hope it applies to all cases). – Coyoteazul Mar 17 '19 at 21:57

1 Answer

Thanks to @MikeWodarczyk's comment I managed to find a conversion table. It didn't apply to my case, but now that I understand how the conversion works I can make my own conversion table: https://www.barcodefaq.com/1d/interleaved-2of5/
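
For anyone landing here, a table like that could be sketched in C++ roughly as follows. This is only a guess based on the single sample in the question, where each character seems to stand for one pair of digits: N (U+004E) lines up with 30 and Ï (U+00CF) with 65. From that sample it looks like U+0030..U+0061 cover the pairs 00..49 and U+00C0..U+00F1 cover 50..99, with "(" and ")" as start and stop markers. Those ranges are an assumption (the sample only exercises part of them), other barcode fonts use other tables, and the function name `decodeItfFontText` is made up for illustration. The input is assumed to be UTF-8.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: turns text copied from an Interleaved 2 of 5 font
// back into digits. ASSUMED mapping, inferred from one sample only:
//   U+0030..U+0061 -> pairs 00..49
//   U+00C0..U+00F1 -> pairs 50..99
//   '(' and ')'    -> start/stop markers, skipped
// Returns false on any character outside that assumed table.
bool decodeItfFontText(const std::string& utf8, std::string& digits) {
    digits.clear();
    for (std::size_t i = 0; i < utf8.size(); ++i) {
        const unsigned char b0 = static_cast<unsigned char>(utf8[i]);
        std::uint32_t cp = b0;
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            // Two-byte UTF-8 sequence (covers U+0080..U+07FF).
            if (i + 1 >= utf8.size()) return false;
            const unsigned char b1 = static_cast<unsigned char>(utf8[++i]);
            if ((b1 & 0xC0) != 0x80) return false;
            cp = ((b0 & 0x1Fu) << 6) | (b1 & 0x3Fu);
        } else if (b0 >= 0x80) {
            return false; // longer sequences are outside the assumed table
        }
        if (cp == '(' || cp == ')') continue; // start/stop markers
        int pair = 0;
        if (cp >= 0x30 && cp <= 0x61) {
            pair = static_cast<int>(cp - 0x30);      // 00..49
        } else if (cp >= 0xC0 && cp <= 0xF1) {
            pair = static_cast<int>(cp - 0xC0) + 50; // 50..99
        } else {
            return false;
        }
        digits.push_back(static_cast<char>('0' + pair / 10));
        digits.push_back(static_cast<char>('0' + pair % 10));
    }
    return true;
}
```

Returning false on unknown characters instead of guessing makes it obvious when a document uses a font with a different table.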

Coyoteazul