Extract Braille text (image) from PDF using iTextSharp

Question

Braille is a writing system for blind people, and in a PDF it can be rendered with a special Braille font. I am trying to decode the text written in a Braille font in a PDF file and output the normal text, but PdfTextExtractor (in iTextSharp) does not seem to handle this font. Is it possible in any other way?

I am trying to figure out how I can decode it from a PDF file.

I tried using:

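// Requires: using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
PdfReader pdf = new PdfReader("C:\\pdfs\\file.pdf");
string text = PdfTextExtractor.GetTextFromPage(pdf, 1);

// Both text boxes receive the same extracted string.
this.brailleTextBox.Text = text;
this.normalTextBox.Text = text;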

on a PDF file that has text in a regular font (e.g. Arial) and in a Braille font, but it doesn't return the Braille text; it returns just the normal text on the page.

How can I get the Braille font text instead, using iTextSharp?

Also, can you select the "text" in Adobe Acrobat? If you copy it does it come out as text? — Chris Haas, Aug 08 '11 at 12:59
I found a free Braille font, and with it the text gets selected correctly. But these fonts differ between native languages, which doubles my trouble. Now the Braille characters have to be decoded via image processing. Sample Braille PDF: http://dl.dropbox.com/u/18670740/BRAILLE%20CODES%20WITH%20TRANSLATION.pdf – UserBSS1, Aug 09 '11 at 15:43

score 0 · Accepted Answer · answered Aug 09 '11 at 15:55


Okay, maybe I'm not understanding correctly. I just tried using the PdfTextExtractor on the PDF that you provided and it worked correctly. Specifically the following text was kicked out for page 1:

B   r    a   i     l    l    e   C   o   d    e   s 
B r a i l l e C o d e s 

Embossed dot positions as,   


A  B   C   D   E   F   G  H   I    J   K  
A B C D E F G H I J K 
L    M  N  O   P  Q   R  S   T   U   V  
L M N O P Q R S T U V 
W  X   Y   Z 
W X Y Z 


1   2   3    4   5   6    7   8   9   0 
1 2 3 4 5 6 7 8 9 0

I apologize if I'm misunderstanding you, but are you trying to get the text back as braille?

answered Aug 09 '11 at 15:55

Chris Haas



The Braille character for '{', '(' and '[' is the same, and similarly for '}', ')' and ']'. So if the font that I (or you) have installed is not the best font available, with all the possible representations, then it is not possible to translate it perfectly. – UserBSS1 Aug 09 '11 at 17:06

I'm really sorry, but I'm still not sure what your actual question is now. Text is text - always. Fonts take text and display it in certain ways (cursive, braille, symbols, etc) called glyphs. The standard Braille system itself doesn't differentiate between the curly bracket and square bracket (as far as I can tell) and draws the same glyph for both of them. Behind these glyphs the true text still exists. If someone converts the glyphs to static images then the text will be lost, otherwise it will always be there. – Chris Haas Aug 09 '11 at 17:30
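
If the goal is to separate the text set in the Braille font from the text set in the regular font, one possible approach is to filter text chunks by font name while the page is parsed. The following is only a rough sketch, assuming iTextSharp 5.x; the class name FontFilterListener, the helper FontMatches, and the "Braille" name fragment are made up for illustration, and the real font name inside a given PDF has to be looked up first.

```csharp
using System;
using System.Text;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

// Hypothetical listener: keeps only the text drawn with a font whose
// PostScript name contains the given fragment.
public class FontFilterListener : IRenderListener
{
    private readonly string fragment;
    private readonly StringBuilder matched = new StringBuilder();

    public FontFilterListener(string fragment)
    {
        this.fragment = fragment;
    }

    // Embedded subset fonts usually carry a prefix such as "ABCDEF+",
    // so compare by case-insensitive substring instead of equality.
    public static bool FontMatches(string postscriptName, string fragment)
    {
        return postscriptName != null &&
               postscriptName.IndexOf(fragment, StringComparison.OrdinalIgnoreCase) >= 0;
    }

    public void RenderText(TextRenderInfo info)
    {
        // Chunks are appended in content-stream order, with no layout handling.
        if (FontMatches(info.GetFont().PostscriptFontName, fragment))
        {
            matched.Append(info.GetText());
        }
    }

    public void BeginTextBlock() { }
    public void EndTextBlock() { }
    public void RenderImage(ImageRenderInfo info) { }

    public string GetMatchedText()
    {
        return matched.ToString();
    }
}

public static class FontFilterDemo
{
    // Returns the text on one page that was drawn with a matching font.
    public static string ExtractByFont(string path, int page, string fragment)
    {
        PdfReader reader = new PdfReader(path);
        try
        {
            PdfReaderContentParser parser = new PdfReaderContentParser(reader);
            FontFilterListener listener =
                parser.ProcessContent(page, new FontFilterListener(fragment));
            return listener.GetMatchedText();
        }
        finally
        {
            reader.Close();
        }
    }
}
```

Something like FontFilterDemo.ExtractByFont("C:\\pdfs\\file.pdf", 1, "Braille") could then fill brailleTextBox, while the plain PdfTextExtractor result fills normalTextBox. This only works while the Braille is real text in the PDF; if the glyphs were flattened to images, there is no text left to filter.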
