iText - Generating text file from pdf with Chinese characters

Asked Feb 26 '16 at 03:34

I am using iText to extract data from PDFs. My application can read PDFs with English text, but we received a new file that contains Chinese characters. When I try to extract its text, I get this error:

ExceptionConverter: com.itextpdf.text.DocumentException: Font 'STSong-Light' with 'UniGB-UCS2-H' is not recognized.

So I added itext-asian.jar to the classpath. The error is gone, but getTextFromPage() now returns an empty string. Am I missing something?
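For context on the error: UniGB-UCS2-H is one of Adobe's predefined CJK CMaps. "Uni" means it is Unicode-based, "GB" refers to the Adobe-GB1 (Simplified Chinese) character collection, "UCS2" means each character code is a two-byte UCS-2 value, and "H" means horizontal writing. The "not recognized" error typically means iText could not find the CJK font and CMap resources that ship in itext-asian.jar, which fits the error disappearing once the jar was added. The sketch below only illustrates what such character codes look like; it is not iText's implementation, and the class name `Ucs2CMapDemo`, the method `decodeUcs2` and the sample bytes are hypothetical.

```java
import java.nio.charset.StandardCharsets;

public class Ucs2CMapDemo {
    // Sketch: under a Uni*-UCS2-H CMap, every character code in a shown string is
    // two bytes holding a big-endian UCS-2 value. For Basic Multilingual Plane
    // characters this is the same byte layout as UTF-16BE, so the JDK decoder works.
    static String decodeUcs2(byte[] codes) {
        if (codes.length % 2 != 0) {
            throw new IllegalArgumentException("UCS-2 codes must be 2 bytes each");
        }
        return new String(codes, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Hypothetical string operand bytes: code 0x4E2D followed by code 0x6587
        byte[] codes = {0x4E, 0x2D, 0x65, (byte) 0x87};
        String text = decodeUcs2(codes);
        // Print code points rather than glyphs to avoid console encoding issues;
        // prints U+4E2D and U+6587, one per line
        text.codePoints().forEach(cp -> System.out.printf("U+%04X%n", cp));
    }
}
```

U+4E2D and U+6587 are the Han characters for "Chinese writing" (中文), so a correct extraction of those two codes should yield a two-character string rather than an empty one.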

PdfReader pr = new PdfReader(inputPdf);
// CustomLocationAwarePdfRenderListener is a custom render listener (not part of iText)
PdfTextExtractor pte =
    new PdfTextExtractor(pr, new CustomLocationAwarePdfRenderListener(scanDepth));
// get the number of pages in the document
int pNum = pr.getNumberOfPages();

// extract text from each page; the result is then written to the output text file
StringBuilder sb = new StringBuilder();
for (int page = 1; page <= pNum; page++) {
    sb.append('\n').append(pte.getTextFromPage(page));
}
String text = sb.toString();
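One detail worth checking: the loop prepends "\n" for every page, so the combined `text` is never literally empty even if every page returned "". Classifying each page's return value separately makes it easier to tell an empty result from whitespace-only output or from characters that were not mapped to real Unicode. The helper below is a hypothetical diagnostic sketch (the names `ExtractedTextCheck`, `Kind` and `classify` are not part of iText); it uses only the JDK and could be called on each `pte.getTextFromPage(page)` result.

```java
public class ExtractedTextCheck {
    // Rough categories for the string returned by getTextFromPage(page)
    enum Kind { EMPTY, WHITESPACE_ONLY, HAS_HAN, HAS_REPLACEMENT_OR_PRIVATE_USE, OTHER }

    static Kind classify(String s) {
        if (s.isEmpty()) {
            return Kind.EMPTY;
        }
        if (s.codePoints().allMatch(Character::isWhitespace)) {
            return Kind.WHITESPACE_ONLY;
        }
        // The Han script covers the Chinese ideographs the PDF is expected to contain
        if (s.codePoints().anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN)) {
            return Kind.HAS_HAN;
        }
        // U+FFFD or private-use code points may indicate glyphs without a real Unicode mapping
        if (s.codePoints().anyMatch(cp -> cp == 0xFFFD || Character.getType(cp) == Character.PRIVATE_USE)) {
            return Kind.HAS_REPLACEMENT_OR_PRIVATE_USE;
        }
        return Kind.OTHER;
    }

    public static void main(String[] args) {
        String[] samples = {"", "\n\n", "\u4E2D\u6587", "\uFFFD\uE000", "abc"};
        for (String s : samples) {
            System.out.println(classify(s));
        }
        // prints EMPTY, WHITESPACE_ONLY, HAS_HAN, HAS_REPLACEMENT_OR_PRIVATE_USE, OTHER (one per line)
    }
}
```

The order of the checks matters: a page containing both Han characters and replacement characters is reported as HAS_HAN, since that is the case where extraction at least partly worked.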

edited Feb 26 '16 at 06:31 by riddle_me_this

asked by bina

http://stackoverflow.com/questions/21577944/itext-generating-pdf-with-chinese-characters-chinese-simplified – Madhawa Priyashantha Feb 26 '16 at 06:35
@FastSnail the OP says they already did what the accepted answer there recommends, i.e. adding itext-asian.jar... – mkl Feb 26 '16 at 06:42
Does anybody have a solution? Please suggest one. – bina Feb 29 '16 at 14:59

0 Answers