
I am using iText to extract data from PDFs. My application reads PDFs containing English text without problems, but we recently came across a file that contains Chinese characters. When I try to extract text from it, I get this error:

ExceptionConverter: com.itextpdf.text.DocumentException: Font 'STSong-Light' with 'UniGB-UCS2-H' is not recognized.

So I added itext-asian.jar. The error no longer occurs, but getTextFromPage() now returns an empty string. Am I missing something?

PdfReader pr = new PdfReader(inputPdf);
// create the extractor with the custom render listener
PdfTextExtractor pte =
    new PdfTextExtractor(pr, new CustomLocationAwarePdfRenderListener(scanDepth));
// get the number of pages in the document
int pNum = pr.getNumberOfPages();

String text = "";
// extract the text of each page and append it to the result, one page per line break
for (int page = 1; page <= pNum; page++) {
    text = text.concat("\n").concat(pte.getTextFromPage(page));
}
riddle_me_this
bina
