I am using iText to extract data from PDFs. My application can read PDFs with English characters, but we have come across a new file that contains Chinese characters. When I try to extract the text from that file, I get this error:
ExceptionConverter: com.itextpdf.text.DocumentException: Font 'STSong-Light' with 'UniGB-UCS2-H' is not recognized.
So I added itext-asian.jar. Now I am not getting an error, but getTextFromPage() returns an empty string. Am I missing something?
PdfReader pr = new PdfReader(inputPdf);
PdfTextExtractor pte =
    new PdfTextExtractor(pr, new CustomLocationAwarePdfRenderListener(scanDepth));
// get the number of pages in the document
int pNum = pr.getNumberOfPages();
String text = "";
// extract text from each page and append it to the result string
for (int page = 1; page <= pNum; page++) {
    text = text.concat("\n").concat(pte.getTextFromPage(page));
}
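
To narrow this down, it may help to check page by page whether the extractor returns nothing at all or returns text without any Chinese characters. The following is only a minimal diagnostic sketch using the Java standard library. The class name `ExtractionCheck`, the helper names `emptyPages` and `containsHan`, and the stand-in extractor are hypothetical and not part of iText or of my application; in the real code the extractor lambda would wrap `pte.getTextFromPage(page)`.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntFunction;

public class ExtractionCheck {

    // Returns the 1-based numbers of pages whose extracted text is null or blank.
    // The extractor is a hypothetical stand-in for a call such as pte.getTextFromPage(page).
    public static List<Integer> emptyPages(int pageCount, IntFunction<String> extractor) {
        List<Integer> empty = new ArrayList<>();
        for (int page = 1; page <= pageCount; page++) {
            String text = extractor.apply(page);
            if (text == null || text.trim().isEmpty()) {
                empty.add(page);
            }
        }
        return empty;
    }

    // True if the text contains at least one code point in the Han (Chinese) script.
    public static boolean containsHan(String text) {
        return text.codePoints()
                .anyMatch(cp -> Character.UnicodeScript.of(cp) == Character.UnicodeScript.HAN);
    }

    public static void main(String[] args) {
        // Fake per-page results used only for illustration (assumption, not real PDF output).
        String[] fake = {"Hello", "", "\u4e2d\u6587"};
        IntFunction<String> extractor = page -> fake[page - 1];

        System.out.println(emptyPages(fake.length, extractor)); // prints [2]
        System.out.println(containsHan(fake[2]));               // prints true
    }
}
```

If every page of the Chinese file shows up in `emptyPages`, the problem is probably in the extraction step itself (for example the render listener) rather than in how the string is assembled afterwards.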