I am trying to convert some HTML content to a PDF using the iText PdfWriter, like this:

Document document = new Document();
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
PdfWriter writer = PdfWriter.getInstance(document, outputStream);
document.open();
InputStream stream = new ByteArrayInputStream(content.getBytes(StandardCharsets.UTF_8));
XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"));
document.close();

but the ă, ș and ț characters are missing from the generated PDF. I have tried setting the encoding and the font, with no luck. In both cases I used a font provider and passed it as a parameter to the parseXHtml method.

First I set the encoding, but nothing changed.

XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider();
fontProvider.setUseUnicode(true);
fontProvider.defaultEncoding = BaseFont.CP1257;

I also tried registering a font, but it was not applied to the PDF.

XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
fontProvider.register(PATH_TO_TTF_FONT_FILE_HOSTED_ON_S3);

In both cases I then passed the font provider to parseXHtml.

XMLWorkerHelper.getInstance().parseXHtml(writer, document, stream, Charset.forName("UTF-8"), fontProvider);

Is there any way to make the PdfWriter render all characters correctly when converting from HTML to PDF?

aniri
  • UTF-8, which you are using, doesn't have those characters; try UTF-16. – res Mar 10 '20 at 08:29
  • @res I have replaced UTF8 with UTF16 and nothing changed :( – aniri Mar 10 '20 at 09:23
  • @res: ... no, UTF-8 is perfectly capable of encoding all *possible* characters in Unicode. `ă`, for example, is encoded as [0xC4 0x83](https://www.fileformat.info/info/unicode/char/0103/index.htm). It is far more likely that the font in use does not *have* those characters. – Jongware Mar 10 '20 at 09:28
  • @res actually this page itself is encoded in UTF-8 (at least for me) and I can see the symbols.... – user85421 Mar 10 '20 at 09:36
  • @aniri could you please share the HTML or abstract version of it so we could work on it? – shihabudheenk Mar 11 '20 at 02:38
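
The UTF-8 point raised in the comments can be checked with the JDK alone, without iText. The following is a minimal sketch, not part of the original question: the class name `Utf8Check` and its helper methods are hypothetical, and it assumes the characters in question are U+0103 (ă), U+0219 (ș) and U+021B (ț), i.e. the comma-below forms rather than the cedilla variants. If the bytes round-trip cleanly, the input encoding is probably not the cause, which would point back at the font.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Check {

    // Returns the UTF-8 bytes of s as space-separated uppercase hex pairs.
    public static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    // True if encoding s to UTF-8 and decoding it again yields the same string.
    public static boolean roundTrips(String s) {
        byte[] encoded = s.getBytes(StandardCharsets.UTF_8);
        return new String(encoded, StandardCharsets.UTF_8).equals(s);
    }

    public static void main(String[] args) {
        // Assumed code points: U+0103, U+0219, U+021B (written as escapes so
        // the result does not depend on the source file's own encoding).
        String[] samples = {"\u0103", "\u0219", "\u021B"};
        for (String ch : samples) {
            System.out.println(
                String.format("U+%04X", (int) ch.charAt(0)) + " -> " + utf8Hex(ch));
        }
        // Prints:
        // U+0103 -> C4 83
        // U+0219 -> C8 99
        // U+021B -> C8 9B
        System.out.println("round trip ok: " + roundTrips("\u0103\u0219\u021B"));
    }
}
```

This only rules the byte encoding in or out. It says nothing about whether the font that ends up in the PDF actually contains glyphs for these code points.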

0 Answers