I'm trying to insert Arabic text into a PDF using PDFBox:

File myFile = new File("src/arabic/arial.ttf");
PDFont font = PDType0Font.load(doc, myFile);
PDPageContentStream contentStream = new PDPageContentStream(doc, page, true, true);
contentStream.beginText();
contentStream.setFont(font, 12);
contentStream.newLineAtOffset(30, 40);
String arabicText = "عطي يونيكود رقما فريدا لكل حرف";
// System.setProperty("ste.encoding", "UTF-8");
contentStream.showText(arabicText);
contentStream.endText();
contentStream.close();

The Arabic text appears as disconnected letters in the resulting PDF.

  • Possible duplicate of [Writing Arabic Characters with PDFBox in their correct presentation form without being separated](https://stackoverflow.com/questions/48284888/writing-arabic-characters-with-pdfbox-in-their-correct-presentation-form-without) – mkl Jan 20 '18 at 22:42

1 Answer

(This applies to PDFBox 2.0, not to earlier versions.)

You have to do this yourself. I can't explain it for Arabic, but for "western" glyphs:

stream.showText("film \uFB01lm");

Create a PDF with that line, then try to select the "f" or the "i" in the second word. You can't, because the two letters are one entity.

The first word has "f" and "i" as separate characters; the second one has the Latin small ligature fi (U+FB01). So you'd have to do some preprocessing yourself to replace such combinations with their combined forms when your font supports them. Good luck!
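
To make the preprocessing idea concrete for the "western" case, here is a minimal sketch in plain Java. The class name LigaturePreprocessor, the method applyLigatures, and the fontHasGlyph predicate are all hypothetical names for illustration and are not part of PDFBox; the predicate stands in for whatever check you use to find out whether your font contains a glyph for a given code point.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.IntPredicate;

// Hypothetical helper, not part of PDFBox: replaces letter sequences with
// the Unicode Latin ligature code points before the text is shown.
final class LigaturePreprocessor {

    // Insertion order matters: longer sequences come first so that
    // "ffi" is tried before "ff" and "fi".
    private static final Map<String, Integer> LIGATURES = new LinkedHashMap<>();
    static {
        LIGATURES.put("ffi", 0xFB03);
        LIGATURES.put("ffl", 0xFB04);
        LIGATURES.put("ff", 0xFB00);
        LIGATURES.put("fi", 0xFB01);
        LIGATURES.put("fl", 0xFB02);
    }

    // fontHasGlyph is an assumed hook: it should return true only when the
    // font you embed can render the given code point.
    static String applyLigatures(String text, IntPredicate fontHasGlyph) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            boolean replaced = false;
            for (Map.Entry<String, Integer> e : LIGATURES.entrySet()) {
                if (text.startsWith(e.getKey(), i) && fontHasGlyph.test(e.getValue())) {
                    out.appendCodePoint(e.getValue());
                    i += e.getKey().length();
                    replaced = true;
                    break;
                }
            }
            if (!replaced) {
                // No supported ligature starts here; copy the character as-is.
                out.append(text.charAt(i));
                i++;
            }
        }
        return out.toString();
    }
}
```

You would pass the result to showText instead of the raw string, for example stream.showText(LigaturePreprocessor.applyLigatures("film", cp -> true)) if you already know the font has the ligature glyphs. Arabic needs considerably more than this: each letter has to be mapped to its contextual presentation form and the text has to be reordered for right-to-left display. A library such as ICU4J (its ArabicShaping and Bidi classes) is one option for that step, but I have not tried it with PDFBox.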

Tilman Hausherr