
I'm trying to convert an *.xhtml file containing Hebrew characters (UTF-8) to PDF using the iText library, but all the letters come out in reverse order. As far as I understand from this question, I can set RTL only for ColumnText and PdfPCell objects:

Arabic (and Hebrew) can only be rendered correctly in the context of ColumnText and PdfPCell.

So I wonder: is it possible to convert a whole *.xhtml page to PDF?

This is the *.xhtml file I'm trying to import:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">

<head>
  <title>Title of document</title>
</head>

<body style="font-size:12.0pt; font-family:Arial">
  שלום עולם
</body>

</html>

And this is the Java code I use:

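import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

import com.itextpdf.text.Document;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorkerHelper;

public static void convert() throws Exception {
    Document document = new Document();
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream("import.pdf"));
    // Attempt to force right-to-left output for the whole document
    writer.setRunDirection(PdfWriter.RUN_DIRECTION_RTL);
    document.open();

    // Read the XHTML file as UTF-8 (line breaks are dropped, as in the original code)
    String str = null;
    BufferedReader in = new BufferedReader(
            new InputStreamReader(new FileInputStream("import.xhtml"), StandardCharsets.UTF_8));
    StringBuilder sb = new StringBuilder();

    while ((str = in.readLine()) != null) {
        System.out.println(str);
        sb.append(str);
    }
    in.close();

    // Parse the XHTML with the default XML Worker configuration
    XMLWorkerHelper worker = XMLWorkerHelper.getInstance();

    InputStream is = new ByteArrayInputStream(sb.toString().getBytes(StandardCharsets.UTF_8));
    worker.parseXHtml(writer, document, is, StandardCharsets.UTF_8);

    document.close();
}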

This is the result I get so far:


Thank you for any help.

Anatoly
  • I can't read Hebrew, so forgive my confusion, but I see "שלום עולם" in your code and I see the same glyphs written from right to left in your screenshot. Doesn't this mean that iText changed the orientation of the text? – Bruno Lowagie Jun 15 '15 at 15:08
  • @BrunoLowagie, they are the same glyphs but in the opposite direction. In the code "ש" is the rightmost glyph, while in the picture "ש" is the leftmost glyph. – Anatoly Jun 15 '15 at 15:12
  • OK, I'll experiment. In HTML to PDF, you can have something like this: `שלום עולם`, but that should switch the order of the glyphs, so that `ש` is to the left, shouldn't it? I'll test. – Bruno Lowagie Jun 15 '15 at 15:13

1 Answer

Please take a look at the ParseHtml10 example. In this example, we take the file hebrew.html:

<html>

<head>
  <title>Hebrew text</title>
</head>

<body style="font-size:12.0pt; font-family:Arial">
<div dir="rtl" style="font-family: Noto Sans Hebrew">שלום עולם</div>
</body>

</html>

And we convert it to PDF using this code:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.Charset;

import com.itextpdf.text.Document;
import com.itextpdf.text.DocumentException;
import com.itextpdf.text.pdf.PdfWriter;
import com.itextpdf.tool.xml.XMLWorker;
import com.itextpdf.tool.xml.XMLWorkerFontProvider;
import com.itextpdf.tool.xml.css.StyleAttrCSSResolver;
import com.itextpdf.tool.xml.html.CssAppliers;
import com.itextpdf.tool.xml.html.CssAppliersImpl;
import com.itextpdf.tool.xml.html.Tags;
import com.itextpdf.tool.xml.parser.XMLParser;
import com.itextpdf.tool.xml.pipeline.css.CSSResolver;
import com.itextpdf.tool.xml.pipeline.css.CssResolverPipeline;
import com.itextpdf.tool.xml.pipeline.end.PdfWriterPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipeline;
import com.itextpdf.tool.xml.pipeline.html.HtmlPipelineContext;

public void createPdf(String file) throws IOException, DocumentException {
    // step 1
    Document document = new Document();
    // step 2
    PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(file));
    // step 3
    document.open();
    // step 4
    // Styles
    CSSResolver cssResolver = new StyleAttrCSSResolver();
    XMLWorkerFontProvider fontProvider = new XMLWorkerFontProvider(XMLWorkerFontProvider.DONTLOOKFORFONTS);
    fontProvider.register("resources/fonts/NotoSansHebrew-Regular.ttf");
    CssAppliers cssAppliers = new CssAppliersImpl(fontProvider);
    HtmlPipelineContext htmlContext = new HtmlPipelineContext(cssAppliers);
    htmlContext.setTagFactory(Tags.getHtmlTagProcessorFactory());

    // Pipelines
    PdfWriterPipeline pdf = new PdfWriterPipeline(document, writer);
    HtmlPipeline html = new HtmlPipeline(htmlContext, pdf);
    CssResolverPipeline css = new CssResolverPipeline(cssResolver, html);

    // XML Worker
    XMLWorker worker = new XMLWorker(css, true);
    XMLParser p = new XMLParser(worker);
    // HTML is a String constant holding the path to hebrew.html
    p.parse(new FileInputStream(HTML), Charset.forName("UTF-8"));
    // step 5
    document.close();
}

The result looks like hebrew.pdf.

What hurdles do you need to clear?

  • You need to wrap your text in an element such as a <div> or a <td>.
  • You need to add the attribute dir="rtl" to define the direction.
  • You need to make sure that you're using a font that knows how to display Hebrew. I used a Noto font for Hebrew. This is one of the fonts distributed by Google in their program to provide fonts for every possible language.
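
Regarding the font hurdle: it was reported in the comments on this answer that registering a wrong font path did not raise an exception, so a missing font file can go unnoticed. A minimal sketch of a guard, using only the Java standard library, could look like the following. The class name `FontFileCheck` and the method `requireReadableFont` are hypothetical helpers introduced here for illustration; they are not part of iText or XML Worker.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper, not part of iText: fails loudly when a font file is missing.
final class FontFileCheck {

    /**
     * Returns the absolute path of the font file, or throws if the path
     * does not point to a readable regular file.
     */
    static String requireReadableFont(String fontPath) throws FileNotFoundException {
        // Relative paths are resolved against the current working directory
        Path path = Paths.get(fontPath).toAbsolutePath();
        if (!Files.isRegularFile(path) || !Files.isReadable(path)) {
            throw new FileNotFoundException("Font file not found or unreadable: " + path);
        }
        return path.toString();
    }
}
```

In the conversion code, the checked path would then be passed on, for example `fontProvider.register(FontFileCheck.requireReadableFont("resources/fonts/NotoSansHebrew-Regular.ttf"));`. The exception message prints the absolute path, which also shows which working directory the relative path was resolved against.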

I can't read Hebrew, but I hope the resulting PDF is correct and that this solves your problem.

Important: this solution requires at least iText and XML Worker 5.5.5, because support for the dir attribute was introduced in 5.5.4 and improved in 5.5.5.
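
The version requirement can be expressed as a small check. The sketch below is only an illustration in plain Java: the class `VersionCheck` and its methods are hypothetical, and the version strings are assumed to be supplied by the caller (for example, copied from the build configuration) as purely numeric dotted versions such as `5.5.6`. A suffix such as `-SNAPSHOT` would make the parsing fail.

```java
import java.util.Arrays;

// Hypothetical helper, not part of iText: checks the version rule stated above.
final class VersionCheck {

    /** Parses a numeric dotted version such as "5.5.6" into its parts. */
    static int[] parse(String version) {
        return Arrays.stream(version.trim().split("\\."))
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    /** Compares two dotted versions numerically; missing parts count as 0. */
    static int compare(String a, String b) {
        int[] x = parse(a);
        int[] y = parse(b);
        for (int i = 0; i < Math.max(x.length, y.length); i++) {
            int xi = i < x.length ? x[i] : 0;
            int yi = i < y.length ? y[i] : 0;
            if (xi != yi) {
                return Integer.compare(xi, yi);
            }
        }
        return 0;
    }

    /** True when both libraries have the same version and it is at least 5.5.5. */
    static boolean supportsDirAttribute(String itextVersion, String xmlWorkerVersion) {
        return compare(itextVersion, xmlWorkerVersion) == 0
                && compare(itextVersion, "5.5.5") >= 0;
    }
}
```

For example, `VersionCheck.supportsDirAttribute("5.5.6", "5.5.6")` is true, while a mismatched pair such as `("5.5.6", "5.4.1")` is rejected.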

Bruno Lowagie
  • I'm sorry; the result definitely looks correct, but the code doesn't work for me. It's my fault that I didn't check the code yesterday; I assumed that if it works for you, it shouldn't be a problem to reproduce it. I was wrong. I have now copied your *.xhtml example and your Java code, and it still displays the text in the wrong direction. – Anatoly Jun 16 '15 at 07:11
  • Which version of iText / XML Worker are you using? – Bruno Lowagie Jun 16 '15 at 07:40
  • Can you please explain the location of `resources/fonts/NotoSansHebrew-Regular.ttf`? Right now I have created a folder called `resources/fonts...` at the root level of the Java project and added it to the build path. But when I change the name of the font file in the code, it doesn't throw an exception, so I have no indication whether the font was registered. – Anatoly Jun 16 '15 at 07:42
  • See [the sandbox on github](https://github.com/itext/sandbox) for the structure of the examples and their resources. Your example wasn't released yet, but you can find the fonts [here](https://github.com/itext/sandbox/tree/master/resources/fonts). They are used in examples such as [FontTest.java](https://github.com/itext/sandbox/blob/master/src/main/java/sandbox/fonts/FontTest.java). – Bruno Lowagie Jun 16 '15 at 07:46
  • Your versions are incompatible. You always need to use the same version of iText and XML Worker. I don't think XML Worker 5.4.1 already knows about the `dir` attribute. – Bruno Lowagie Jun 16 '15 at 07:47
  • You were right. I've changed the version of XML Worker to 5.5.6 and it began to work, thank you. I think it should be mentioned in your answer. – Anatoly Jun 16 '15 at 07:52