I have tried iText, PDFBox & Oracle Forms. With iText I managed to generate a Gujarati PDF document, but unfortunately the Gujarati (UTF-8) text is not rendered properly.

My project runs on JDK 1.4, and that is mandatory. So I need an older version of an API that supports Gujarati fonts.

Please suggest if any option is available.

Sample Code:

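// Font, Document, Paragraph, BaseFont and PdfWriter are iText classes
// (com.lowagie.text.* in iText 2.x, com.itextpdf.text.* in iText 5);
// FileOutputStream comes from java.io.
public void GeneratePDFusingiText(String lStrGujaratidata)
{
  try
  {
    // Load the Gujarati TrueType font with Unicode (horizontal) encoding.
    BaseFont bf = BaseFont.createFont("C:\\Windows\\Fonts\\Shruti.ttf", BaseFont.IDENTITY_H, BaseFont.NOT_EMBEDDED);
    Font font = new Font(bf, 12);

    Document document = new Document();
    PdfWriter.getInstance(document, new FileOutputStream("D:/GeneratePDFusingiText.pdf"));
    document.open();
    // Write the Gujarati string as a single paragraph.
    document.add(new Paragraph(lStrGujaratidata, font));
    document.close();
  }
  catch (Exception e)
  {
    System.out.println("Exception while generating PDF");
    e.printStackTrace();
  }
}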

EDIT 1:

Perhaps the image is not getting displayed. It is uploaded here.

EDIT 2:

image of font examples

Step-1) I type a Gujarati string in Google Transliterate.

Step-2) I convert it into Unicode escapes using the BabelMap software, so that I can use it through a ResourceBundle.

Issue: Let me have a String: બિલાડી (Biladi)

Its Unicode representation is: \u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0

The problem is with the first two code points, \u0AAC (બ) and \u0ABF (િ). If I swap them, giving \u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0, the PDF shows the proper output.

At the same time, the swapped string prints the wrong output in HTML, i.e.: િબલાડી

I have to manage in between them.

I have tried using "gu", "gu.UTF-8" and "UTF-8", but I get the same output every time.

Sarang
  • I'm not an expert here, but I'd say the most critical thing would be the fonts - which ones are you using, and what format are they (TrueType etc)? Would you give a screenshot example of what output you are currently getting? – halfer May 08 '12 at 13:40
  • I am using the Shruti.ttf (Gujarati) font. I am editing the question to add further information. – Sarang May 09 '12 at 05:12
  • Hi Sarang, are you restricted to a particular reporting tool? If not, I have used Gujarati fonts with Jasper Reports. If you can use Jasper Reports and need help, let me know! – Anuj Patel May 09 '12 at 10:22
  • I'm not a Java programmer, but I'd look at the locale if I were you. You've set it to 'gu' - have you tried UTF-8? – halfer May 09 '12 at 11:07
  • @indyaah: I have seen Jasper Report tool. But, internally it is using iText itself. – Sarang May 09 '12 at 11:40
  • @halfer: Please see the next Edit in the question. – Sarang May 09 '12 at 11:49
  • Can you provide the TTF file? I don't see a good, trustworthy spot online to download it from – josh.trow May 09 '12 at 12:11
  • I'd be inclined to ensure the input is correct - if your input is rendering incorrectly in HTML, then I wouldn't expect it to render correctly in the PDF. Perhaps you need to look at the character set in your HTML - again it should be UTF-8. Other than that, I'm out of ideas - but best of luck! – halfer May 09 '12 at 12:23
  • @Sarang: Have you tried PD4ML? We use it for our company product, it seems to work for everything we do. http://pd4ml.com/index.htm – josh.trow May 09 '12 at 13:28
  • @All: I think one more option would be to use an HTML-to-PDF converter. As of now, I am already able to generate proper HTML. I want to generate the PDF directly, but that doesn't seem possible until we have an API that supports Indian languages. In that case, let me try an HTML-to-PDF converter, if one is available. Please suggest a free API. – Sarang May 10 '12 at 04:49
  • @josh.trow: I am trying free demo version for the same. Let me check the outcome. – Sarang May 10 '12 at 04:50
  • @sarang - you could try Docmosis. It uses UTF-8 and the forums indicate it has worked for Turkish fonts. – Paul Jowett May 10 '12 at 08:36
  • @All: Any solution with this : http://support.itextpdf.com/node/83 – Sarang May 10 '12 at 09:26
  • Why don't you try creating a Word document with the Gujarati content and then converting it to PDF? That should be easier. – mavrav May 10 '12 at 10:00
  • @mavrav: Probably that can also be an option. Let me try searching for free Java API for the same. – Sarang May 11 '12 at 04:44
  • Are you trying to create a template in gujarati font and then fill it up with values? – mavrav May 11 '12 at 08:28
  • Nope. I am generating a whole book in Gujarati, which is currently converted to a PDF file afterwards using converter software. I want to generate the PDF directly. – Sarang May 11 '12 at 08:33
  • You can find the similar question here - http://stackoverflow.com/questions/2109392/using-unicode-charater-in-generated-pdf-java-itext – Bhavik Ambani May 12 '12 at 13:15
  • @Sarang : What was the final solution for this problem ? – IT ppl Mar 24 '14 at 10:59
  • I'm facing a problem with Gujarati fonts, have anyone solved it?! – Jay Patel Nov 20 '17 at 13:25
  • @sarang: getting any solution ? – Pankaj Talaviya Jun 09 '23 at 02:05

1 Answer

Updated Answer

After your comment I realised that I was wrong: the diacritic should come second in the stored character sequence, even though it is rendered to the left of the main character.

So, it turns out, iText doesn't support this type of rendering for Indic character sets. Roughly speaking, iText uses AWT's Graphics2D to render non-Latin Unicode characters, one by one, as images in the PDF. (I guess this is because appropriate fonts are not necessarily installed on everyone's computer.) This feature doesn't take this special ordering into account.

iText does support similar behaviour for Arabic, using a class contributed by another developer. See com.itextpdf.text.pdf.ArabicLigaturizer. Perhaps you could create a similar one yourself? (!)
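As a rough illustration of what such a pre-processing step might look like, here is a minimal sketch. The class name GujaratiPreBaseReorder and the method toVisualOrder are made up for this example and are not part of iText. It only handles the vowel sign િ (\u0ABF) following a consonant or a simple consonant + virama conjunct; it ignores the nukta and everything else a real shaping engine does. It uses StringBuffer so that it should still compile on JDK 1.4.

```java
final class GujaratiPreBaseReorder {
    private static final char VOWEL_SIGN_I = '\u0ABF'; // pre-base vowel sign
    private static final char VIRAMA = '\u0ACD';       // joins consonants into a conjunct

    // Rough check: the Gujarati consonant block runs from KA (0A95) to HA (0AB9).
    private static boolean isConsonant(char c) {
        return c >= '\u0A95' && c <= '\u0AB9';
    }

    // Moves each vowel sign I in front of the consonant cluster it follows,
    // i.e. converts logical (stored) order into left-to-right visual order.
    static String toVisualOrder(String logical) {
        StringBuffer out = new StringBuffer(logical.length());
        for (int i = 0; i < logical.length(); i++) {
            char c = logical.charAt(i);
            if (c != VOWEL_SIGN_I) {
                out.append(c);
                continue;
            }
            int pos = out.length();
            if (pos > 0 && isConsonant(out.charAt(pos - 1))) {
                pos--; // step back over the base consonant
                // step back over any preceding "consonant + virama" pairs
                while (pos >= 2 && out.charAt(pos - 1) == VIRAMA
                        && isConsonant(out.charAt(pos - 2))) {
                    pos -= 2;
                }
            }
            out.insert(pos, c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String logical = "\u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0"; // બિલાડી as stored
        String visual = toVisualOrder(logical);
        System.out.println(visual.equals("\u0ABF\u0AAC\u0AB2\u0ABE\u0AA1\u0AC0")); // prints "true"
    }
}
```

You would apply something like this only to the string handed to the PDF code, and keep the logical order for HTML, since browsers do the reordering themselves.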

It looks like this has come up before:

Original Answer

Kem chho,

I believe that iText is displaying the correct characters, but the first 2 characters of your input have been 'flipped' before you translated the string into unicode points. So, the problem occurred before the data even gets to iText.

The underlying issue is that the 'first' character is a 'pre-base' character, which is a type of Diacritic. It's a bit like an 'accent' in European texts, in that it can't exist on its own, and its purpose is to embellish another character. In this case it turns a 'Ba' (બ) into a 'Bi'.

You'll see in the Unicode code chart that the first character (િ) is indeed code point \u0ABF, and the second (બ) is \u0AAC: http://en.wikipedia.org/wiki/Gujar%C4%81ti_script#Unicode

So, somewhere between Google Transliterate and your codepoint representation, these characters got flipped. So, you need to review how you did that translation.

How did you convert these characters into codepoints?
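One way to check the stored order without relying on any external tool is to dump the code points from Java itself. This is only a small sketch; the class name UnicodeEscapes and the method toEscapes are invented for illustration. It uses StringBuffer so that it should still compile on JDK 1.4.

```java
final class UnicodeEscapes {
    // Returns the \\uXXXX escape for every char in the string, in stored order.
    static String toEscapes(String s) {
        StringBuffer sb = new StringBuffer(s.length() * 6);
        for (int i = 0; i < s.length(); i++) {
            String hex = Integer.toHexString(s.charAt(i)).toUpperCase();
            sb.append("\\u");
            // left-pad with zeros to four hex digits
            for (int pad = hex.length(); pad < 4; pad++) {
                sb.append('0');
            }
            sb.append(hex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The string from the question, written with escapes so the source file stays ASCII.
        System.out.println(toEscapes("\u0AAC\u0ABF\u0AB2\u0ABE\u0AA1\u0AC0"));
    }
}
```

Running this on the string as it arrives in your PDF method would show whether \u0ABF sits before or after \u0AAC at that point.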

Seemingly, some interpreters place the 'pre-base' after the main consonant, instead of before it:

  • Note that when you paste those characters into a (Linux) terminal, the first 2 characters come out back-to-front. I believe something similar happened for you too.
  • You'll also notice that when you try editing this word in Google Transliterate, you can't place the cursor between the first 2 characters, and when you hit backspace, the left character is deleted before the right.

So, if you can work out where this 'flipping' occurred, then hopefully your solution will present itself.

Hope this helps

laher
  • These characters are flipped, of course... But they are fine in HTML and not fine in PDF. That is why I have to manage in between :) – Sarang May 14 '12 at 10:37