
I have a simple string containing French accents. I am trying to save it to a PDF using ITextRenderer. The problem is that all the accented characters are missing from the resulting PDF.

The input string comes from my Velocity template. There, I am doing StringEscapeUtils.escape(StringEscapeUtils.unescape(stringWithAccents)), and this gives me an input string like "Supplément : Visa&Pourboires".

My code:

        String documentHtml = "Supplément : à&egrave";
        DocumentBuilder builder;
        try {
            DocumentBuilderFactory fac = DocumentBuilderFactory.newInstance();
            fac.setFeature("http://xml.org/sax/features/namespaces", false);
            fac.setFeature("http://xml.org/sax/features/validation", false);
            fac.setFeature("http://apache.org/xml/features/nonvalidating/load-dtd-grammar", false);
            fac.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            builder = fac.newDocumentBuilder();
            byte[] docByte = documentHtml.getBytes("UTF-8");
            ByteArrayInputStream is = new ByteArrayInputStream(docByte);
            Document doc = builder.parse(is);
            is.close();
            File file = new File(this.getFolder(), this.getFileName());
            if (file.exists()) {
                file.delete();
            }

            // save pdf
            OutputStream os = new FileOutputStream(file);
            ITextRenderer renderer = new ITextRenderer();
            renderer.setDocument(doc, file.getParentFile().getAbsolutePath());
            renderer.layout();
            renderer.createPDF(os, true);
            os.close();


            return this.getFolder().getAbsolutePath() + "/" + this.getFileName();
        } catch (ParserConfigurationException e) {
            LOGGER.error("Error while parsing the configuration " + e.getMessage(), e);
            throw new BOServiceException("Error while parsing the configuration : " + e.getMessage(), e);
        } catch (UnsupportedEncodingException e) {
            LOGGER.error("Encoding error :  " + e.getMessage(), e);
            throw new BOServiceException("Encoding error : " + e.getMessage(), e);
        } catch (SAXException e) {
            LOGGER.error("Error in the document because of SAX :  " + e.getMessage(), e);
            throw new BOServiceException("Error in the document because of SAX :  " + e.getMessage(), e);
        } catch (IOException e) {
            LOGGER.error("Error due to io problem : " + e.getMessage(), e);
            throw new BOServiceException("Error due to io problem :" + e.getMessage(), e);
        }

Do you have any idea why my encoding is not working? Why can't I see characters like à&egrave in the resulting PDF?

gospodin
  • just read this http://stackoverflow.com/questions/1775008/embed-font-into-pdf-file-by-using-itext – user1516873 Oct 18 '12 at 09:40
  • I don't use any special fonts (no need for styled fonts). This must be an encoding problem. My String is generated from a Velocity template and there is no additional styling of the fonts. – gospodin Oct 18 '12 at 09:55
  • Every font is 'styled'. A non-styled font is nonsense. If you don't specify a font explicitly, iText sets some default font. And if you don't embed it, it's up to the PDF reader how to show your PDF document. Generally it will use a similar system font, and that font may not have a glyph for the `à` symbol. – user1516873 Oct 18 '12 at 12:31
  • Yes, I was reading your link and the article, but all the examples add a paragraph or text with a specific font. In my case the font should be applied to my whole finalString. Do you have an idea how I can accomplish that? – gospodin Oct 18 '12 at 12:42
  • how to set one font to whole html document: http://stackoverflow.com/questions/12093236/how-to-get-rid-of-helvetica-in-itext-xmlworker/ – user1516873 Oct 18 '12 at 13:07
  • I checked that example, but it's very different since the source is an XML document, so it uses classes like HtmlPipelineContext, Pipeline, XmlWorker and XmlParser. – gospodin Oct 19 '12 at 09:23
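
Separately from the font discussion above, the entity in the sample string may be worth checking. `&egrave` without a trailing semicolon is not well-formed XML, and `&egrave;` is declared only in the XHTML DTD, which the question's code tells the parser not to load. Whether this explains the missing accents is not confirmed in this thread. The sketch below uses only the JDK's built-in parser (no ITextRenderer); `EntityDemo`, `parseText` and `toNumericReferences` are hypothetical names made up for illustration. It parses from a `StringReader`, so no byte encoding is involved, and rewrites non-ASCII characters as numeric character references, which need no DTD.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the question's code base.
final class EntityDemo {

    /** Returns the root element's text, or empty if the parser rejects the input. */
    static Optional<String> parseText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // A character stream: the parser never has to guess a byte encoding.
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return Optional.of(doc.getDocumentElement().getTextContent());
        } catch (SAXException | IOException e) {
            return Optional.empty();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Replaces every non-ASCII character with a numeric reference such as &#232;. */
    static String toNumericReferences(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes keep the example independent of this file's own encoding.
        String literal = "<p>Suppl\u00e9ment : \u00e0\u00e8</p>";
        System.out.println("numeric form: " + toNumericReferences(literal));
        System.out.println("numeric form parses: " + parseText(toNumericReferences(literal)).isPresent());
        System.out.println("&egrave; parses: " + parseText("<p>&egrave;</p>").isPresent());
        System.out.println("&egrave parses: " + parseText("<p>&egrave</p>").isPresent());
    }
}
```

If the Velocity output contains named HTML entities, converting them to literal characters or numeric references before calling `builder.parse` is one way to keep the parser from rejecting or skipping them.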

1 Answer


Try changing the encoding from UTF-8 to ISO-8859-1.

Jean-Philippe Briend
  • If I change to getBytes("ISO-8859-1"), I get: ERROR [STDERR] [Fatal Error] :66:7: Invalid byte 2 of 3-byte UTF-8 sequence. 11:57:12,109 ERROR [PdfDocument] Error in the document because of SAX : Invalid byte 2 of 3-byte UTF-8 sequence. org.xml.sax.SAXParseException; lineNumber: 66; columnNumber: 7; Invalid byte 2 of 3-byte UTF-8 sequence. at org.apache.xerces.parsers.DOMParser.parse(Unknown Source) – gospodin Oct 18 '12 at 09:58
  • The String documentHtml also has to be in the charset you read it with. For this String literal, the .java file's encoding is used. How are your sources encoded? Maybe change their encoding. – Jean-Philippe Briend Oct 18 '12 at 15:47
  • Here is an encoding example in iText: http://itextpdf.com/examples/iia.php?id=198 – Jean-Philippe Briend Oct 18 '12 at 15:48
  • I was looking at that example, but the implementation is completely different from mine. They use paragraphs and the Font class with a font path in the constructor, e.g. "c:/windows/fonts/arialbd.ttf" (we are going to run the application on different systems with different paths). Is it possible to avoid this? – gospodin Oct 19 '12 at 07:42
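
The "Invalid byte 2 of 3-byte UTF-8 sequence" error quoted above is consistent with the parser assuming UTF-8 while being handed ISO-8859-1 bytes. Here is a minimal sketch of that mismatch, assuming the JDK's built-in parser and a document without an XML declaration. `EncodingMismatchDemo` and `parseText` are hypothetical names made up for illustration, and nothing here touches ITextRenderer.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Optional;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of the question's code base.
final class EncodingMismatchDemo {

    // Unicode escapes keep the example independent of this .java file's encoding.
    static final String XHTML = "<p>Suppl\u00e9ment : \u00e0\u00e8</p>";

    /**
     * Parses the bytes and returns the root element's text, or empty if the
     * parser rejects them. With declaredEncoding == null and no XML
     * declaration or BOM, the parser treats the bytes as UTF-8.
     */
    static Optional<String> parseText(byte[] bytes, String declaredEncoding) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            InputSource source = new InputSource(new ByteArrayInputStream(bytes));
            if (declaredEncoding != null) {
                source.setEncoding(declaredEncoding); // tell the parser how to decode
            }
            Document doc = builder.parse(source);
            return Optional.of(doc.getDocumentElement().getTextContent());
        } catch (SAXException | IOException e) {
            return Optional.empty();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = XHTML.getBytes(StandardCharsets.UTF_8);
        byte[] latin1 = XHTML.getBytes(StandardCharsets.ISO_8859_1);

        // Each accented character takes two bytes in UTF-8 but one in ISO-8859-1.
        System.out.println("UTF-8 length: " + utf8.length + ", ISO-8859-1 length: " + latin1.length);

        System.out.println("UTF-8 bytes, nothing declared: " + parseText(utf8, null).isPresent());
        System.out.println("ISO-8859-1 bytes, nothing declared: " + parseText(latin1, null).isPresent());
        System.out.println("ISO-8859-1 bytes, declared: " + parseText(latin1, "ISO-8859-1").isPresent());
    }
}
```

If the bytes really have to be ISO-8859-1, declaring that on the `InputSource` (or in an XML declaration) keeps the parser and the bytes in agreement. Otherwise, staying with UTF-8 on both sides, as in the original code, is the simpler choice.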