
I am using iText to create a PDF from HTML content. I build the HTML as a table using a Java StringBuffer. A Map holds the files' metadata as key-value pairs, and I iterate over those keys and values to build the HTML table. The problem is that some of the metadata values in the map are meaningless or invalid symbols, so PDF creation fails with the following exception.

java.io.IOException: Expected > for tag: <{1}/> near line 1, column 717
    at com.lowagie.text.xml.simpleparser.SimpleXMLParser.throwException(SimpleXMLParser.java:568)
    at com.lowagie.text.xml.simpleparser.SimpleXMLParser.go(SimpleXMLParser.java:331)
    at com.lowagie.text.xml.simpleparser.SimpleXMLParser.parse(SimpleXMLParser.java:579)
    at com.lowagie.text.html.simpleparser.HTMLWorker.parse(HTMLWorker.java:141)


The content that caused the exception is:
“$é6莚ÆuCÅ ©À SÀF;r 1Ì/XQ‡,Ô<ÒÐ"‡(¢ËÄòÅ1¡Ø€ÌÅc

So my question is: what are these characters (non-ASCII, unsupported by UTF)? Is there any way to identify and skip them while building the HTML?

Vijay
  • The only bad character is the `<` here, which should not appear in your HTML. Converting it to its proper escaped form `&lt;` should fix it. – Jongware Sep 26 '14 at 09:30
  • @Jongware: I am escaping all possible HTML characters. After escaping, the content is "“$é6莚ÆuCÅ ©À SÀF;r 1Ì/XQ‡,Ô<ÒÐ"‡(¢ËÄòÅ1¡Ø€ÌÅc". Even then it fails. – Vijay Sep 26 '14 at 09:38
  • "It fails" is **not** a helpful description of the problem. Your original error was `Expected > for tag`, surely you must be getting a new error message? – Jongware Sep 26 '14 at 09:51

1 Answer


It is difficult to identify and skip such characters at run time while building the HTML. Instead, you can use Apache commons-lang to escape the HTML:

StringEscapeUtils.escapeHtml("“$é6莚ÆuCÅ ©À SÀF;r 1Ì/XQ‡,Ô<ÒÐ\"‡(¢ËÄòÅ1¡Ø€ÌÅc")

The output of the above is

&ldquo;$&eacute;6&egrave;&#381;&scaron;&AElig;uC&Aring; &copy;&Agrave; S&Agrave;F;r 1&Igrave;/XQ&Dagger;,&Ocirc;&lt;&Ograve;&ETH;&quot;&Dagger;(&cent;&Euml;&Auml;&ograve;&Aring;1&iexcl;&Oslash;&euro;&Igrave;&Aring;c
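
If adding commons-lang is not an option, the same idea can be sketched in plain Java. This is only an illustration under stated assumptions: the class name HtmlEscapeSketch and the method escapeForHtml are hypothetical, non-ASCII characters are written as numeric character references rather than named entities, and characters that XML 1.0 does not allow (most control characters, unpaired surrogates) are silently dropped. Whether iText's HTMLWorker accepts every resulting reference has not been verified here.

```java
public class HtmlEscapeSketch {

    // Hypothetical helper, not part of iText or commons-lang.
    // Escapes markup characters, drops characters that are not legal in XML 1.0,
    // and writes every remaining non-ASCII code point as a numeric reference.
    public static String escapeForHtml(String raw) {
        StringBuilder out = new StringBuilder(raw.length() * 2);
        for (int i = 0; i < raw.length(); ) {
            int cp = raw.codePointAt(i);
            i += Character.charCount(cp);
            switch (cp) {
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '&': out.append("&amp;"); break;
                case '"': out.append("&quot;"); break;
                default:
                    if (cp == '\t' || cp == '\n' || cp == '\r') {
                        out.append((char) cp);          // whitespace is kept as-is
                    } else if (cp < 0x20 || cp == 0x7F
                            || (cp >= 0xD800 && cp <= 0xDFFF)
                            || cp == 0xFFFE || cp == 0xFFFF) {
                        // skipped: control characters, DEL, unpaired surrogates,
                        // and the two non-characters XML 1.0 forbids
                    } else if (cp > 0x7E) {
                        out.append("&#").append(cp).append(';'); // decimal reference
                    } else {
                        out.append((char) cp);          // printable ASCII
                    }
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Unicode escapes avoid depending on the source file encoding.
        // The literal below is the fragment Ô<ÒÐ"‡ from the failing content.
        System.out.println(escapeForHtml("\u00D4<\u00D2\u00D0\"\u2021"));
        // prints: &#212;&lt;&#210;&#208;&quot;&#8225;
    }
}
```

Each metadata key and value would be passed through such a helper before being appended to the table markup, so that a stray `<` in the data can no longer be read as the start of a tag.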
ashokramcse