A third-party system is sending an XML file that contains the '&' character in its data. They are not even using CDATA (it is a poorly designed system).

In Java, how can I read that XML file and escape the & and other special characters (<, >, ", ')?

I know this question has been asked many times, but here we have no control over the third-party system. So how can we read that "invalid" XML file and make it a valid one?

** I'm not able to use a SAX/DOM parser, as it rejects the input file as invalid.

Abhishek
  • How are you loading the XML into Java? What is currently failing? You could use the commons-lang XML Escape utils but it depends what you're doing currently. Some example code would be good! Edit: XML 1.0 or 1.1 specification? – Daniel Tung May 07 '15 at 13:20
  • I'm loading the file as an InputStream, but whenever I try to parse it, it throws an exception because it does not consider the input file valid XML. – Abhishek May 07 '15 at 13:23
  • If you can read the XML in as a string you can escape it using string functions. You can then parse the String back into XML with the escaped characters. Your supplier needs to update their system as it's currently not valid XML, at least tell them this too. Check out https://commons.apache.org/proper/commons-lang/javadocs/api-3.4/org/apache/commons/lang3/StringEscapeUtils.html#escapeXml11(java.lang.String) – Daniel Tung May 07 '15 at 13:24
  • @DanielTung, thanks for your input, but I have checked this. How do I apply the function? Suppose the input XML text is <sometext> Here is some "Text" that I'd like to be "escaped" for XML & here is some Swedish: Tack. Varsågod. </sometext> After applying StringEscapeUtils it would become &lt;sometext&gt; Here is some &quot;Text&quot; that I&apos;d like to be &quot;escaped&quot; for XML &amp; here is some Swedish: Tack. Varsågod. &lt;/sometext&gt; It will destroy all the XML tags. – Abhishek May 07 '15 at 13:26
  • If you're reading the file line by line, you could use a regex. Get the string between the first > and the last <, escape only that part, then write the line back with the tags intact, so that only the text (Here is some "Text" that I'd like to be "escaped" for XML & here is some Swedish: Tack. Varsågod.) is escaped. – Daniel Tung May 07 '15 at 13:29
  • Could you please provide some example using regex? – Abhishek May 07 '15 at 13:30
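
For the regex route discussed in the comments above, here is a minimal sketch. It is not the commenter's exact suggestion: instead of slicing each line between the first > and the last <, it rewrites only those & characters that do not already start something shaped like an entity reference, and then hands the result to the normal JAXP DOM parser. The class name `AmpersandRepair` and the method `escapeBareAmpersands` are made up for illustration. It assumes the only defect is bare ampersands; stray < or > inside text cannot be told apart from markup this way, and a reference to an undeclared entity (say `&foo;`) is left alone and would still be rejected by the parser.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of any library.
final class AmpersandRepair {

    // An '&' that is NOT followed by "name;", "#123;" or "#x1F;",
    // i.e. an ampersand that does not begin an entity or character reference.
    private static final Pattern BARE_AMP = Pattern.compile(
            "&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#[xX][0-9A-Fa-f]+);)");

    // Replaces every bare '&' with "&amp;" and leaves existing references untouched.
    static String escapeBareAmpersands(String rawXml) {
        return BARE_AMP.matcher(rawXml).replaceAll("&amp;");
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input: one bare '&' and one reference that is already escaped.
        String raw = "<sometext>escaped for XML & here is some Swedish: Tack &amp; hej</sometext>";
        String fixed = escapeBareAmpersands(raw);

        // The repaired string is now well-formed, so the standard DOM parser accepts it.
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(fixed)));
        System.out.println(doc.getDocumentElement().getTextContent());
    }
}
```

Because the tags are never touched, this avoids the problem of StringEscapeUtils escaping the markup along with the text. For large files you would apply the same replacement while streaming rather than loading the whole document into one String.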

1 Answer

Treat it as XPL. XPL is structured exactly like XML but allows the "special characters" in text fields. The XPL-to-XML conversion utilities will do exactly what you need. http://hll.nu

Roger F. Gay