4

I have problems unmarshalling XML files that contain valid UTF-8 but include characters, such as a bare &, that are invalid in an XML context.

As the files come from Spotify's Metadata API, I have no means of making sure that they are correctly encoded.

I know that I can pre-process the file and replace all those instances with &amp;, but as this problem is probably quite common, I wonder how one usually handles it. Is there a helper class in JAXB or elsewhere that I should use, or does everyone write their own code to handle this problem?

nivis

3 Answers

1

Have you tried CDATA? Take a look at this: http://www.w3schools.com/xml/xml_cdata.asp
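To illustrate what a CDATA section does, here is a minimal Java sketch using only the JDK's DOM parser. The class name `CdataDemo` and the sample element are made up for the example. Note that this only helps if whoever produces the XML wraps the text in CDATA; it does not repair a feed that already contains a bare &.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class: shows that a raw '&' is legal inside a CDATA section.
final class CdataDemo {

    // Parses the given XML string and returns the text content of its root element.
    static String textOf(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Not well-formed XML", e);
        }
    }

    public static void main(String[] args) {
        // The '&' is not escaped, but the CDATA wrapper makes it plain character data.
        String xml = "<name><![CDATA[Simon & Garfunkel]]></name>";
        System.out.println(textOf(xml)); // prints "Simon & Garfunkel"
    }
}
```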

mmdc
1

For your use case, Spotify is returning invalid XML (at a minimum, the & character is not escaped as &amp;). Rather than jumping through hoops, you may prefer to process the corresponding JSON data instead.

Many open source JSON-binding implementations exist (MOXy, Gson, Jackson, Genson, XStream, etc). Some of them allow you to provide mappings through JAXB metadata.

If you want to remain as close to your current setup as possible, you could use a library like Jettison to convert the JSON to/from StAX events so that it can be used directly by your JAXB implementation.

If you are using MOXy as your JAXB implementation, you simply need to set a single property to enable JSON support (I'm the MOXy lead).

bdoughan
0

You need to deal with proper XML, which means no magic characters in tag values.

Your contract needs to be "good XML received, good XML sent."

Your clients have to encode and decode properly. You need to make sure that you do as well.

You need to decode (e.g. &amp; to &) when you instantiate your objects. When you marshal XML, you have to encode properly.

I don't know of a magic fix. I'd advise you to get a shovel and start digging.
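If it comes to digging by hand, one possible approach is to escape bare ampersands before the text reaches the parser. The sketch below is only an illustration, not a JAXB feature: the class `AmpersandSanitizer` and its methods are hypothetical names, and the regex assumes that any & not followed by something shaped like an entity or character reference should become &amp;. It would wrongly rewrite a & inside a CDATA section and cannot help with references to undefined entities.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: escapes '&' characters that do not start a reference.
final class AmpersandSanitizer {

    // Matches '&' unless it begins a named (&amp;), decimal (&#38;) or hex (&#x26;) reference.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }

    // Returns true if the JDK's SAX parser accepts the string as well-formed XML.
    static boolean isWellFormed(String xml) {
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException | IOException | ParserConfigurationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String broken = "<artist><name>Simon & Garfunkel</name></artist>";
        String fixed = escapeBareAmpersands(broken);

        System.out.println(isWellFormed(broken)); // false
        System.out.println(fixed);                // <artist><name>Simon &amp; Garfunkel</name></artist>
        System.out.println(isWellFormed(fixed));  // true
    }
}
```

The sanitized string can then be handed to the unmarshaller through a `StringReader` in place of the original input.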

duffymo