
I'm having an issue unmarshaling an XML file when special characters such as "/" are included inside an attribute's value, like this one:

<field name = "test" value = "test&/"/>

I'm using the libraries woodstox-core (v5.0.3) and stax2-api (v3.1.4).

The attribute value is defined in the XSD as an xs:normalizedString, which I think allows the character "/":

<xs:element name="field" maxOccurs="unbounded">
    <xs:complexType>
        <xs:attribute name="name" type="xs:token" use="required" />
        <xs:attribute name="value" type="xs:normalizedString" use="required" />
    </xs:complexType>
</xs:element>
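One way to check the assumption that xs:normalizedString accepts "/" is to validate small samples with the JDK's javax.xml.validation API. This is only an illustrative sketch, not part of the original setup: it wraps the field declaration above in a hypothetical stand-alone schema (made a global element, so maxOccurs is dropped), and the class and method names are made up. A value with a bare "/" validates, while a value with a raw "&" is rejected as not well-formed XML.

```java
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;
import javax.xml.validation.Validator;
import org.xml.sax.SAXException;

public class FieldSchemaCheck {

    // Hypothetical stand-alone schema: the question's <field> declaration made global
    // (maxOccurs removed, since it is not allowed on a top-level element declaration).
    static final String XSD =
          "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
        + "<xs:element name=\"field\">"
        + "<xs:complexType>"
        + "<xs:attribute name=\"name\" type=\"xs:token\" use=\"required\"/>"
        + "<xs:attribute name=\"value\" type=\"xs:normalizedString\" use=\"required\"/>"
        + "</xs:complexType>"
        + "</xs:element>"
        + "</xs:schema>";

    /** Returns true if the XML is well-formed and valid against XSD, false otherwise. */
    static boolean isValid(String xml) throws Exception {
        SchemaFactory sf = SchemaFactory.newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI);
        Schema schema = sf.newSchema(new StreamSource(new StringReader(XSD)));
        Validator validator = schema.newValidator();
        try {
            validator.validate(new StreamSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            // Both well-formedness errors and schema violations are reported here
            System.out.println("Rejected: " + e.getMessage());
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // A bare '/' in an xs:normalizedString attribute
        System.out.println(isValid("<field name=\"test\" value=\"test/\"/>"));
        // A raw '&' before the '/': not well-formed XML
        System.out.println(isValid("<field name=\"test\" value=\"test&/\"/>"));
    }
}
```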

But when making the unmarshal call, the following exception is thrown:

XMLStreamReader xsr = null;
try {
    // Create the XML stream reader
    XMLInputFactory xif = XMLInputFactory.newFactory();
    xsr = xif.createXMLStreamReader(inputStream, "UTF-8");

    // Unmarshal the XML with JAXB, with XML schema validation enabled
    JAXBContext jc = JAXBContext.newInstance(Root.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    unmarshaller.setSchema(this.xmlSchema);
    Root rootIndex = (Root) unmarshaller.unmarshal(xsr);
    // [...]
} finally {
    // Release the reader; close() itself may throw XMLStreamException
    if (xsr != null) {
        xsr.close();
    }
}

And here is the exception:

Caused by: com.ctc.wstx.exc.WstxUnexpectedCharException: Unexpected character '/' (code 47) (expected a name start character)
 at [row,col {unknown-source}]: [17,74]
    at com.ctc.wstx.sr.StreamScanner.throwUnexpectedChar(StreamScanner.java:653) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.StreamScanner.parseFullName(StreamScanner.java:1933) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.StreamScanner.parseEntityName(StreamScanner.java:2058) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.StreamScanner.fullyResolveEntity(StreamScanner.java:1525) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.BasicStreamReader.parseAttrValue(BasicStreamReader.java:2017) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.BasicStreamReader.handleNsAttrs(BasicStreamReader.java:3145) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.BasicStreamReader.handleStartElem(BasicStreamReader.java:3043) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.BasicStreamReader.nextFromTree(BasicStreamReader.java:2919) [woodstox-core-5.0.3.jar:5.0.3]
    at com.ctc.wstx.sr.BasicStreamReader.next(BasicStreamReader.java:1123) [woodstox-core-5.0.3.jar:5.0.3]
    at com.sun.xml.bind.v2.runtime.unmarshaller.StAXStreamConnector.bridge(StAXStreamConnector.java:197) [jaxb-impl-2.2.3-1.jar:2.2.3]
    at com.sun.xml.bind.v2.runtime.unmarshaller.UnmarshallerImpl.unmarshal0(UnmarshallerImpl.java:366) [jaxb-impl-2.2.3-1.jar:2.2.3]
    ... 16 more

Is there anything else I need to define to accept those characters (apart from UTF-8), or is it simply not allowed?

Many thanks in advance!

Daniel Rodríguez
  • You can remove all these characters (only / but not /> or >) from your XML before trying to unmarshal. This preprocessing should be fairly simple. – dsp_user Sep 06 '17 at 09:57
  • @dsp_user but I need this character. The / is expected as a possible value – Daniel Rodríguez Sep 06 '17 at 10:01
  • The XML is still malformed. / should be escaped in your XML. – dsp_user Sep 06 '17 at 10:03
  • @dsp_user But I thought that the only 5 characters that need to be escaped in XML are these: &quot; ("), &apos; ('), &lt; (<), &gt; (>) and &amp; (&) https://stackoverflow.com/a/1091953/285608 – Daniel Rodríguez Sep 06 '17 at 10:05
  • It seems this is a JAXB limitation (or perhaps StAX), not an XML one. If you have control over your XMLs, then replace / with another character (one that is normally never used). You can then use an XmlAdapter to recover / . – dsp_user Sep 06 '17 at 10:13
  • @dsp_user sorry, it was my bad. The error was not the / itself, but an & just before it that was not escaped. Discussing the 5 characters that need to be escaped with you helped me realize the issue. Many thanks for your time! – Daniel Rodríguez Sep 06 '17 at 10:26
  • It's ok, we all make mistakes. – dsp_user Sep 06 '17 at 10:30
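To make the five predefined entities concrete, here is a minimal escaping helper. XmlEscape.escape is a hypothetical name written for illustration only; in practice a StAX writer or a JAXB marshaller escapes values automatically, so hand-rolled escaping is only relevant when XML is built by string concatenation. Note that "/" is not among the five and passes through unchanged.

```java
public class XmlEscape {

    /** Replaces the five XML special characters with their predefined entity references. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            switch (c) {
                case '&':  sb.append("&amp;");  break; // must be escaped in text and attribute values
                case '<':  sb.append("&lt;");   break; // must be escaped in text and attribute values
                case '>':  sb.append("&gt;");   break; // optional in most places, escaped here for simplicity
                case '"':  sb.append("&quot;"); break; // required inside "..." attribute values
                case '\'': sb.append("&apos;"); break; // required inside '...' attribute values
                default:   sb.append(c);               // '/' and everything else pass through unchanged
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("test&/")); // test&amp;/
    }
}
```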

1 Answer


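The issue here was not really the / character, but the & before it. / is fine by itself, but & starts an entity or character reference (such as &amp; or &#38;), so a literal & must be escaped. When Woodstox sees the bare &, it tries to read an entity name (note parseEntityName in the stack trace), and / cannot start a name, hence the message about '/'. I was too focused on the / because of that error message.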
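The behaviour can be reproduced with the standard javax.xml.stream API alone. This sketch uses no Woodstox-specific calls, so whichever StAX implementation is on the classpath does the parsing; the class and method names are hypothetical. The escaped value parses and getAttributeValue returns the unescaped text test&/, while the raw & raises an XMLStreamException whose exact message depends on the implementation.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class AmpersandDemo {

    /** Parses the document and returns the "value" attribute of the first element. */
    static String readValue(String xml) throws XMLStreamException {
        XMLInputFactory xif = XMLInputFactory.newFactory();
        XMLStreamReader xsr = xif.createXMLStreamReader(new StringReader(xml));
        try {
            while (xsr.hasNext()) {
                if (xsr.next() == XMLStreamConstants.START_ELEMENT) {
                    // A null namespace URI matches the attribute by local name only
                    return xsr.getAttributeValue(null, "value");
                }
            }
            return null;
        } finally {
            xsr.close();
        }
    }

    public static void main(String[] args) throws XMLStreamException {
        // Escaped '&': parses fine, and the reader returns the unescaped text
        System.out.println(readValue("<field name=\"test\" value=\"test&amp;/\"/>"));

        // Raw '&': the parser expects an entity name and fails on '/'
        try {
            readValue("<field name=\"test\" value=\"test&/\"/>");
        } catch (XMLStreamException e) {
            System.out.println("Parse error: " + e.getMessage());
        }
    }
}
```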

Escaping the & as &amp; fixed the issue:

<field name = "test" value = "test&amp;/"/>
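If the XML is generated by your own code, one way to avoid this class of error is to let a StAX writer handle escaping instead of concatenating strings. A minimal sketch with the standard javax.xml.stream.XMLStreamWriter (WriteFieldDemo and writeField are illustrative names): the raw value test&/ goes in, and the writer emits test&amp;/ in the attribute.

```java
import java.io.StringWriter;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamWriter;

public class WriteFieldDemo {

    /** Serializes a single <field> element; the writer escapes attribute values itself. */
    static String writeField(String name, String value) throws XMLStreamException {
        StringWriter out = new StringWriter();
        XMLStreamWriter xsw = XMLOutputFactory.newFactory().createXMLStreamWriter(out);
        xsw.writeEmptyElement("field");
        xsw.writeAttribute("name", name);
        xsw.writeAttribute("value", value); // '&' is written as &amp;, '/' is left as-is
        xsw.writeEndDocument();             // closes the still-open empty element
        xsw.flush();
        xsw.close();
        return out.toString();
    }

    public static void main(String[] args) throws XMLStreamException {
        System.out.println(writeField("test", "test&/"));
    }
}
```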
Daniel Rodríguez