I have a schema with an xs:any element. This element may contain other elements that have mixed content. I'm trying to use JAXB to unmarshal it into Java objects (with the 'any' content mapped as an Element).

From the schema:

<xs:element name="a">
    <xs:complexType>
        <xs:sequence>
            <xs:any processContents="lax"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

In general, this works. But when handling elements with mixed content, whitespace between nested nodes is lost.

test.xml:

<a><foo><b>Hello</b> <i>World</i></foo></a>

Unmarshalling like this:

JAXBContext jc = JAXBContext.newInstance(A.class);
Unmarshaller unmarshaller = jc.createUnmarshaller();
InputStream inputStream = this.getClass().getResourceAsStream("/data/test.xml");
A a = (A) unmarshaller.unmarshal(inputStream);
Marshaller marshaller = jc.createMarshaller();
marshaller.marshal(a, System.out);

Results in this:

<a><foo><b>Hello</b><i>World</i></foo></a>

I lose the space between the child tags of the <foo> element. I'm certain that it's the unmarshal step that takes the whitespace out here, but I do need it to survive the round trip.

Note that it's only whitespace-only text content that's removed. This works as desired:

<a><foo><b>Hello</b> to you <i>World</i></foo></a>

I tried adding xml:space="preserve" (see, for example, "JAXB: How to keep consecutive spaces as they are in source XML during unmarshalling"), but that has no effect on whitespace between elements. I've also tried setting processContents to each of strict, lax, and skip; none of them helped.
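
One way to check that the whitespace is still present before JAXB gets involved is to parse the same sample with the JDK's plain DOM parser and look at the children of <foo>. This is only a diagnostic sketch: the class and method names are made up for illustration, and it assumes the default DocumentBuilderFactory settings (no validation, no whitespace stripping).

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JAXB: inspects the raw DOM of the sample.
class WhitespaceNodeCheck {

    /** Parses the XML with the default DOM parser and returns the children of the first <foo>. */
    public static NodeList fooChildren(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagName("foo").item(0).getChildNodes();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse sample XML", e);
        }
    }

    /** Returns true if any node in the list is a text node made up only of whitespace. */
    public static boolean hasWhitespaceOnlyText(NodeList nodes) {
        for (int i = 0; i < nodes.getLength(); i++) {
            Node n = nodes.item(i);
            if (n.getNodeType() == Node.TEXT_NODE && n.getNodeValue().trim().isEmpty()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        NodeList children = fooChildren("<a><foo><b>Hello</b> <i>World</i></foo></a>");
        System.out.println(children.getLength()
                + " children, whitespace-only text present: "
                + hasWhitespaceOnlyText(children));
    }
}
```

For the sample above, the DOM has three children under <foo> (<b>, a text node holding the single space, and <i>), so the space exists at the parser level and is dropped somewhere later in the round trip.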

paloma
  • Hi, were you able to solve this? – Alaf Azam Oct 04 '18 at 07:24
  • If I remember right, we had a workaround but not a real solution. I think we ended up having to pull out the 'any' element of the XML separately and manually replace it in the parent Java object. – paloma Oct 31 '18 at 13:16
  • We ended up using velocity templating, and it made things really easy. :) – Alaf Azam Nov 03 '18 at 11:22
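
The workaround mentioned in the comments (pulling the 'any' element out of the XML separately) might look roughly like the sketch below. It uses only the JDK's DOM and TrAX APIs. The class and method names are hypothetical, and it assumes the 'any' content is the first element child of the root. The extracted Element could then be set on the unmarshalled parent object by hand, which is not shown here because it depends on how the A class exposes its field.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper illustrating the workaround; not part of JAXB.
class AnyElementExtractor {

    /** Parses the XML with plain DOM and returns the first element child of the root. */
    public static Element firstChildElement(String xml) {
        Element root;
        try {
            root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)))
                    .getDocumentElement();
        } catch (Exception e) {
            // Wrap checked parser exceptions so callers need no throws clause.
            throw new IllegalStateException("Could not parse XML", e);
        }
        for (Node n = root.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element) {
                return (Element) n;
            }
        }
        throw new IllegalArgumentException("Root element has no element child");
    }

    /** Serializes an element with an identity transform, without an XML declaration. */
    public static String toXml(Element element) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            transformer.transform(new DOMSource(element), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("Could not serialize element", e);
        }
    }

    public static void main(String[] args) {
        Element any = firstChildElement("<a><foo><b>Hello</b> <i>World</i></foo></a>");
        // The space between </b> and <i> is still a text node in this DOM subtree.
        System.out.println(toXml(any));
    }
}
```

Because the subtree never passes through the JAXB unmarshaller, the whitespace-only text node between <b> and <i> stays in place when the element is serialized again.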

1 Answer

After facing a similar issue, I came up with the following solution. It works for this specific scenario, though it doesn't work perfectly for some other, more complex XML structures.

package com.stackoverflow.answers;

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.List;

import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBElement;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlAnyElement;
import javax.xml.bind.annotation.XmlMixed;
import javax.xml.bind.annotation.XmlRootElement;
import javax.xml.transform.stream.StreamSource;

import static org.junit.Assert.assertEquals;
import org.junit.Test;
import org.w3c.dom.Element;

public class XmlAnyElementWithWhiteSpacesTest {

    @XmlAccessorType(XmlAccessType.FIELD)
    @XmlRootElement(name = "a")
    private static class A {

        @XmlAnyElement
        @XmlMixed
        private List<Element> elements;
    }

    private static final String SAMPLE = "<a><foo><b>Hello</b> <i>World</i></foo></a>";

    @Test
    public void shouldParseAndSerializeKeepingWhiteSpaceElements() throws JAXBException {
        // given
        JAXBContext jc = JAXBContext.newInstance(A.class);
        Unmarshaller unmarshaller = jc.createUnmarshaller();
        InputStream inputStream = new ByteArrayInputStream(SAMPLE.getBytes(StandardCharsets.UTF_8));
        Marshaller marshaller = jc.createMarshaller();
        marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
        // when
        JAXBElement<A> a = unmarshaller.unmarshal(new StreamSource(inputStream), A.class);
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        marshaller.marshal(a.getValue(), outputStream);
        String actual = new String(outputStream.toByteArray(), StandardCharsets.UTF_8);
        // then
        assertEquals(SAMPLE, actual);
    }

}

The key points here are:

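  • Annotating the field with @XmlMixed in addition to @XmlAnyElement
  • Unmarshalling from a StreamSource with the declared type, i.e. unmarshal(Source, Class)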

You can use either List<Object> or List<Element> for your "XML any content" property.