7

I need to store a huge amount of binary data in a file, but I also want to read and write that file's header in XML format.

Yes, I could just store the binary data into some XML value and let it be serialized using base64 encoding. But this wouldn't be space-efficient.

Can I "mix" XML data and raw binary data in a more-or-less standardized way?

I was thinking about two options:

  • Is there a way to do this using JAXB?

  • Or is there a way to take some existing XML data and append binary data to it, in such a way that the boundary is recognized?

  • Isn't the concept I'm looking for somehow used by / for SOAP?

  • Or is it used in the email standard? (Separation of binary attachments)

Scheme of what I'm trying to achieve:

[meta-info-about-boundary][XML-data][boundary][raw-binary-data]

Thank you!

java.is.for.desktop

4 Answers

10

You can leverage AttachmentMarshaller and AttachmentUnmarshaller for this. They are the bridge that JAXB and JAX-WS use to pass binary content as attachments, and the same mechanism can do what you want.


PROOF OF CONCEPT

Below is how it could be implemented. This should work with any JAXB impl (it works for me with EclipseLink JAXB (MOXy), and the reference implementation).

Message Format

[xml_length][xml][attach1_length][attach1]...[attachN_length][attachN]
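
The framing itself can be tried out independently of JAXB. The following is only a minimal sketch of the length-prefixed layout above, assuming each length is stored as a 4-byte big-endian int; the class name LengthPrefixedFraming and its methods are made up for illustration and are not part of the proof of concept below.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: frames parts as [length][bytes][length][bytes]...
public class LengthPrefixedFraming {

    /** Writes every part as a 4-byte length followed by the raw bytes. */
    public static byte[] frame(List<byte[]> parts) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            for (byte[] part : parts) {
                out.writeInt(part.length); // [partX_length]
                out.write(part);           // [partX]
            }
        }
        return buffer.toByteArray();
    }

    /** Reads parts back until the end of the message is reached. */
    public static List<byte[]> unframe(byte[] message) throws IOException {
        List<byte[]> parts = new ArrayList<>();
        try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(message))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfMessage) {
                    break; // no further length prefix: message is complete
                }
                byte[] part = new byte[length];
                in.readFully(part); // unlike read(), fills the whole array or throws
                parts.add(part);
            }
        }
        return parts;
    }

    public static void main(String[] args) throws IOException {
        byte[] xml = "<root/>".getBytes(StandardCharsets.UTF_8);
        byte[] attachment = new byte[300]; // longer than 255 bytes on purpose
        List<byte[]> parts = unframe(frame(Arrays.asList(xml, attachment)));
        System.out.println(new String(parts.get(0), StandardCharsets.UTF_8));
        System.out.println(parts.get(1).length);
    }
}
```

The first part plays the role of the XML document and the remaining parts the attachments. A 4-byte prefix limits each part to about 2 GB; a long prefix would be needed beyond that.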

Root

This is an object with multiple byte[] properties.

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement
public class Root {

    private byte[] foo;
    private byte[] bar;

    public byte[] getFoo() {
        return foo;
    }

    public void setFoo(byte[] foo) {
        this.foo = foo;
    }

    public byte[] getBar() {
        return bar;
    }

    public void setBar(byte[] bar) {
        this.bar = bar;
    }

}

Demo

This class demonstrates how MessageWriter and MessageReader are used:

import java.io.FileInputStream;
import java.io.FileOutputStream;
import javax.xml.bind.JAXBContext;

public class Demo {

    public static void main(String[] args) throws Exception {
        JAXBContext jc = JAXBContext.newInstance(Root.class);

        Root root = new Root();
        root.setFoo("HELLO WORLD".getBytes());
        root.setBar("BAR".getBytes());

        MessageWriter writer = new MessageWriter(jc);
        FileOutputStream outStream = new FileOutputStream("file.xml");
        writer.write(root, outStream);
        outStream.close();

        MessageReader reader = new MessageReader(jc);
        FileInputStream inStream = new FileInputStream("file.xml");
        Root root2 = (Root) reader.read(inStream);
        inStream.close();

        System.out.println(new String(root2.getFoo()));
        System.out.println(new String(root2.getBar()));
    }

}

MessageWriter

Responsible for writing the message in the desired format:

import java.io.ByteArrayOutputStream;
import java.io.ObjectOutputStream;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

import javax.activation.DataHandler;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Marshaller;
import javax.xml.bind.attachment.AttachmentMarshaller;

public class MessageWriter {

    private JAXBContext jaxbContext;

    public MessageWriter(JAXBContext jaxbContext) {
        this.jaxbContext = jaxbContext;
    }

    /**
     * Write the message in the following format:
     * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN] 
     */
    public void write(Object object, OutputStream stream) {
        try {
            Marshaller marshaller = jaxbContext.createMarshaller();
            marshaller.setProperty(Marshaller.JAXB_FRAGMENT, true);
            BinaryAttachmentMarshaller attachmentMarshaller = new BinaryAttachmentMarshaller();
            marshaller.setAttachmentMarshaller(attachmentMarshaller);
            ByteArrayOutputStream xmlStream = new ByteArrayOutputStream();
            marshaller.marshal(object, xmlStream);
            byte[] xml = xmlStream.toByteArray();
            xmlStream.close();

            ObjectOutputStream messageStream = new ObjectOutputStream(stream);

            // writeInt stores the full 4-byte length; write(int) would keep only the low-order byte
            messageStream.writeInt(xml.length); // [xml_length]
            messageStream.write(xml); // [xml]

            for(Attachment attachment : attachmentMarshaller.getAttachments()) {
                messageStream.writeInt(attachment.getLength()); // [attachX_length]
                messageStream.write(attachment.getData(), attachment.getOffset(), attachment.getLength());  // [attachX]
            }

            messageStream.flush();
        } catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static class BinaryAttachmentMarshaller extends AttachmentMarshaller {

        private static final int THRESHOLD = 10;

        private List<Attachment> attachments = new ArrayList<Attachment>();

        public List<Attachment> getAttachments() {
            return attachments;
        }

        @Override
        public String addMtomAttachment(DataHandler data, String elementNamespace, String elementLocalName) {
            return null;
        }

        @Override
        public String addMtomAttachment(byte[] data, int offset, int length, String mimeType, String elementNamespace, String elementLocalName) {
            // Returning null tells JAXB to inline small values as base64Binary;
            // compare the attachment length, not the size of the backing array
            if(length < THRESHOLD) {
                return null;
            }
            int id = attachments.size() + 1;
            attachments.add(new Attachment(data, offset, length));
            return "cid:" + String.valueOf(id);
        }

        @Override
        public String addSwaRefAttachment(DataHandler data) {
            return null;
        }

        @Override
        public boolean isXOPPackage() {
            return true;
        }

    }

    public static class Attachment {

        private byte[] data;
        private int offset;
        private int length;

        public Attachment(byte[] data, int offset, int length) {
            this.data = data;
            this.offset = offset;
            this.length = length;
        }

        public byte[] getData() {
            return data;
        }

        public int getOffset() {
            return offset;
        }

        public int getLength() {
            return length;
        }

    }

}

MessageReader

Responsible for reading the message:

import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.ObjectInputStream;
import java.io.OutputStream;
import java.util.HashMap;
import java.util.Map;

import javax.activation.DataHandler;
import javax.activation.DataSource;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.attachment.AttachmentUnmarshaller;

public class MessageReader {

    private JAXBContext jaxbContext;

    public MessageReader(JAXBContext jaxbContext) {
        this.jaxbContext = jaxbContext;
    }

    /**
     * Read the message from the following format:
     * [xml_length][xml][attach1_length][attach1]...[attachN_length][attachN] 
     */
    public Object read(InputStream stream) {
        try {
            ObjectInputStream inputStream = new ObjectInputStream(stream);
            // readInt mirrors writeInt in MessageWriter; read() would return a single byte
            int xmlLength = inputStream.readInt();  // [xml_length]

            byte[] xmlIn = new byte[xmlLength]; 
            inputStream.readFully(xmlIn);  // [xml], readFully fills the whole array

            BinaryAttachmentUnmarshaller attachmentUnmarshaller = new BinaryAttachmentUnmarshaller();
            int id = 1;
            while(inputStream.available() > 0) {
                int length = inputStream.readInt();  // [attachX_length]
                byte[] data = new byte[length];  // [attachX]
                inputStream.readFully(data);
                attachmentUnmarshaller.getAttachments().put("cid:" + String.valueOf(id++), data);
            }

            Unmarshaller unmarshaller = jaxbContext.createUnmarshaller();
            unmarshaller.setAttachmentUnmarshaller(attachmentUnmarshaller);
            ByteArrayInputStream byteInputStream = new ByteArrayInputStream(xmlIn);
            Object object = unmarshaller.unmarshal(byteInputStream);
            byteInputStream.close();
            inputStream.close();
            return object;
        } catch(Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static class BinaryAttachmentUnmarshaller extends AttachmentUnmarshaller {

        private Map<String, byte[]> attachments = new HashMap<String, byte[]>();

        public Map<String, byte[]> getAttachments() {
            return attachments;
        }

        @Override
        public DataHandler getAttachmentAsDataHandler(String cid) {
            byte[] bytes = attachments.get(cid);
            return new DataHandler(new ByteArrayDataSource(bytes));
        }

        @Override
        public byte[] getAttachmentAsByteArray(String cid) {
            return attachments.get(cid);
        }

        @Override
        public boolean isXOPPackage() {
            return true;
        }

    }

    private static class ByteArrayDataSource implements DataSource {

        private byte[] bytes;

        public ByteArrayDataSource(byte[] bytes) {
            this.bytes = bytes;
        }

        public String getContentType() {
            return  "application/octet-stream";
        }

        public InputStream getInputStream() throws IOException {
            return new ByteArrayInputStream(bytes);
        }

        public String getName() {
            return null;
        }

        public OutputStream getOutputStream() throws IOException {
            return null;
        }

    }

}


bdoughan
  • Thank you! I'll have a look at this. – java.is.for.desktop Mar 15 '11 at 18:18
  • Well, these classes are very low-level. It seems that I still have to define the handling of boundaries myself. The only advantage is the compatibility with JAXB. – java.is.for.desktop Mar 15 '11 at 18:55
  • What they enable you to do is store the binary data outside the document and put a placeholder with an ID in the document. This is how JAX-WS (SOAP) uses JAXB. – bdoughan Mar 15 '11 at 19:09
  • In your example you would need to collect the byte[]s passed to the attachment marshaller, and then write them to the output stream after the marshal operation. – bdoughan Mar 15 '11 at 19:11
  • @java.is.for.desktop - I have updated my answer with a proof of concept for this approach. – bdoughan Mar 15 '11 at 20:34
  • @java.is.for.desktop - I have updated the code, it now works with both EclipseLink MOXy and the JAXB reference implementation. I also modified the code to allow a threshold to be set. This allows binary data below the threshold to be inlined in the document as base64Binary. – bdoughan Mar 21 '11 at 20:46
  • @BlaiseDoughan: I have read many of your blogs. I am wondering if you could help me with some basic understanding in JAX-RS, posted here: http://stackoverflow.com/questions/25512502/what-is-the-use-of-persistentcontext-and-stateless-in-jax-rs – eagertoLearn Aug 26 '14 at 18:28
2

This is not natively supported by JAXB, since you do not want to serialize the binary data to XML, but it can usually be handled at a higher level when using JAXB. The way I do this with web services (SOAP and REST) is to use MIME multipart/mixed messages (see the multipart specification). Although initially designed for email, this works well for sending XML together with binary data, and most web service frameworks such as Axis or Jersey support it almost transparently.

Here is an example of sending an object in XML together with a binary file with REST webservice using Jersey with the jersey-multipart extension.

XML object

@XmlRootElement
public class Book {
   private String title;
   private String author;
   private int year;

   //getter and setters...
}

Client

byte[] bin = ...; // some binary data

Book b = new Book();
b.setAuthor("John");
b.setTitle("wild stuff");
b.setYear(2012);

// One body part for the XML object, one for the raw bytes
MultiPart multiPart = new MultiPart();
multiPart.bodyPart(new BodyPart(b, MediaType.APPLICATION_XML_TYPE));
multiPart.bodyPart(new BodyPart(bin, MediaType.APPLICATION_OCTET_STREAM_TYPE));

// "service" is assumed to be a Jersey client resource for the server's base URI
ClientResponse response = service.path("rest").path("multipart").
        type(MultiPartMediaTypes.MULTIPART_MIXED).
        post(ClientResponse.class, multiPart);

Server

@POST
@Consumes(MultiPartMediaTypes.MULTIPART_MIXED)
public Response post(MultiPart multiPart) {
    for(BodyPart part : multiPart.getBodyParts()) {
        System.out.println(part.getMediaType());
    }

    return Response.status(Response.Status.ACCEPTED).
            entity("Attachments processed successfully.").
            type(MediaType.TEXT_PLAIN).build();

}

I tried sending a file of 110917 bytes. Using Wireshark, you can see that the data is sent directly over HTTP like this:

Hypertext Transfer Protocol
   POST /org.etics.test.rest.server/rest/multipart HTTP/1.1\r\n
   Content-Type: multipart/mixed; boundary=Boundary_1_353042220_1343207087422\r\n
   MIME-Version: 1.0\r\n
   User-Agent: Java/1.7.0_04\r\n
   Host: localhost:8080\r\n
   Accept: text/html, image/gif, image/jpeg\r\n
   Connection: keep-alive\r\n
   Content-Length: 111243\r\n
   \r\n
   [Full request URI: http://localhost:8080/org.etics.test.rest.server/rest/multipart]

   MIME Multipart Media Encapsulation, Type: multipart/mixed, Boundary: "Boundary_1_353042220_1343207087422"
     [Type: multipart/mixed]
     First boundary: --Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part:  (application/xml)
        Content-Type: application/xml\r\n\r\n
        eXtensible Markup Language
          <?xml
          <book>
            <author>
              John
            </author>
            <title>
              wild stuff
            </title>
            <year>
              2012
            </year>
          </book>
     Boundary: \r\n--Boundary_1_353042220_1343207087422\r\n
        Encapsulated multipart part:  (application/octet-stream)
        Content-Type: application/octet-stream\r\n\r\n
        Media Type
          Media Type: application/octet-stream (110917 bytes)
     Last boundary: \r\n--Boundary_1_353042220_1343207087422--\r\n

As you can see, the binary data is sent as an octet-stream with no wasted space, contrary to what happens when binary data is sent inline in the XML. The only overhead is the very small MIME envelope: the request body above is 111243 bytes for 110917 bytes of binary data, XML included. With SOAP the principle is the same (except that there is also a SOAP envelope).
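
To make the envelope concrete, here is a minimal sketch that assembles a two-part multipart/mixed body by hand, following the structure visible in the trace above. It is not how Jersey does it internally; the class MultipartSketch and its build method are hypothetical, and the sketch assumes the boundary string never occurs inside the binary payload (real libraries generate boundaries that make a collision very unlikely).

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that mimics the body layout seen in the trace
public class MultipartSketch {

    private static final String CRLF = "\r\n";

    /** Builds: boundary, XML part, boundary, binary part, closing boundary. */
    public static byte[] build(String boundary, String xml, byte[] binary) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();

        // First boundary and the XML part
        writeAscii(out, "--" + boundary + CRLF);
        writeAscii(out, "Content-Type: application/xml" + CRLF + CRLF);
        out.write(xml.getBytes(StandardCharsets.UTF_8));

        // Next boundary and the binary part, written verbatim (no base64)
        writeAscii(out, CRLF + "--" + boundary + CRLF);
        writeAscii(out, "Content-Type: application/octet-stream" + CRLF + CRLF);
        out.write(binary);

        // Last boundary carries a trailing "--"
        writeAscii(out, CRLF + "--" + boundary + "--" + CRLF);
        return out.toByteArray();
    }

    private static void writeAscii(ByteArrayOutputStream out, String text) throws IOException {
        out.write(text.getBytes(StandardCharsets.US_ASCII));
    }

    public static void main(String[] args) throws IOException {
        byte[] binary = new byte[1000];
        byte[] body = build("Boundary_1", "<book/>", binary);
        // The overhead is constant: it does not grow with the size of the binary part
        System.out.println(body.length - binary.length);
    }
}
```

Because the payload is copied byte for byte, the envelope costs a fixed number of bytes per part, whereas base64 inlining grows the data by roughly one third.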

Duarte Meneses
2

I followed the concept suggested by Blaise Doughan, but without attachment marshallers:

I let an XmlAdapter convert a byte[] to a URI-reference and back, while references point to separate files, where raw data is stored. The XML file and all binary files are then put into a zip.

This is similar to the approach of OpenOffice and the ODF format, which is in fact a zip containing a few XML files and binary files.

(In the example code, no actual binary files are written, and no zip is created.)

Bindings.java

import java.net.*;
import java.util.*;
import javax.xml.bind.annotation.*;
import javax.xml.bind.annotation.adapters.*;

final class Bindings {

  static final String SCHEME = "storage";
  static final Class<?>[] ALL_CLASSES = new Class<?>[]{
    Root.class, RawRef.class
  };

  static final class RawRepository
      extends XmlAdapter<URI, byte[]> {

    final SortedMap<String, byte[]> map = new TreeMap<>();
    final String host;
    private int lastID = 0;

    RawRepository(String host) {
      this.host = host;
    }

    @Override
    public byte[] unmarshal(URI o) {
      if (!SCHEME.equals(o.getScheme())) {
        throw new Error("scheme is: " + o.getScheme()
            + ", while expected was: " + SCHEME);
      } else if (!host.equals(o.getHost())) {
        throw new Error("host is: " + o.getHost()
            + ", while expected was: " + host);
      }

      // marshal() stores keys without the leading "/" that the URI path carries
      String key = o.getPath().substring(1);
      if (!map.containsKey(key)) {
        throw new Error("key not found: " + key);
      }

      byte[] ret = map.get(key);
      return Arrays.copyOf(ret, ret.length);
    }

    @Override
    public URI marshal(byte[] o) {
      ++lastID;
      String key = String.valueOf(lastID);
      map.put(key, Arrays.copyOf(o, o.length));

      try {
        return new URI(SCHEME, host, "/" + key, null);
      } catch (URISyntaxException ex) {
        throw new Error(ex);
      }
    }

  }

  @XmlRootElement
  @XmlType
  static final class Root {

    @XmlElement
    final List<RawRef> element = new LinkedList<>();
  }

  @XmlType
  static final class RawRef {

    @XmlJavaTypeAdapter(RawRepository.class)
    @XmlElement
    byte[] raw = null;
  }

}

Main.java

import java.io.*;
import javax.xml.bind.*;

public class Main {

  public static void main(String[] args)
      throws Exception {
    JAXBContext context = JAXBContext.newInstance(Bindings.ALL_CLASSES);
    Marshaller marshaller = context.createMarshaller();
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);
    Unmarshaller unmarshaller = context.createUnmarshaller();

    Bindings.RawRepository adapter = new Bindings.RawRepository("myZipVFS");
    marshaller.setAdapter(adapter);

    Bindings.RawRef ta1 = new Bindings.RawRef();
    ta1.raw = "THIS IS A STRING".getBytes();
    Bindings.RawRef ta2 = new Bindings.RawRef();
    ta2.raw = "THIS IS AN OTHER STRING".getBytes();

    Bindings.Root root = new Bindings.Root();
    root.element.add(ta1);
    root.element.add(ta2);

    StringWriter out = new StringWriter();
    marshaller.marshal(root, out);

    System.out.println(out.toString());
  }

}

Output

<root>
    <element>
        <raw>storage://myZipVFS/1</raw>
    </element>
    <element>
        <raw>storage://myZipVFS/2</raw>
    </element>
</root>
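
The packaging step that the example leaves out could look roughly like the following. This is only a sketch using java.util.zip: the class ZipPackaging, the entry name content.xml and the raw/ prefix are assumptions made for illustration, and the sorted map stands in for the repository's map of keys to raw bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical helper: one zip entry for the XML, one per raw byte array
public class ZipPackaging {

    public static byte[] pack(String xml, SortedMap<String, byte[]> rawData) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(buffer)) {
            // Assumed entry name for the marshalled document
            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write(xml.getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            // Each referenced byte[] becomes its own entry, stored as raw bytes
            for (Map.Entry<String, byte[]> raw : rawData.entrySet()) {
                zip.putNextEntry(new ZipEntry("raw/" + raw.getKey()));
                zip.write(raw.getValue());
                zip.closeEntry();
            }
        }
        return buffer.toByteArray();
    }

    public static Map<String, byte[]> unpack(byte[] archive) throws IOException {
        Map<String, byte[]> entries = new LinkedHashMap<>();
        try (ZipInputStream zip = new ZipInputStream(new ByteArrayInputStream(archive))) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                ByteArrayOutputStream data = new ByteArrayOutputStream();
                byte[] chunk = new byte[8192];
                int read;
                while ((read = zip.read(chunk)) != -1) {
                    data.write(chunk, 0, read);
                }
                entries.put(entry.getName(), data.toByteArray());
            }
        }
        return entries;
    }

    public static void main(String[] args) throws IOException {
        SortedMap<String, byte[]> raw = new TreeMap<>();
        raw.put("1", "THIS IS A STRING".getBytes(StandardCharsets.UTF_8));
        Map<String, byte[]> entries = unpack(pack("<root/>", raw));
        System.out.println(entries.keySet());
    }
}
```

On reading, the entries under raw/ would be loaded into the adapter's map before unmarshalling, so that each storage:// reference can be resolved.
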
java.is.for.desktop
0

I don't think so -- XML libraries generally aren't designed to work with XML+extra-data.

But you might be able to get away with something as simple as a special stream wrapper: it would expose an "XML"-containing stream and a binary stream (from the special "format"). Then JAXB (or any other XML library) could work with the "XML" stream while the binary stream is kept separate.
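
One possible shape for such a wrapper is sketched below. It assumes a file layout of a 4-byte XML length, then the XML, then the raw binary data; that layout and the class BoundedInputStream are assumptions for illustration, not an established format. The wrapper reports end-of-stream after the XML bytes, so a parser cannot run into the binary part, and it leaves the underlying stream open.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper exposing only the next "limit" bytes of a stream
public class BoundedInputStream extends FilterInputStream {

    private long remaining;

    public BoundedInputStream(InputStream in, long limit) {
        super(in);
        this.remaining = limit;
    }

    @Override
    public int read() throws IOException {
        if (remaining <= 0) {
            return -1; // pretend the stream ends at the boundary
        }
        int b = in.read();
        if (b != -1) {
            remaining--;
        }
        return b;
    }

    @Override
    public int read(byte[] buffer, int offset, int length) throws IOException {
        if (length == 0) {
            return 0;
        }
        if (remaining <= 0) {
            return -1;
        }
        int n = in.read(buffer, offset, (int) Math.min(length, remaining));
        if (n > 0) {
            remaining -= n;
        }
        return n;
    }

    @Override
    public long skip(long n) throws IOException {
        long skipped = in.skip(Math.min(n, remaining));
        remaining -= skipped;
        return skipped;
    }

    @Override
    public int available() throws IOException {
        return (int) Math.min(in.available(), remaining);
    }

    @Override
    public boolean markSupported() {
        return false;
    }

    @Override
    public void close() {
        // Intentionally empty: the binary part is still to be read from the underlying stream
    }

    public static void main(String[] args) throws IOException {
        byte[] file = {0, 0, 0, 3, '<', 'a', '>', 9, 8}; // [length][xml][binary]
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        int xmlLength = in.readInt();
        InputStream xml = new BoundedInputStream(in, xmlLength);
        while (xml.read() != -1) {
            // an XML parser would consume this stream instead
        }
        System.out.println(in.read()); // first byte of the binary part
    }
}
```

If the parser stops before consuming all XML bytes, the caller would need to drain the bounded stream before reading the binary part from the underlying stream.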

Also remember to take "binary" vs. "text" files into account.

Happy coding.

  • JAXB provides AttachmentMarshaller and AttachmentUnmarshaller for just this purpose: http://stackoverflow.com/questions/5315968/whats-the-most-standard-java-way-to-store-raw-binary-data-along-with-xml/5316087#5316087 – bdoughan Mar 15 '11 at 18:12
  • @Blaise Doughan I always thought those were for dealing with encoded MIME/base64 data. A +1 for you :) –  Mar 15 '11 at 18:13