13

The W3C's EXI (Efficient XML Interchange) is going to be standardized. It claims to be "the last binary standard".

It is a standard for storing XML data in a form optimized for processing and storage, and it is tied to XML Schema (making the data strongly typed and strongly structured). There are a lot of claimed advantages; I was most impressed by the processing and memory-efficiency measurements.

I am asking myself, what is going to happen to all the established XML APIs?

There is this paragraph related to my question:

4.2 Existing XML Processing APIs

As EXI is an encoding of the XML Infoset, an EXI implementation can support any of the commonly-used XML APIs for XML processing, so EXI has no immediate impact on existing XML APIs. However, using an existing XML API also requires that all names and text appearing in the EXI document be converted into strings. In the future, more efficiency might be achievable if the higher layers could directly use these data as typed values appearing in the EXI document. For instance, if a higher layer needs typed data, going through its string form can produce a performance penalty, so an extended API that supports typed data directly could improve performance when used with EXI.

from: http://www.w3.org/TR/exi-impacts/

I understand it as follows: "Using EXI with existing APIs? No performance gain! (Unless you rewrite them all)"
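To make the string detour concrete, here is a minimal sketch. It uses Python's `xml.sax` purely as a stand-in for a SAX-style API (the callback shape mirrors the JDK's); the `<t>` element name and the `parse_temperatures` helper are made up for illustration. The point is that the handler only ever sees characters, so a consumer that wants a number has to parse the string itself, even if the underlying stream had stored it as a typed value.

```python
import io
import xml.sax


class TemperatureHandler(xml.sax.ContentHandler):
    """Collects the text of every <t> element and converts it to float."""

    def __init__(self):
        super().__init__()
        self.values = []
        self._buffer = None  # list of text chunks while inside <t>

    def startElement(self, name, attrs):
        if name == "t":
            self._buffer = []

    def characters(self, content):
        # A SAX-style API hands over text only, possibly in several chunks.
        if self._buffer is not None:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "t":
            # The typed value is recovered by parsing the string form.
            self.values.append(float("".join(self._buffer)))
            self._buffer = None


def parse_temperatures(xml_bytes):
    """Hypothetical helper: returns all <t> values as floats."""
    handler = TemperatureHandler()
    xml.sax.parse(io.BytesIO(xml_bytes), handler)
    return handler.values
```

A typed API, as hinted at in the quoted paragraph, would let the reader hand over the float directly and skip the `float(...)` step.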

Let's take the Java ecosystem as an example:

We have plenty of XML APIs in the latest JDK 6 (with each major JDK release, more of them were added). As far as I can judge, most (if not all) of them use either in-memory DOM trees or a serialized ("textual") representation to transform, process, or validate XML data.

What do you guys think: what is going to happen to these APIs with the introduction of EXI?

Thank you all for your opinions.

For those who don't know EXI: http://www.w3.org/XML/EXI/

ivan_ivanovich_ivanoff

5 Answers

5

You don't need any new APIs to get the performance gains of EXI. All the EXI testing and performance measurements the W3C has conducted use the standard SAX APIs built into the JDK. For the latest tests, see http://www.w3.org/TR/exi-evaluation/#processing-results. EXI parsing was on average 14.5 times faster than XML in these tests without any special APIs.

One day, if people think it's worthwhile, we may see some typed XML APIs emerge. If and when that happens, you will get even better performance from EXI. However, this is not required to get excellent performance like that reported by the W3C.

Alexander Farber
John Schneider
4

Think of EXI as a "better GZIP for XML". FYI, it has no impact on the APIs, as you can still use all of them (DOM, SAX, StAX, JAXB, ...). The only difference is that, to use EXI, you need a stream writer that writes it or a stream reader that reads it.

The most efficient way to work with EXI is StAX. It is true that new APIs might arise because of EXI, but who said DOM is efficient and well designed for modern languages? ;-)

If you are handling big XML files (I have some that are a few hundred MB), you definitely know why you need EXI: it saves tons of space, a huge amount of memory, and processing time.

This is no different from the purpose of HTTP Content-Encoding: you are not required to use it, but if both parties understand it, it is a much more efficient way to perform the exchange.
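A rough sketch of that analogy, with plain gzip standing in for EXI (Python's standard library has no EXI codec, so this only illustrates the "transparent encoding" idea, not EXI itself). The helper names and the sample document are invented for the example.

```python
import gzip
import xml.etree.ElementTree as ET


def encode_for_wire(xml_bytes):
    """Sender side: apply the content encoding both parties agreed on."""
    return gzip.compress(xml_bytes)


def decode_from_wire(wire_bytes):
    """Receiver side: undo the encoding, then use the ordinary XML API."""
    return ET.fromstring(gzip.decompress(wire_bytes))


# Hypothetical, deliberately repetitive document.
document = b"<orders>" + b"<item qty='2'>widget</item>" * 200 + b"</orders>"

wire = encode_for_wire(document)
root = decode_from_wire(wire)
```

The application code after `decode_from_wire` is the same as it would be for an uncompressed document, which is the sense in which the existing APIs are unaffected.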

By the way, IMHO EXI will become the preferred way to content-encode any XML over HTTP because of SOAP bloat ;-) As soon as EXI settles into browsers, it could also benefit end users: faster transfer, faster parsing = a better experience on the same machine!

EXI does not deprecate the string representation; it only makes it a bit different. Oh, and by the way, when using a UTF encoding (think of the default UTF-8, for instance), you are already using a "compression encoding" for the 32-bit Unicode code points... which means that the on-the-wire data already differs from the real data ;-)
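A tiny illustration of the UTF-8 point, using the euro sign as an arbitrary example character: the code point and the bytes on the wire are different things, and the decoder restores the original.

```python
def wire_form(text):
    """Returns the UTF-8 bytes that would actually travel on the wire."""
    return text.encode("utf-8")


euro = "\u20ac"               # a single code point, U+20AC
euro_wire = wire_form(euro)   # three bytes once encoded as UTF-8
```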

Alexander Farber
3

I'm dealing with EXI right now.

There's no good universal tool for processing EXI. Once you get into the guts of EXI, you realize there is a bunch of needless delimiters in the binary stream which are absolutely and completely unnecessary with a schema. Some of it is humorous.

How would you think the following would be encoded in EXI if both values are specified?

<xs:complexType name="example">
  <xs:sequence>
    <xs:element name="bool1" type="xs:boolean" minOccurs="0" />
    <xs:element name="bool2" type="xs:boolean" minOccurs="0" />
  </xs:sequence>
</xs:complexType>

Would you think it might be a maximum of 4 bits? 1 bit to indicate whether bool1 is defined, then the value of bool1, followed by another bit to indicate whether bool2 is defined, then the value of bool2?
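For reference, the naive scheme described in that question can be sketched as follows. This is the hypothetical "what you might expect" encoding, not anything EXI does: one presence bit per optional element, followed by a value bit only when the element is present. Bits are shown as strings of `0` and `1` for readability.

```python
from typing import Optional, Tuple


def pack_example(bool1: Optional[bool], bool2: Optional[bool]) -> str:
    """Presence bit, then a value bit only if the element is present."""
    bits = ""
    for value in (bool1, bool2):
        if value is None:
            bits += "0"                           # element absent
        else:
            bits += "1" + ("1" if value else "0")  # present + value
    return bits


def unpack_example(bits: str) -> Tuple[Optional[bool], Optional[bool]]:
    """Inverse of pack_example; reads the two elements in order."""
    pos = 0
    values = []
    for _ in range(2):
        present = bits[pos] == "1"
        pos += 1
        if present:
            values.append(bits[pos] == "1")
            pos += 1
        else:
            values.append(None)
    return values[0], values[1]
```

Under this scheme the output ranges from 2 bits (neither element present) to 4 bits (both present).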

Good golly no!

Well, let me tell you, boys and girls! This is how it's actually encoded:

+---- A value of 0 means this element (bool1) is not specified,
|       1 indicates it is specified
|+--- A value of x means this element is undefined,
||      0 means the bool is set to false, 1 is set to true
||+-- A value of 0 means this element (bool2) is not specified,
|||     1 indicates it is specified
|||+- A value of x means this element is undefined
||||    0 means the bool is set to false, 1 is set to true
||||
0x0x  4 0100           # neither bool is specified
0x10  8 00100000       # bool1 is not specified, bool2 is set to false
0x11  8 00101000       # bool1 is not specified, bool2 is set to true
100x  9 000000010      # bool1 is set to false, bool2 is not specified
110x  9 000010010      # bool1 is set to true, bool2 is not specified

1010 13 0000000000000  # bool1 is set to false, bool2 is set to false
1011 13 0000000001000  # bool1 is set to false, bool2 is set to true
1110 13 0000100000000  # bool1 is set to true, bool2 is set to false
1111 13 0000100001000  # bool1 is set to true, bool2 is set to true
        ^           ^
        +-encoding--+

Which can be represented with this tree

  0-0-0-0-0-0-0-0-0-0-0-0-0 (1010)
   \ \   \     \   \
    | |   |     |   1-0-0-0 (1011)
    | |   |     |
    | |   |     1-0 (100x)
    | |   |
    | |   1-0-0-0-0-0-0-0-0 (1110)
    | |        \   \
    | |         |   1-0-0-0 (1111)
    | |         |
    | |         1-0 (110x)
    | |
    | 1-0-0-0-0-0 (0x10) 
    |    \
    |     1-0-0-0 (0x11)
    |
    1-0-0 (0x0x)
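The table and tree above can be checked mechanically. The sketch below copies the nine bit strings exactly as listed in this answer (they are the answer's observations, not values derived from the spec) and confirms that no code is a prefix of another, which is what lets a decoder walk the tree without extra delimiters. `is_prefix_free` and `decode_first` are illustrative helpers.

```python
# Label -> bit string, copied from the table in this answer.
OBSERVED = {
    "0x0x": "0100",
    "0x10": "00100000",
    "0x11": "00101000",
    "100x": "000000010",
    "110x": "000010010",
    "1010": "0000000000000",
    "1011": "0000000001000",
    "1110": "0000100000000",
    "1111": "0000100001000",
}


def is_prefix_free(codes):
    """True if no code is a proper prefix of a different code."""
    codes = list(codes)
    return not any(a != b and b.startswith(a) for a in codes for b in codes)


def decode_first(bits):
    """Returns (label, remaining bits) for the code at the start of bits."""
    for label, code in OBSERVED.items():
        if bits.startswith(code):
            return label, bits[len(code):]
    raise ValueError("no listed code matches the start of the stream")
```

Compared with the 2 to 4 bits of a presence-plus-value scheme, the listed codes run from 4 to 13 bits.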

A minimum of 4 bits, MINIMUM, just to say that neither is defined. Now I'm being a little unfair, because I'm including delimiters - delimiters which are entirely unnecessary.

I understand how this works, now. Here's the spec:

https://www.w3.org/TR/exi/

Have fun reading that! It was a GREAT DEAL OF FUN FOR ME!!!!@@##!@

Now this is just with a schema, and the EXI spec specifically says that you can still encode XML that does NOT conform with a schema. Which is hilarious because this is supposed to be for small little web devices. What do you do with unexpected data that you have no provisions for handling in an embedded device?

Why, you just die of course. There's no recovery for something you don't expect. It's not like these things have a screen, I'm lucky if I can log into it through a serial port.

I have used 4 different XSD generators/parsers/XML generators; 3 of them choke on the schema I have to use. Data marshaling for C and C++ (remember, this is for EMBEDDED systems with very little memory and CPU power) is awful.

XSD basically describes a structure or class architecture, yet there isn't a single tool I can find that will just create the classes. The XSD example I gave above should create a structure with 4 bools: 2 bools hold the values, and 2 bools indicate whether they are even defined.
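Written by hand, the structure I'm describing would look roughly like this. It is shown in Python for brevity (on the embedded target it would be a C struct); the field names and the `from_optionals` helper are my own illustration, not the output of any existing generator.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Example:
    """Mirror of the 'example' complexType: two values, two presence flags."""
    bool1_present: bool = False
    bool1: bool = False
    bool2_present: bool = False
    bool2: bool = False

    @classmethod
    def from_optionals(cls, bool1: Optional[bool] = None,
                       bool2: Optional[bool] = None) -> "Example":
        # None means the element is omitted (minOccurs="0").
        return cls(
            bool1_present=bool1 is not None,
            bool1=bool(bool1),
            bool2_present=bool2 is not None,
            bool2=bool(bool2),
        )
```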

But does THAT exist? Well heck no.

I like XML, for describing documents. Really I do - but here is what I hate about XML - for a widely adopted standard, the available tools for it are absolutely terrible. Just reading a schema is a difficult thing to do when it's spread across multiple namespaces and documents.

Rant rant, huff huff

The only reason we are using this is some standards committee insisted upon it. What it's done is created a monopoly for a small group of companies that already implemented this, that's the only purpose.

EXI is not a widely adopted standard, XML is a poor encapsulator for numeric data, it's a pain to implement, and there are no decent tools for it. EXIP is at version 5.0 - anything that works that is open source is in Java - at least I have that.

For my field of work, EXI is just a bad design decision. I've worked on tons of communications protocols on various embedded systems. I worked on DOCSIS, which all modern cable modems use - they use a simple, and extensible, Type/Length/Value protocol with provisions for dealing with unrecognized types - which is why the Length is always included. It's simple, it takes literally days to implement the entire stack.
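To show how little code a Type/Length/Value scheme needs, here is a minimal sketch. The layout (1-byte type, 1-byte length, then the value) and the type numbers in `KNOWN_TYPES` are assumptions made up for this example and are not taken from the DOCSIS specification. The key property is that the length field lets a decoder skip types it does not recognize.

```python
# Hypothetical type registry for the example.
KNOWN_TYPES = {1: "field_a", 2: "field_b"}


def encode_tlv(type_id, value):
    """Encodes one record as: type (1 byte), length (1 byte), value."""
    if not 0 <= type_id <= 255:
        raise ValueError("type must fit in one byte")
    if len(value) > 255:
        raise ValueError("value too long for a one-byte length")
    return bytes([type_id, len(value)]) + value


def decode_tlvs(data):
    """Returns (known fields, list of skipped unknown type ids)."""
    known, skipped = {}, []
    pos = 0
    while pos < len(data):
        if pos + 2 > len(data):
            raise ValueError("truncated TLV header")
        type_id, length = data[pos], data[pos + 1]
        pos += 2
        value = data[pos:pos + length]
        if len(value) != length:
            raise ValueError("truncated TLV value")
        pos += length  # the length tells us where the next record starts
        if type_id in KNOWN_TYPES:
            known[KNOWN_TYPES[type_id]] = value
        else:
            skipped.append(type_id)  # unrecognized type: step over it
    return known, skipped
```

A stream containing an unknown type 99 between two known records decodes cleanly, with 99 reported as skipped rather than causing a failure.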

EXI is very difficult to hand code, there are no decent processors for it, and worst of all, all the processors I have found that actually work well with it, just transform it from EXI<->XML - which is totally useless.

I have resorted to writing my own XSD parser, which means I have to understand at least the entire XML specification for those parts of this design that use it - and that's extensive. What would have taken me 2 weeks to do with any reasonable spec took me 10. Nobody in my world is going to use this unless it's shoved down their throat, and they shouldn't: it's a square peg for a round hole.

user6269400
  • Over 10 years since the original EXI spec was released and this question was asked and EXI hasn't caught on at all, which is a bit of a shame because I too have to work with 100+ MB XML files and having something that's a little easier on storage space and memory would definitely be nice. I needed to read an answer like that to confirm I'm not crazy about EXI support anywhere being non-existent. – VLRoyrenn Jun 01 '20 at 16:12
2

The problem with EXI is that it needs to be abstracted from your application code. I work on a middleware product where the human readable nature of XML is key in certain aspects (logging, fault finding, etc.) but can be sacrificed in other areas (communication between internal applications to limit I/O load).

We currently use SOAP for communication between our own client, middleware, and supplier web applications. I would like to replace this with EXI while retaining human-readable XML in other areas. To replace SOAP communication with EXI, I need to either:

  1. Wait until EXI has been incorporated into existing SOAP stacks (Axis/SAAJ), or
  2. Replace my existing Axis/SAAJ SOAP client/supplier implementations with my own SOAP-ish protocol on top of EXI

The comparison between JSON and EXI is fair, but the use-cases for the two are different. There is no standard for meta-data for JSON, while there is XML-Schema for XML. With XML there are several standards bodies that define schemas for data exchange for specific industries. There are also a range of protocols/standards that are built on top of XML, such as SOAP, XML-Signature, XML-Encryption, WS-Security, SAML, etc. This does not exist for JSON.

Hence, XML is a better option for B2B message exchange and other cases where you need to integrate with external systems using industry standards. EXI can bring some of the benefits of JSON into this world, but it needs to be incorporated into existing XML APIs before widespread adoption can take place.

j0k
2

I'd personally rather not use EXI at all. It seems like it's taking all the clunky, bad things about XML, and cramming them into a binary format, which basically removes the saving grace of XML (plain text format).

It seems like the general trend of the industry is moving towards more lightweight data transfer models (HTTP REST for example), and moving away from heavy-weight models like SOAP. Personally, I'm not super excited about the idea of binary XML.

Anything that claims to be "the last binary standard" is probably wrong.

Andy White
  • 2
    Yeah, I also don't understand the point of EXI. The reason why XML is used even though it is bloated is because it is human readable. If you take that away then XML has nothing over any other standards. – Andrew Marsh Jun 25 '09 at 19:46
  • 3
    Not going to downvote, but also not going to agree. This is just a more efficient way to EXCHANGE the XML, allowing all of the flexibility of the current format without the over-the-wire bloat. – Steven Sudit Mar 12 '11 at 08:28
  • 5
    Actually, EXI is just an alternative way of representing XML data, where the plain text XML is the old version. It'd be easy to create a bit of code that converts an XML document transferred through EXI back to a plaintext XML document, considering the exact same data is contained therein. As I see it, EXI removes the two major downsides of XML - size and processing speed - leaving only the good parts. – cthulhu Mar 13 '11 at 15:58
  • I am +0 for EXI, but what is frustrating are all the highly hyped performance claims without actual external, objective verification. The few tests that I have seen have tended to use sub-optimal libs or APIs for textual XML, or just lack enough detail to know what the heck they are trying to test. – StaxMan Jun 30 '11 at 23:33