12

I am exporting large databases into XML format. This XML data needs to be compressed into the smallest possible format. I have heard a lot about Efficient XML Interchange (EXI) and was wondering whether there is a .NET implementation that can be called from within code...

Does anyone have an example of this? Online resources seem to be a bit sparse...

user559142
  • 6
    It seems you control both the code that compresses the XML and the code that decompresses it later. Wouldn't it be better if you “compressed” it by storing it in some non-XML format and then “decompressed” it by converting to XML? – svick Feb 24 '12 at 17:23
  • 1
    I'd go for something like JSON to save on those bytes. I doubt encoding the information will help much? – Patrick Magee Dec 16 '13 at 16:34
  • 1
    @PatrickMagee JSON saves only on tags, quotes, and end tags, and it is not part of the XML standard. That is far from what any binary format achieves; my answer has more info. – Diego C Nascimento Dec 23 '13 at 12:51

5 Answers

4

It turns out Microsoft created its own binary XML format/encoding called MC-NBFX (catchy, eh?). It has been part of the .NET Framework and WCF since .NET 3.0. For more info see:

Another option is to run a Java implementation through IKVM to produce a .NET assembly. The open source Java implementations I could find are:

redcalx
  • I will give a +1 for the informative answer. But it reads as though MC-NBFX is the only binary XML format available, which it is not. You know that, but the answer is a bit unclear about it. – Diego C Nascimento Dec 23 '13 at 12:48
  • That blog entry "WSF Binary XML and dictionaries" has got some really useful information I didn't know of. Thanks for sharing that! – Grimace of Despair Dec 24 '13 at 04:32
3

Nagasena has both .NET (written in C#) and Java implementations of the EXI specification.

takuki
  • But it's so cryptic! And lacks C# doc/tuto. I've been able to (almost blindly) find my way to encode XML to EXI, but I gave up on getting XML back after the assembly threw me an unhandled null reference exception for no apparent reason. – Jerther Nov 21 '16 at 15:28
  • That doesn't sound good. Well, anyway, here is the [direct link](https://sourceforge.net/projects/openexi/) to the actual open source project. – James Oct 26 '17 at 01:45
3

Such an implementation does exist. AgileDelta, the company that created a predecessor of the Efficient XML Interchange format, offers an Efficient XML library that includes a .NET version, although they don't seem to publish the price.

The official EXI site doesn't list any other .NET implementation.

svick
0

Is there a reason you want the smallest possible format? XML is not really designed for compression optimization. @svick's answer is the de facto choice for now if what you want are readily accessible archives.

You can find a lot of what you are asking here: Best compression algorithm for XML?

EXI is great if what you want is archived data that will be regularly accessed. Otherwise, if your goal is archiving for the long haul, just use a zip utility. KISS.
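To illustrate the "just zip it" route, here is a minimal sketch in Python (the question is about .NET, so treat this as an illustration of the idea rather than code to drop in). The `build_sample_export` helper and the row layout are made up for the example; only `gzip.compress` and `gzip.decompress` are real standard-library calls.

```python
import gzip


def build_sample_export(row_count):
    """Build a small, repetitive XML document standing in for a table export (hypothetical layout)."""
    rows = "".join(
        f"<row><id>{i}</id><name>item{i}</name></row>" for i in range(row_count)
    )
    return f"<table>{rows}</table>".encode("utf-8")


xml_bytes = build_sample_export(1000)

# Generic compression: no schema knowledge, just the repeated tags being squeezed out.
compressed = gzip.compress(xml_bytes)

# The archive stays plain XML once decompressed, which is the long-term appeal.
restored = gzip.decompress(compressed)

print(len(xml_bytes), len(compressed))
```

How much this saves depends entirely on the data; highly repetitive table exports tend to compress well, but the actual ratio has to be measured on the real export.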

VoteCoffee
  • The question asks for efficiency. Think of textual XML: it has to convert, say, an integer to its ASCII decimal representation, which consumes resources, and then you run a zip compressor over the entire file, reducing efficiency again. Binary XML can be efficient, and there are some implementations of it. There are more effective formats, but the question asks for XML, so this is the way to go. – Diego C Nascimento Dec 23 '13 at 12:56
  • My main issue with binary formats is that they may not be supported in the long run. It seems he wants to archive data but still keep it accessible. How he should archive it depends on how often he expects to access it. I don't disagree that binary XML offers some efficiency advantages (it was covered well in other comments). I'm more concerned about why he wants to do what he is doing. I believe that zipping database exports as XML offers a long term storage solution that will not be prone to changes in standards a decade or so down the road, which may be worth the added inefficiency. – VoteCoffee Dec 30 '13 at 16:28
  • That binary XML is efficient in both processing power and size is not opinion; it is fact. To cite a source, _in terms of size it can reduce the size by 80%_ (http://en.wikipedia.org/wiki/Fast_Infoset). Anyway, I know you mean that the asker wants long-term exchange, but there are standards that define binary XML formats; whether to pick one or another in the expectation that it will stay supported is, to some extent, opinion-based. – Diego C Nascimento Jan 06 '14 at 17:36
0

Binary XML is the way to go (and there are some implementations of it) if you need to stick to the XML standard.

JSON, even leaving aside that it is not XML, loses out on numbers. For example, the maximum value of a 32-bit unsigned integer takes 10 bytes in JSON; in nearly all binary formats it takes 4 bytes. The same applies to dates, times, and so on.
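The 10-versus-4 figure is easy to check. A quick sketch in Python (not .NET, just to show the arithmetic), using a fixed-width little-endian encoding as a stand-in for "nearly all binary formats":

```python
import json
import struct

UINT32_MAX = 2**32 - 1  # 4294967295, ten decimal digits

# Text encoding: one byte per decimal digit.
as_json = json.dumps(UINT32_MAX).encode("ascii")

# Fixed-width binary encoding: always four bytes for an unsigned 32-bit value.
as_binary = struct.pack("<I", UINT32_MAX)

print(len(as_json))    # 10
print(len(as_binary))  # 4
```

Note that small values go the other way in this fixed-width scheme: `7` is one byte as text but still four bytes here, which is why real binary XML formats tend to use variable-length integers.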

Any decent binary XML format that has standard element/attribute types should give much better size and processing efficiency. If it can also reuse tags, like the dictionary in a compressed file, that will be a nice feature too, since you are exporting from a table.
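To make the tag-reuse idea concrete, here is a toy sketch in Python. It is not EXI or any real binary XML format; the `encode_rows` function and its byte layout are invented for illustration, and it assumes at most 255 distinct tags, tag names shorter than 256 bytes, and unsigned 32-bit values only.

```python
import struct


def encode_rows(rows):
    """Toy encoding: write each distinct tag name once, then refer to it by a 1-byte index."""
    tags = []    # tag names in order of first appearance
    index = {}   # tag name -> position in `tags`
    body = bytearray()
    for tag, value in rows:
        if tag not in index:
            index[tag] = len(tags)
            tags.append(tag)
        # 1 byte for the tag index + 4 bytes for the value, no padding ("<").
        body += struct.pack("<BI", index[tag], value)

    # Header: tag count, then each tag as a length-prefixed UTF-8 string.
    header = bytearray([len(tags)])
    for tag in tags:
        raw = tag.encode("utf-8")
        header += bytes([len(raw)]) + raw
    return bytes(header + body)


rows = [("id", 1), ("qty", 2)] * 100
encoded = encode_rows(rows)
as_xml = "".join(f"<{tag}>{value}</{tag}>" for tag, value in rows).encode("utf-8")

print(len(as_xml), len(encoded))
```

With a table export the same handful of column names repeats on every row, so paying for each name once in a header is where most of the saving comes from in this sketch.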

Diego C Nascimento