The code below is an attempt to simplify the setup required for EXI compression and decompression using EXIficient:

class ExiCompressionUtils {
    static Transformer transformer = TransformerFactory.newInstance().newTransformer()

    static byte[] compress(String xml) {
        ByteArrayOutputStream exiOS = new ByteArrayOutputStream()
        EXIResult exiResult = new EXIResult(outputStream : exiOS)

        XMLReader xmlReader = XMLReaderFactory.createXMLReader()
        xmlReader.contentHandler = exiResult.handler
        xmlReader.parse(new InputSource(new StringReader(xml)))

        def compressed = exiOS.toByteArray()
        exiOS.close()
        return compressed
    }

    static String extract(byte[] compressed) {
        SAXSource exiSource = new SAXSource(new InputSource(new ByteArrayInputStream(compressed)))
        exiSource.setXMLReader(exiSource.reader)

        ByteArrayOutputStream exiOS = new ByteArrayOutputStream()
        transformer.transform(exiSource, new StreamResult(exiOS))  // fails here
        def extracted = exiOS.toString()
        exiOS.close()
        return extracted
    }
}

The test below fails with ERROR: 'Invalid byte 1 of 1-byte UTF-8 sequence.'

@Test
void testExiCompression() {
    def xml = '<Root><Child id="1">Text</Child><EmptyTag/></Root>'
    def compressed = ExiCompressionUtils.compress(xml)
    assert ExiCompressionUtils.extract(compressed) == xml
} 

Any encoding experts out there that can get to the bottom of this?

Sled
Jonathan Schneider
  • I don't understand: is this real Java? Why are the semicolons omitted? What is the `def compressed...` construct? – Andremoniy Jan 17 '13 at 21:32
  • It's Groovy... close enough, compiles to .class files, essentially equivalent here. I labelled it Java because it could just as easily be plain Java rather than the couple of shortcuts I took with Groovy. They are interchangeable for most purposes these days anyway. – Jonathan Schneider Jan 17 '13 at 23:19
  • @jkschneider Just a thought, but is the test case file (that has the XML) encoded in *UTF-8*? I don't know much about exi but I've seen that error before and it usually has to do with the encoding of the xml not conforming to *UTF-8*. – BPaasch Apr 04 '13 at 20:07
  • I'm wondering if there's a bug in EXIficient itself. I've looked at this quite a bit, and it looks like you follow their sample and unit tests identically (except for using `ByteArrayOutputStream` instead of `FileOutputStream` -- which should not affect the encoding). – Keegan Jul 12 '15 at 02:58

1 Answer

Today I stumbled over this question. There is one important issue with this code (apart from the syntax, which is unusual for Java: missing semicolons etc.).

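When reading, use EXISource and not SAXSource! A plain SAXSource without an EXI-aware XMLReader makes the transformer parse the binary EXI stream as textual XML, which is where the UTF-8 error comes from.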
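
The failure mode can be reproduced with the JDK alone, without EXIficient on the classpath. The following is only a sketch: `PlainSaxSourceDemo` and `parsesAsTextualXml` are names made up for illustration, and the four bytes are an arbitrary stand-in for EXI output, not a real EXI stream.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXSource;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.InputSource;

public class PlainSaxSourceDemo {

    // Returns true if the identity transformer can read the bytes as textual XML.
    static boolean parsesAsTextualXml(byte[] data) {
        try {
            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            // No XMLReader is set on the source, so the transformer falls back
            // to the JDK's default parser for textual XML.
            SAXSource source = new SAXSource(new InputSource(new ByteArrayInputStream(data)));
            transformer.transform(source, new StreamResult(new ByteArrayOutputStream()));
            return true;
        } catch (TransformerException e) {
            // Parsing problems surface here, wrapped by the transformer.
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] textualXml = "<Root/>".getBytes("UTF-8");
        // Hypothetical binary payload: arbitrary bytes that are not valid UTF-8 text.
        byte[] binaryData = { (byte) 0x80, 0x40, 0x12, 0x34 };
        System.out.println(parsesAsTextualXml(textualXml));
        System.out.println(parsesAsTextualXml(binaryData));
    }
}
```

The textual document should go through, while the binary payload should be rejected by the default parser, much like the compressed bytes in the question.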

Below is a version of the code that works.

-- Daniel

static Transformer transformer;

static {
    try {
        transformer = TransformerFactory.newInstance().newTransformer();
    } catch (TransformerConfigurationException | TransformerFactoryConfigurationError e) {
        // Fail fast rather than silently leaving transformer null
        throw new ExceptionInInitializerError(e);
    }
}

static byte[] compress(String xml) throws IOException, EXIException,
        SAXException {
    ByteArrayOutputStream exiOS = new ByteArrayOutputStream();
    EXIResult exiResult = new EXIResult();
    exiResult.setOutputStream(exiOS);

    XMLReader xmlReader = XMLReaderFactory.createXMLReader();
    xmlReader.setContentHandler(exiResult.getHandler());
    xmlReader.parse(new InputSource(new StringReader(xml)));

    byte[] compressed = exiOS.toByteArray();
    exiOS.close();

    return compressed;
}

static String extract(byte[] compressed) throws TransformerException,
        IOException, EXIException {
    // SAXSource exiSource = new SAXSource(new InputSource(new
    // ByteArrayInputStream(compressed))); // use EXISource instead!
    SAXSource exiSource = new EXISource();
    exiSource.setInputSource(new InputSource(new ByteArrayInputStream(
            compressed)));

    ByteArrayOutputStream exiOS = new ByteArrayOutputStream();
    transformer.transform(exiSource, new StreamResult(exiOS));
    // Decode explicitly: the transformer writes UTF-8 unless told otherwise
    String extracted = exiOS.toString("UTF-8");
    exiOS.close();
    return extracted;
}

public static void main(String[] args) throws IOException, EXIException,
        SAXException, TransformerException {
    String xml = "<Root><Child id=\"1\">Text</Child><EmptyTag/></Root>";
    byte[] compressed = ExiCompressionUtils.compress(xml);
    System.out.println(ExiCompressionUtils.extract(compressed));
}
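
One caveat if the goal is the strict equality check from the question's test: a `Transformer` writes an XML declaration by default, so the extracted string will not match the input exactly. Here is a minimal sketch of turning that off with a standard output property. `OmitDeclarationDemo` and `serialize` are hypothetical names, and a `StreamSource` stands in for the `EXISource` so that the example runs with the JDK alone; the same property could presumably be set on the shared `transformer` above.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class OmitDeclarationDemo {

    // Runs an identity transform and returns the serialized document.
    static String serialize(String xml, boolean omitDeclaration) throws TransformerException {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // "yes" suppresses the leading <?xml ...?> declaration in the output.
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION,
                omitDeclaration ? "yes" : "no");
        StringWriter out = new StringWriter();
        transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws TransformerException {
        String xml = "<Root><Child id=\"1\">Text</Child><EmptyTag/></Root>";
        System.out.println(serialize(xml, false)); // with declaration
        System.out.println(serialize(xml, true));  // without declaration
    }
}
```

Even without the declaration, details such as attribute quoting or empty-element syntax depend on the serializer, so comparing parsed documents is more robust than comparing strings.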