
Is it possible to create a "TransformerOutputStream", which extends the standard java.io.OutputStream, wraps a provided output stream and applies an XSL transformation? I can't find any combination of APIs which allows me to do this.

The key point is that, once created, the TransformerOutputStream may be passed to other APIs which accept a standard java.io.OutputStream.

Minimal usage would be something like:

java.io.InputStream in = getXmlInput();
java.io.OutputStream out = getTargetOutput();

javax.xml.transform.Templates templates = createReusableTemplates();        // could also use S9API
TransformerOutputStream tos = new TransformerOutputStream(out, templates);  // extends OutputStream

com.google.common.io.ByteStreams.copy(in, tos);

// possibly flush/close tos if required by implementation

That's a JAXP example, but as I'm currently using Saxon an S9API solution would be fine too.

The main avenue I've pursued is along the lines of:

  • a class which extends java.io.OutputStream and implements org.xml.sax.ContentHandler
  • an XSL transformer based on an org.xml.sax.ContentHandler

But I can't find implementations of either of these, which seems to suggest that either no one else has ever tried to do this, there is some problem which makes it impractical, or my search skills just are not that good.

I can understand that with some templates an XML transformer may require access to the entire document, in which case a SAX content handler provides no advantage, but there must also be simple transformations which could be applied to the stream as it passes through? This kind of interface would leave that decision up to the transformer implementation.

I have written and am currently using a class which provides this interface, but it just collects the output data in an internal buffer and then uses a standard JAXP StreamSource to read that on flush or close, so it ends up buffering the entire document.

Barney
  • It already exists. [`StreamResult`](https://docs.oracle.com/javase/7/docs/api/javax/xml/transform/stream/StreamResult.html). – user207421 Jun 14 '19 at 03:06
  • Possible duplicate of https://stackoverflow.com/q/1937684/207421. – user207421 Jun 14 '19 at 03:07
  • @user207421 [StreamResult](https://docs.oracle.com/javase/7/docs/api/javax/xml/transform/stream/StreamResult.html) does not extend output stream - I've updated the question to make this part clearer. – Barney Jun 14 '19 at 04:31
  • I am not sure how an OutputStream relates to the XSLT 2 or 3 processing model. As far as the JAXP "chaining" of two XSLTs based on SAX is concerned, you can create a SAXTransformerFactory https://docs.oracle.com/javase/8/docs/api/javax/xml/transform/sax/SAXTransformerFactory.html. Within s9api and Saxon 9.9 you have http://saxonica.com/html/documentation/javadoc/net/sf/saxon/s9api/Xslt30Transformer.html#asDocumentDestination-net.sf.saxon.s9api.Destination- to chain two XSLT 3 transformations; if these are streamable (in the sense of XSLT 3) and you use EE, then it should work using streaming. – Martin Honnen Jun 14 '19 at 06:10
  • For example code, see for instance the method `exampleXMLFilterChain` in https://dev.saxonica.com/repos/archive/opensource/latest9.9/samples/java/he/JAXPExamples.java or `TransformD` in https://dev.saxonica.com/repos/archive/opensource/latest9.9/samples/java/he/S9APIExamples.java. – Martin Honnen Jun 14 '19 at 06:27
  • I'm aware it doesn't extend `OutputStream`, thanks; nevertheless it is the answer to your question. Slight adjustment to your model code required. Nothing inconceivable. – user207421 Jun 14 '19 at 09:42
  • @user207421 so what is the "adjustment" needed to take a StreamResult, convert it to an `OutputStream`, and pass it to a class that requires an `OutputStream`? ISTM that `OutputStream` is the actual point of the question, and StreamResult is actually the wrong end of the pipeline. – Doctor Eval Jun 14 '19 at 10:41

1 Answer


You could make your TransformerOutputStream extend ByteArrayOutputStream, and its close() method could take the underlying byte[] array, wrap it in a ByteArrayInputStream, and invoke a transformation with the input taken from this InputStream.
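A minimal sketch of that buffering approach, using only JAXP and java.io, might look like the following. The class name `BufferingTransformerOutputStream` and the choice to flush (rather than close) the target stream are assumptions made for illustration; they are not taken from any existing library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical class: buffers every byte written, then transforms on close(). */
class BufferingTransformerOutputStream extends ByteArrayOutputStream {
    private final OutputStream target;   // where the transformed result goes
    private final Templates templates;   // compiled, reusable stylesheet
    private boolean closed;

    BufferingTransformerOutputStream(OutputStream target, Templates templates) {
        this.target = target;
        this.templates = templates;
    }

    @Override
    public synchronized void close() throws IOException {
        if (closed) {
            return; // make repeated close() calls harmless
        }
        closed = true;
        try {
            // buf and count are the protected fields of ByteArrayOutputStream,
            // so the buffered bytes can be read without copying them.
            templates.newTransformer().transform(
                    new StreamSource(new ByteArrayInputStream(buf, 0, count)),
                    new StreamResult(target));
            target.flush(); // assumption: the caller still owns and closes the target
        } catch (TransformerException e) {
            throw new IOException("XSL transformation failed", e);
        }
    }
}
```

Nothing reaches the target until `close()` is called, so the whole document sits in memory first.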

But it seems you also want to avoid putting the entire contents of the stream in memory. So let's assume that the transformation you want to apply is an XSLT 3.0 streamable transformation. Unfortunately, although Saxon as a streaming XSLT transformer operates largely in push mode (by "push" I mean that the data supplier invokes the data consumer, whereas "pull" means that the data consumer invokes the data supplier), the first stage, of reading and parsing the input, is always in pull mode -- I don't know of an XML parser to which you can push lexical XML input.

This means there's a push-pull conflict here. There are two solutions to a push-pull conflict. One is to buffer the data in memory (which is the ByteArrayOutputStream approach mentioned earlier). The other is to use two threads, with one writing to a shared buffer and the other reading from it. This can be achieved using a PipedOutputStream in the writing thread (https://docs.oracle.com/javase/8/docs/api/index.html?java/io/PipedOutputStream.html) and a PipedInputStream in the reading thread.

Caveat: I haven't actually tried this, but I see no reason why it shouldn't work.
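For illustration, the two-thread idea could be sketched roughly as below. This is an untested-in-production outline under several assumptions: the class name `PipedTransformerOutputStream` is hypothetical, a private single-thread executor stands in for whatever thread pool you already have, and the target stream is flushed but left open for the caller. Note also that the pipe only removes the byte buffer from your own code; whether the transformation itself streams depends on the processor and the stylesheet (a non-streaming processor will still build the whole tree).

```java
import java.io.IOException;
import java.io.OutputStream;
import java.io.PipedInputStream;
import java.io.PipedOutputStream;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.transform.Templates;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical class: pushes written bytes through a pipe to a transformer thread. */
class PipedTransformerOutputStream extends OutputStream {
    private final PipedOutputStream pipe = new PipedOutputStream();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    private final Future<?> transformation;
    private final OutputStream target;
    private boolean closed;

    PipedTransformerOutputStream(OutputStream target, Templates templates) throws IOException {
        this.target = target;
        PipedInputStream source = new PipedInputStream(pipe);
        // The reading thread pulls from the pipe while the caller pushes into it.
        transformation = worker.submit(() -> {
            // Closing the read end on exit makes later writes fail fast
            // instead of blocking if the transformation aborts early.
            try (PipedInputStream in = source) {
                templates.newTransformer().transform(new StreamSource(in), new StreamResult(target));
            }
            return null;
        });
    }

    @Override
    public void write(int b) throws IOException {
        pipe.write(b);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        pipe.write(b, off, len);
    }

    @Override
    public void flush() throws IOException {
        pipe.flush();
    }

    @Override
    public void close() throws IOException {
        if (closed) {
            return;
        }
        closed = true;
        pipe.close(); // signals end of input to the parser
        try {
            transformation.get(); // wait until the transformer has finished writing
            target.flush();       // assumption: the caller still owns and closes the target
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IOException("Interrupted while waiting for the transformation", e);
        } catch (ExecutionException e) {
            throw new IOException("XSL transformation failed", e.getCause());
        } finally {
            worker.shutdown();
        }
    }
}
```

The output is only complete once `close()` returns, so callers must close the stream. One thing to watch with the java.io pipe classes: the reader can report a broken pipe if the thread that last wrote to the pipe terminates before the stream is closed, so this suits a single long-lived writing thread best.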

Note that the topic of streaming in XSLT 3.0 is fairly complex; you will need to learn about it before you can make much progress here. I would start with Abel Braaksma's talk from XML London 2014: https://xmllondon.com/2014/presentations/braaksma

Michael Kay
  • It's certainly possible to start a thread and use pipes - by which I mean, the solution definitely works - but doing so is massively complex and expensive when you're already knee-deep inside futures and threadpools. I really was hoping that after having not looked for about 10 years, there would finally be an answer to this question. :-( – Doctor Eval Jun 14 '19 at 10:25
  • Well, I expect most people's excuse for not implementing this is precisely the same as yours. – Michael Kay Jun 14 '19 at 13:43
  • Yes, a push model is exactly what I was hoping for - the ByteArrayOutputStream solution is what I currently have as a placeholder. Thanks for clarifying - saves me digging any further. Great reference to xslt3 streaming too. – Barney Jun 15 '19 at 01:27