
I normally use the XSLT support built into the JDK (JDK 7) for XSLT transformations. Recently I came across a rather large XML document, and applying even very basic XSLT transformations to it uses a lot of memory.

I've been careful to keep all of my own processing streaming, but it seems the XSLT engine in the JDK (which appears to be a modified Xalan) always builds a DOM in memory first. Obviously this is not what I want.

I have now found out that the separately available Xalan (2.7.1 from 2007!) does have an API for incremental transformations. While this does seem to work, I want my code to run on a stock JDK, without telling the user to fiddle with any endorsed folder.

What is the best way to do incremental XSLT transformations in Java so that my code is compatible with unmodified/stock JDK installations?

update: This recently updated question is strongly related: What is the Most Efficient Java-Based streaming XSLT Processor?

dexter meyers
  • Try this link - http://www.devx.com/xml/Article/34677/1954 - see the section titled "Iteration 2: Divide-and-Conquer JAXP XSLT Transformation" – kjp Jun 04 '12 at 12:18
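The divide-and-conquer idea mentioned in the comment above could be sketched roughly as follows, using only classes that ship with the JDK. This is a minimal sketch, not the code from the linked article: it assumes the large document is a flat sequence of independent record elements and that the stylesheet only ever needs to see one record at a time. The class name `ChunkedTransform`, the method `transformRecords` and the `recordName` parameter are made up for illustration.

```java
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import javax.xml.transform.Templates;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stax.StAXSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical helper: transforms one record subtree at a time.
class ChunkedTransform {

    static List<String> transformRecords(Reader xml, String recordName, String xslt)
            throws Exception {
        // Compile the stylesheet once and reuse it for every record.
        Templates templates = TransformerFactory.newInstance()
                .newTemplates(new StreamSource(new StringReader(xslt)));

        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(xml);
        // Results are collected in a list only to keep the sketch short;
        // real code would write each result straight to the output.
        List<String> results = new ArrayList<>();

        while (reader.hasNext()) {
            if (reader.getEventType() == XMLStreamConstants.START_ELEMENT
                    && recordName.equals(reader.getLocalName())) {
                // Only this record's subtree is built in memory.
                StringWriter out = new StringWriter();
                Transformer transformer = templates.newTransformer();
                transformer.transform(new StAXSource(reader), new StreamResult(out));
                results.add(out.toString());
                // Do not advance here: the transformer has already moved the
                // reader to or past the end of the record, so the current
                // event is simply re-checked on the next iteration.
            } else {
                reader.next();
            }
        }
        reader.close();
        return results;
    }
}
```

Memory use is then bounded by the largest single record rather than by the whole document. The price is that templates cannot look outside the current record (no access to siblings, ancestors or document-wide keys), so this only fits transformations that are local to a record.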

2 Answers


Have you tried the saxon:stream() extension in Saxon?

Burkart
Dimitre Novatchev
  • Thanks for the reply! It seems this feature is only available in Saxon-EE, which is something like 8000 pounds. Unfortunately there is no way I can get our manager to accept such an amount. – dexter meyers Jun 04 '12 at 13:16
  • @dextermeyers: I am surprised Saxon costs 8000 pounds. Last time I checked it was in the range 200 to 300 pounds. If your company can wait, in the near future there would be a standard XSLT 3.0 streaming feature and possible implementations by more than one vendor. – Dimitre Novatchev Jun 04 '12 at 13:20
  • I've been watching XSLT 3.0 for a couple of years, but like many XML initiatives it seems to have died out at around 2007/2008. Is there still work being done for it by someone? – dexter meyers Jun 04 '12 at 13:56
  • @dextermeyers: I don't think XSLT 3.0 had started yet in 2007. Yes, active work is going on XSLT 3.0 at the W3C XSLT WG. The current Draft (still named 2.1) is at: http://www.w3.org/TR/xslt-21/ . There might be a newer draft published in the next couple of months (or even this month). – Dimitre Novatchev Jun 04 '12 at 14:26
  • The price of a single Saxon-EE license is £300 (or less if you don't want XQuery). The quoted figure of £8000 is for a site license. – Michael Kay Jun 04 '12 at 17:08
  • Work on streaming in XSLT 3.0 is ongoing. There hasn't been a public working draft for a while, but that doesn't mean work has stopped. Progress would be faster, however, if there were more implementors involved. – Michael Kay Jun 04 '12 at 17:10
  • >The quoted figure of £8000 is for a site license - Indeed, but if you have a team of say 10 developers and have 4 staging servers, wouldn't you need at least 14 licenses? That's still £4200. – dexter meyers Jun 06 '12 at 13:40
  • @dextermeyers: And if you organize a team lunch for these 14 developers a few times, you'll spend no less money -- your company has to decide which is more valuable. – Dimitre Novatchev Jun 06 '12 at 14:28
  • True, but please tell that to my manager :( I'm just the poor fellow who has to make it work. It's o/t here, but a brand new computer costs no more than 2000 and will make me more productive, saving money in the end. But that too is too expensive. – dexter meyers Jun 06 '12 at 15:03
  • @dextermeyers: So, do you think XSLT streaming will not save money? Think about saved man-hours and multiply by an hour's pay. Think about savings from not buying computers with huge memory. And just the fact that something that was not possible (or was extremely resource-consuming) before would be possible and straightforward now. – Dimitre Novatchev Jun 06 '12 at 15:59
  • You're preaching in your own Church. As an engineer I fully agree with you. Unfortunately, most managers are bean counters, penny wise pound foolish... – dexter meyers Jun 07 '12 at 13:48
  • @dextermeyers: If your statement about "most managers" is true, this is an organizational/social problem -- not an XSLT problem. So, this is probably more relevant to other SO forums. – Dimitre Novatchev Jun 07 '12 at 14:29

Firstly, I would strongly recommend using the Apache versions of Xalan and Xerces rather than the versions bundled in the JDK, which are very buggy. That's particularly true for Xerces.

Secondly, if you're using Java then you really ought to be moving to XSLT 2.0, which gives you vast improvements in development productivity. In practice that means Saxon (the home edition of Saxon is free).
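On the asker's concern about endorsed folders: as far as I know, a separately shipped processor can be selected explicitly through JAXP once its jar is on the application classpath. A minimal sketch follows, assuming the factory class names `org.apache.xalan.processor.TransformerFactoryImpl` (Apache Xalan) and `net.sf.saxon.TransformerFactoryImpl` (Saxon); check them against the version you actually ship. The class `FactorySelector` and the method `newFactory` are invented for illustration.

```java
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.TransformerFactoryConfigurationError;

// Hypothetical helper: prefer a bundled XSLT processor, fall back to the JDK's.
class FactorySelector {

    static TransformerFactory newFactory(String preferredClassName) {
        try {
            // A null class loader means the thread context class loader is used.
            return TransformerFactory.newInstance(preferredClassName, null);
        } catch (TransformerFactoryConfigurationError e) {
            // The preferred processor is not on the classpath:
            // use whatever the JDK provides by default.
            return TransformerFactory.newInstance();
        }
    }
}
```

A call such as `FactorySelector.newFactory("net.sf.saxon.TransformerFactoryImpl")` would then pick up Saxon when its jar is present and quietly use the JDK's built-in engine otherwise.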

Incremental transformation in Xalan doesn't actually stop it from building the whole source document as a tree in memory; all it does is allow the tree to be built in parallel with the transformation process. If you want a streaming transformation, Saxon-EE is your only practical option. (Note that the saxon:stream() extension is only one small part of the streaming capabilities that Saxon offers.)

Michael Kay