After transforming my large XML files into a series of paginated HTML fragments, I am now working on an XSL function for cross-references that needs to know which file a certain node (or rather, the element it has been transformed to) has ended up in.

The files are named like this: 001_div1.html, 002_div2.html etc. Suppose I know that I want the file ending in _div25.html, but I don't know its number prefix. As I understand it, XPath's collection() function should help me here, but it does not work.

I assume this is because the Saxon instance launched by eXist is not aware that we are dealing with nodes in an XML database rather than files in the filesystem. But then again, doc('../../html/003_div3.html') does work, as does doc-available('../../html/003_div3.html'), so these functions are somehow fed with nodes from the db...

What I would like to do is this:

<xsl:for-each select="collection('../../html/*_div25.html')">
    <xsl:value-of select="tokenize(replace(document-uri(.), '\.html$', ''),'/')[last()]"/>
</xsl:for-each>

But this gives me:

Exception while transforming node: Exception thrown by URIResolver

Here is the hack I am presently using:

<!-- For each candidate number, check whether a file exists whose name starts
     with that number, followed by the known node id, and ends with .html. -->
<xsl:for-each select="1 to $maxNumberOfHtmlFragments">
    <xsl:variable name="filename" select="concat('../../html/', format-number(position(), '000'), '_div25.html')"/>
    <xsl:if test="doc-available($filename)">
        <xsl:value-of select="tokenize(replace($filename, '\.html$', ''),'/')[last()]"/>
    </xsl:if>
</xsl:for-each>

But this has a noticeable performance cost, since it calls doc-available() once for every candidate number. Note that using the same paths (without wildcards, of course) in document() and doc-available() works fine.

Is the eXist-Saxon integration lacking support for the collection() function?

Are there better ways of achieving what I want anyway?

awagner
  • The `collection` call would give you a sequence of document nodes, and your `replace` call would then operate on the string value of each node. Your second sample does something very different: it constructs a URI as a string `$filename`, checks whether `doc-available($filename)` holds, and then uses `replace` on that string. So for the first sample I would expect you to try something like ``. – Martin Honnen Oct 09 '14 at 11:22
  • Again, there's the `Exception thrown by URIResolver` error also with `document-uri(.)` where you suggested I use it. I have the impression that during all the (chaotic) testing that I did, I have _never_ had a `collection()` call that did not trigger this error. Given that `collection()` seems to be a "non-standardized standard function", can you confirm that it should work in XSLT files that are applied by xquery transform:transform? – awagner Oct 09 '14 at 13:41
  • No, sorry, I am not even an exist-db user, I was simply trying to point out inconsistencies between the two code samples. I am sure someone else can tell you more about exist-db specific problems. – Martin Honnen Oct 09 '14 at 14:26
  • The problem is that the `collection()` function is the one from Saxon, not the one from eXist-db. The Saxon `collection()` function cannot return a sequence of eXist-db nodes. The `doc()` function is a bit simpler: an eXist-db document can be serialized to a byte stream, which Saxon can then pick up. – DiZzZz Nov 07 '14 at 10:13

1 Answer

You cannot use collection() to access a specific document.

In your example you have:

collection('../../html/*_div25.html')

In eXist, collections are like folders in a filesystem, so collection() gives you access to 0..N documents rather than to one specific file. You might be able to access a collection from the database using something like:

collection('../../html')

You could then use document-uri() in a predicate to filter the documents, e.g.

collection('../../html')[fn:ends-with(fn:document-uri(.), "_div25.html")]
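
Put together with the name extraction from the question, the filter could be sketched roughly as below for the _div25.html fragment. This is only a sketch under assumptions: it requires an XSLT 2.0 processor, the template name fragment-name is a hypothetical placeholder, and it only helps if collection('../../html') actually resolves in your setup, which it may not when the stylesheet is run from eXist.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

    <!-- "fragment-name" is a hypothetical name chosen for this sketch. -->
    <xsl:template name="fragment-name">
        <!-- Assumption: collection('../../html') returns the documents of that
             collection. Keep only those whose URI ends with the known suffix. -->
        <xsl:for-each select="collection('../../html')[ends-with(document-uri(.), '_div25.html')]">
            <!-- Take the last path segment of the URI, then strip the .html
                 suffix (note the escaped dot in the regular expression). -->
            <xsl:value-of select="replace(tokenize(document-uri(.), '/')[last()], '\.html$', '')"/>
        </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>
```

If the collection call fails with the URIResolver exception, this sketch fails in the same way, so it is not a replacement for the doc-available() workaround.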
adamretter
  • Hence the "might be able to", I guess you are being bitten by this - https://github.com/eXist-db/exist/issues/351 – adamretter Oct 09 '14 at 14:51
  • Yes, probably. So I will continue with my hackish workaround for now. Only I wonder that `doc(...)`, `doc-available(...)` etc do work fine (i.e. they access "files" from eXist's "../../html" collection). Doesn't that mean that eXist has a resolver for that in place already? – awagner Oct 09 '14 at 21:39
  • As far as I am aware eXist does not set a resolver for Saxon. Saxon seems to read those files using relative paths, relative to where the stylesheet was loaded from. – adamretter Oct 10 '14 at 11:07