I want to demonstrate the power of XSLT for data exploration by solving the following problem. Given an XML file that describes some kind of "entity-relationship" model, and given one entity in that model identified by name (assuming an attribute of the XML schema is used as the identifier), I want a transformation that produces a new XML model containing that entity plus all of its relatives, that is, the transitive closure of the entity's dependency relationship.

For example, the input XML model is

<root>
    <!-- my model is made of 3 entities: leaf, composite and object -->
    <!-- the XML elements <leaves>, <composites> and <objects> are just placeholders for these entities -->
    <!-- These placeholders are expected to appear in that order in the output as well as in the input (schema constraint) -->
    <leaves>
        <!-- A, B and C are 3 different types of leaf node, each with its own semantics in the model -->
        <A name="f1" others="oooo"/>
        <A name="f2" others="xxxx"/>
        <B name="f3" others="ssss"/>
        <C name="f4" others="gggg"/>    
    </leaves>
    <composites>
        <!-- composites contains only struct and union elements -->
        <struct name="structB" others="yyyy">
            <!-- composite pattern: a struct can embed other structs in a tree-like fashion -->
            <sRef name="s6" nameRef="structA"/>
            <!-- order of declaration does not matter: structA is not yet declared at this point, but the file is valid -->
            <uRef name="u7" nameRef="unionX"/>
        </struct>
        <!-- union is another kind of composition -->
        <union name="unionX" others="rrrr">
            <vRef name="u3" nameRef="f3" others="jjjj"/>
            <vRef name="u4" nameRef="f2" others="pppp"/>
        </union>
        <struct name="structA" others="hhhh">
            <vRef name="v1" nameRef="f1" others="jjjj"/>
            <vRef name="v2" nameRef="f4" others="pppp"/>
        </struct>
    </composites>
    <objects>
        <object name="objB" others="tttt">
            <field name="field1" nameRef="unionX" others="qqqq"/>
            <field name="field2" nameRef="f2" others="cccc"/>
        </object>
        <object name="objC" others="nnnn">
            <field name="fieldX" nameRef="structB" others="uuuu"/>
            <field name="fieldY" nameRef="" others="mmmm"/>
        </object>
        <object name="objMain" others="nnnn">
            <field name="fieldY" nameRef="structA" others="mmmm"/>
            <field name="fieldY" nameRef="f3" others="mmmm"/>
            <field name="object4" nameRef="objB" others="wwwww"/>
        </object>
    </objects>
</root>

I would like a transformation that, for a given name, creates a copy of the model with only information related to the element of this name, and of its dependencies described by the nameRef attributes.

So for the element "field1" the output would be:

<root>
    <leaves>
        <A name="f1" others="oooo"/>
    </leaves>
    <!-- composites and objects placeholders shall be copied even when no elements in the graph traversal -->
    <composites/>
    <objects/>
</root>

whereas for "objB" the expected output would be:

<root>
    <leaves>
        <!-- element "f2" shall be copied only once in the output, although the node is encountered twice in the traversal of the "objB" tree:
            - "f2" is referenced under "field2" of "objB"
            - "f2" is referenced under "u4" of "unionX", which is referenced under "field1" of "objB"
        -->
        <A name="f2" others="xxxx"/>
        <B name="f3" others="ssss"/>
    </leaves>
    <composites>
        <union name="unionX" others="rrrr">
            <vRef name="u3" nameRef="f3" others="jjjj"/>
            <vRef name="u4" nameRef="f2" others="pppp"/>
        </union>
    </composites>
    <objects>
        <object name="objB" others="tttt">
            <field name="field1" nameRef="unionX" others="qqqq"/>
            <field name="field2" nameRef="f2" others="cccc"/>
        </object>
    </objects>
</root>

and so on and so forth.

So far I have worked out a basic XSLT stylesheet, but it is not very satisfying, for the following reasons:

  • my transformation is not built on an identity-transformation base for copying
  • my transformation uses xsl:copy-of when it encounters a matching entity, but this breaks the design and violates the XSD schema
  • the output file does not comply with the XML Schema Definition of the input, mostly because xsl:copy-of bypasses the normal traversal of the XML elements
  • my transformation duplicates an entity in the output when it appears several times in the transitive closure of the dependency relationship

I only have some "intuitions" about a good and elegant way to do it:

  • starting from an "identity transformation" template, to respect the XML schema of the input
  • using grouping / sorting by key
  • implementing some kind of "Muenchian method" (I am not sure about this one; it may only be relevant to XSLT 1.0)

For simplification you can make the following assumptions:

  • there are no cyclic dependencies (a simple tree walk can be implemented)
  • nameRef / name are cross checked by a "key" in the XSD so that references are correct in the input
  • the input parameter "name" of the element to search for exists in the input xml model (although it would be nice to produce an "empty" valid xml in that case)

The "empty" XML output model should be as follows (due to schema constraints):

<root>
    <leaves/>
    <composites/>
    <objects/>
</root>

To complete the picture: the XSLT processor I am currently using is Saxon, and the XSLT version is 2.0. Thanks for helping. I have not posted my own XSL, which I am not proud of, but I will if it would be helpful.

Mystic Tm
  • That is all rather vague, you say you want to have a result that is a valid instance of a certain schema but you haven't shown the schema. And it would help if you tell us which XSLT version you can use or want to use, it is not clear whether you mention "Muenchian" grouping because you are restricted to XSLT 1. And if you have recursive code that ends up producing duplicates then one step of the solution might be a second transformation (step) that eliminates those duplicates. Such an elimination is certainly treated in any text book on XSLT and in lots of examples on StackOverflow. – Martin Honnen Dec 07 '19 at 13:41
  • Hi, thanks for giving your opinion on this question. My thought was that the schema does not really matter, since the transformation I would like is about filtering out elements from a valid XML input file. But I can provide it if required. – Mystic Tm Dec 07 '19 at 19:15
  • On the other points: I am not constrained to XSLT 1.0; in fact I am using the Saxon XSLT processor and an XSLT 2.0 stylesheet. Finally, the transformation I created has the drawback of producing duplicates, but also the drawback of creating an XML document that does not conform to the schema of the input. – Mystic Tm Dec 07 '19 at 19:18
  • Thanks for pointing out that a possible solution would be a pipeline of transformations instead of a single-step transformation. Creating the "killer" XSL template is a good challenge that I would like to take on ... – Mystic Tm Dec 07 '19 at 19:24
  • Maybe the fact that I am using Saxon will trigger one more brilliant answer of Mister Michael Kay ;) – Mystic Tm Dec 07 '19 at 19:40
  • When you search for `element "field1"`, why does the result not contain the `<field>` element and its ancestors? – Martin Honnen Dec 07 '19 at 20:41
  • This is just a variation of **topological sorting**. For a complete **XPath 3.0** , XSLT 1.0 and later versions solutions see my recent answer: https://stackoverflow.com/a/58174330/36305 and the information pointed to by the links in this answer. Please, do let me know if you'd be interested for me to provide a new, separate XPath 3 or XSLT answer – Dimitre Novatchev Dec 08 '19 at 20:08

1 Answer


I tried to implement "a transformation that, for a given name, creates a copy of the model with only information related to the element of this name, and of its dependencies described by the nameRef attributes" at https://xsltfiddle.liberty-development.net/gWEamLs/6:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:param name="start-name" as="xs:string">objB</xsl:param>

  <xsl:key name="name-ref" match="*[@name]" use="@name"/>

  <xsl:function name="mf:traverse" as="element()*">
      <xsl:param name="start" as="element()?"/>
      <xsl:sequence select="$start, $start/*, $start/*[@nameRef]!key('name-ref', @nameRef, root(.))!mf:traverse(.)"/>
  </xsl:function>

  <xsl:param name="start-element" as="element()?" select="key('name-ref', $start-name)"/>

  <xsl:variable name="named-elements" select="mf:traverse($start-element)"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="*[@name and not(. intersect $named-elements)]"/>

</xsl:stylesheet>

Based on a key and a recursive function, the code first computes the elements related to the start element as a sequence of element nodes held in a global variable. The identity transformation, set up declaratively by <xsl:mode on-no-match="shallow-copy"/>, is then extended by a single empty template that matches every element having a name attribute but not found by the recursive function, so that unrelated elements are not copied to the output.
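
Since the question mentions XSLT 2.0, here is a minimal sketch of how the same idea might look without XSLT 3.0 features. It is an adaptation, not the code from the fiddle: it assumes that an explicit identity template can stand in for xsl:mode and that a for expression can stand in for the ! operator, and it turns the start-element parameter into a plain variable.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="2.0">

  <xsl:param name="start-name" as="xs:string">objB</xsl:param>

  <!-- index every named element by its name attribute -->
  <xsl:key name="name-ref" match="*[@name]" use="@name"/>

  <!-- the start element, its children, and recursively every element
       referenced by a child's nameRef attribute -->
  <xsl:function name="mf:traverse" as="element()*">
    <xsl:param name="start" as="element()?"/>
    <xsl:sequence select="$start, $start/*,
        for $ref in $start/*[@nameRef],
            $target in key('name-ref', $ref/@nameRef, root($ref))
        return mf:traverse($target)"/>
  </xsl:function>

  <xsl:variable name="start-element" as="element()?"
      select="key('name-ref', $start-name)"/>

  <xsl:variable name="named-elements" as="element()*"
      select="mf:traverse($start-element)"/>

  <!-- explicit identity template (XSLT 2.0 has no xsl:mode declaration) -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- drop every named element that the traversal did not reach -->
  <xsl:template match="*[@name and not(. intersect $named-elements)]"/>

</xsl:stylesheet>
```

The three-argument form of key() is also available in XSLT 2.0, so the function body can still look up references without relying on a context document.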

Martin Honnen
  • Hi Martin! Wow, your template looks to be on the right path, although it seems that the children of the elements to be copied are missing. I am not good at XSLT 3.0 and I do not fully understand the `<xsl:mode on-no-match="shallow-copy"/>`, but maybe it is related to this rule? – Mystic Tm Dec 07 '19 at 21:01
  • Maybe the template is missing the "identity transformation", so that only the root of each matching element is copied, but not its children? – Mystic Tm Dec 07 '19 at 21:07
  • One more thing I do not understand is the use of the key() function with a third argument, as in `!key('name-ref', @nameRef, root(.))`. Can you explain how it works? – Mystic Tm Dec 07 '19 at 21:13
  • `<xsl:mode on-no-match="shallow-copy"/>` basically is a declarative XSLT 3 way of setting up the identity transformation as the default template. – Martin Honnen Dec 07 '19 at 21:13
  • Inside of a function there is no context document so you need to ensure you use the three argument version of `key` inside of any function to have it work. – Martin Honnen Dec 07 '19 at 21:14
  • Hi Martin, did you fix it? Your last attempt seems to solve the "missing children" issue, right? – Mystic Tm Dec 08 '19 at 12:27
  • I am not sure I fully understand how it works, but it seems to be working ... Can you provide some explanation of this XPath, `$start/*[@nameRef]!key('name-ref', @nameRef, root(.))!mf:traverse(.)`, just for my understanding? In particular, I am not familiar with the `!key(...)!mf:traverse(...)` part. – Mystic Tm Dec 08 '19 at 12:29
  • The simple map operator `!` in XPath 3 (https://www.w3.org/TR/xpath-31/#id-map-operator) is used there twice, the `$start/*[@nameRef]!key('name-ref', @nameRef, root(.))` applies the XSLT `key` function to each `$start` child element `*` having a `nameRef`, simply to follow the relationships in your structure, then the second map operator use recursively calls the `mf:traverse` function on each element found by the `key` function call. – Martin Honnen Dec 08 '19 at 13:09
  • As for the fix, it was mainly already in the second edit, where I adapted the sequence in the function body to include the children `$start/*`. I also needed to fix the type annotation of the `mf:traverse` function parameter to correctly handle the case where a non-existent start element name is passed in to the stylesheet. – Martin Honnen Dec 08 '19 at 13:12
  • Hi there, I just noticed something unexpected: whether the given solution works depends on which element is used to start the traversal. With the example given in the fiddle, the transformation does not work if start-name is set to "u3" or "s6", even though elements with those names exist in the input file. It appears that the transformation fails when the start element is a child of another element with a @name attribute ... I don't want that behavior in the final solution. – Mystic Tm Mar 23 '20 at 17:15
  • @MysticTm, better ask a new question with the necessary details (minimal but complete samples of input, params, output you want, output you get), I don't remember the original problem but the solution doing `<xsl:template match="*[@name and not(. intersect $named-elements)]"/>` is obviously blocking any processing of elements not previously computed as part of the `$named-elements` sequence. – Martin Honnen Mar 23 '20 at 17:44
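
The behaviour described in the last two comments can be illustrated with a small variation of the XSLT 3.0 stylesheet from the answer. This is only a sketch under two assumptions that the question does not confirm: that for a nested start element (such as "u3" inside "unionX") the output should keep the named ancestors of every related element, and that the start element's own nameRef attribute should be followed as well as those of its children.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:mf="http://example.com/mf"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:param name="start-name" as="xs:string">u3</xsl:param>

  <xsl:key name="name-ref" match="*[@name]" use="@name"/>

  <!-- assumption: also follow the nameRef of the start element itself,
       not only the nameRef attributes of its children -->
  <xsl:function name="mf:traverse" as="element()*">
    <xsl:param name="start" as="element()?"/>
    <xsl:sequence select="$start, $start/*,
        ($start, $start/*)[@nameRef]
          ! key('name-ref', @nameRef, root(.))
          ! mf:traverse(.)"/>
  </xsl:function>

  <xsl:param name="start-element" as="element()?"
      select="key('name-ref', $start-name)"/>

  <xsl:variable name="named-elements" select="mf:traverse($start-element)"/>

  <xsl:mode on-no-match="shallow-copy"/>

  <!-- assumption: an element is kept when it is related to the start element
       or is an ancestor of a related element, so a nested start element is
       still reached by the identity transformation -->
  <xsl:template
      match="*[@name and not(. intersect $named-elements/ancestor-or-self::*)]"/>

</xsl:stylesheet>
```

With start-name set to u3, this variant should keep unionX with only its u3 child, plus the leaf f3. For a top-level start element such as objB the ancestor test changes nothing, because the ancestors of the related elements carry no name attribute.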