I have the following XML:

<?xml version="1.0" encoding="UTF-8"?>
<ROOT>
        <p width="17.76" x="81.6" y="270.708">
            <span x="81.6" y="270.708" base="273.9" width="17.76" height="4.368">Copy</span>
        </p>
        <p width="22.32" x="101.52" y="270.708">
            <span x="101.52" y="270.708" base="273.9" width="22.32" height="4.368">mailed</span>
        </p>
        <p width="6.24" x="126" y="270.708">
            <span x="126" y="270.708" base="273.9" width="6.24" height="4.368">to</span>
        </p>
        <p width="15.12" x="134.4" y="270.708">
            <span x="134.4" y="270.708" base="273.9" width="15.12" height="4.368">third</span>
        </p>
        <p width="23.04" x="151.68" y="270.708">
            <span x="151.68" y="270.708" base="273.9" width="23.04" height="4.368">parties</span>
        </p>
        <p width="2.64" x="176.88" y="270.708">
            <span x="176.88" y="270.708" base="273.9" width="2.64" height="4.368">-</span>
        </p>
        <p width="12.24" x="181.68" y="270.708">
            <span x="181.68" y="270.708" base="273.9" width="12.24" height="4.368">see</span>
        </p>
        <p width="16.8" x="196.08" y="270.708">
            <span x="196.08" y="270.708" base="273.9" width="16.8" height="4.368">page</span>
        </p>
        <p width="8.64" x="215.04" y="270.708">
            <span x="215.04" y="270.708" base="273.9" width="8.64" height="4.368">33</span>
        </p>
    </ROOT>

I am trying to merge sibling p (paragraph) nodes when both of the following conditions hold:

  1. the following sibling has the same y attribute value, and
  2. the following sibling's x attribute value - (current width + current x attribute value) < 4

I have code that applies the first condition, but I can't correctly apply the second. I am using a recursive approach, and I am guessing this involves complex grouping.

After applying the above conditions the output should look like:

 <?xml version="1.0" encoding="UTF-8"?>
 <ROOT>
        <p width="17.76" x="81.6" y="270.708">
            <span x="81.6" y="270.708" base="273.9" width="17.76" height="4.368">Copy</span>
            <span x="101.52" y="270.708" base="273.9" width="22.32" height="4.368">mailed</span>        
            <span x="126" y="270.708" base="273.9" width="6.24" height="4.368">to</span> 
            <span x="134.4" y="270.708" base="273.9" width="15.12" height="4.368">third</span>         
            <span x="151.68" y="270.708" base="273.9" width="23.04" height="4.368">parties</span>            
            <span x="176.88" y="270.708" base="273.9" width="2.64" height="4.368">-</span>            
            <span x="181.68" y="270.708" base="273.9" width="12.24" height="4.368">see</span>
            <span x="196.08" y="270.708" base="273.9" width="16.8" height="4.368">page</span>
            <span x="215.04" y="270.708" base="273.9" width="8.64" height="4.368">33</span>
        </p>
    </ROOT>

This is the code I have right now:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

    <xsl:template match="node() | @*">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="ROOT">
        <xsl:copy>
            <xsl:copy-of select="@*" />
            <xsl:apply-templates select="p[not(preceding-sibling::p/@y = @y)]" mode="sibling-join" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="p" mode="sibling-join">
        <xsl:copy>
            <xsl:apply-templates select="node() | @*" />
            <xsl:apply-templates select="following-sibling::p[current()/@y = @y]" />
        </xsl:copy>
    </xsl:template>

    <xsl:template match="p">
        <xsl:apply-templates select="node()" />
    </xsl:template>
</xsl:stylesheet>
sim271994

1 Answer

Some XSLT 2 processors, such as Saxon 9 or 10, Altova Raptor, or XmlPrime, also support XQuery 3. With those it might be more convenient to use the XQuery 3 window clause, as it explicitly lets you write an end condition that compares the last item of a window (bound with end $e) to the item that follows it (bound with next $n):

<ROOT>
{
    for tumbling window $w in ROOT/p
    start when true()
    end $e next $n when $e/@y != $n/@y or $n/@x - $e!(@width + @x) ge 4
    return
        <p>
        {
            head($w)/@*, $w/node()
        }
        </p>
}    
</ROOT>

https://xqueryfiddle.liberty-development.net/3Nzd8bU

In XSLT 2/3, using for-each-group with group-starting-with, and assuming the grouping population is a sibling sequence, you could use:

<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    exclude-result-prefixes="#all"
    version="3.0">

  <xsl:mode on-no-match="shallow-copy"/>

  <xsl:template match="ROOT">
      <xsl:copy>
          <xsl:for-each-group select="p" 
              group-starting-with="p[1] | p[preceding-sibling::p[1][@y != current()/@y or (current()!@x - (@width + @x) ge 4)]]">
              <xsl:copy>
                  <xsl:apply-templates select="@*, current-group()/node()"/>
              </xsl:copy>
          </xsl:for-each-group>
      </xsl:copy>
  </xsl:template>
  
</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/naZYrpE/7

The simple map operator ! applied to current() in the pattern is XSLT/XPath 3; in XSLT 2 you can use the path operator / there instead. Also, instead of using the declarative xsl:mode to set up the identity transformation, in XSLT 2 you would need to spell out the template <xsl:template match="@* | node()"><xsl:copy><xsl:apply-templates select="@* | node()"/></xsl:copy></xsl:template>.
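
Putting those two changes together, an XSLT 2.0 version might look like the sketch below. It has not been run against the fiddle. It rests on one assumption about the pattern: inside group-starting-with, current() is the p being tested as a possible group start, and the predicate context is its immediately preceding sibling p. The gap is therefore written as the candidate's x minus the point where the preceding p ends (its x plus its width), which is my reading of the second condition in the question.

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="2.0">

  <!-- Identity template: copies anything not handled below
       (the XSLT 2 replacement for xsl:mode on-no-match="shallow-copy") -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="ROOT">
    <xsl:copy>
      <xsl:apply-templates select="@*"/>
      <!-- A p starts a new group if it is the first p, or if the p just
           before it has a different @y, or ends 4 or more units before
           this p begins -->
      <xsl:for-each-group select="p"
          group-starting-with="p[1] | p[preceding-sibling::p[1][@y != current()/@y or (current()/@x - (@width + @x) ge 4)]]">
        <!-- The context item here is the first p of the group, so the
             merged p keeps that p's attributes -->
        <xsl:copy>
          <xsl:apply-templates select="@*, current-group()/node()"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

</xsl:stylesheet>
```

For the sample input in the question, every gap between neighbouring p elements is about 2.16 (for example 101.52 - (81.6 + 17.76)), so all nine p elements should fall into a single group.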

Martin Honnen
  • Hello @Martin, I have updated the XML at the link; for that input the second condition is not applied, and all the nodes are merged where they should not be: [link](https://xsltfiddle.liberty-development.net/naZYrpE/6). Also, how can I do this in XSLT version 2.0? – sim271994 Jun 25 '20 at 17:04
  • @sim271994, see the edit, I think I had the wrong condition with `@width` instead of `@x` in the initial XSLT example. – Martin Honnen Jun 25 '20 at 18:28