
Below is the input XML, followed by the output I am looking for:

<xml>
    <a>
        <element0>987</element0>
    </a>
    <a>
        <a_list_one>
            <a_lag_one>
                <element1>123</element1>
                <element2>456</element2>
            </a_lag_one>
        </a_list_one>
        <a_list_one>
            <a_lag_one>
                <element1>789</element1>
                <element2>678</element2>
            </a_lag_one>                
        </a_list_one>
        <a_list_two>
            <a_lag_two>
                <a_list_three>
                    <a_lag_three>
                        <element3>570</element3>
                        <element4>678</element4>
                    </a_lag_three>
                </a_list_three>
                <a_list_three>
                    <a_lag_three>
                        <element3>989</element3>
                        <element4>231</element4>
                    </a_lag_three>
                </a_list_three>
            </a_lag_two>
            <a_lag_two>
                <a_list_three>
                    <a_lag_three>
                        <element3>570</element3>
                        <element4>678</element4>
                    </a_lag_three>
                </a_list_three>
                <a_list_three>
                    <a_lag_three>
                        <element3>9873</element3>
                        <element4>278</element4>
                    </a_lag_three>
                </a_list_three>
                <a_list_four>
                    <a_lag_four>
                        <element5>9121</element5>
                        <element6>9879</element6>
                    </a_lag_four>
                </a_list_four>
                <a_list_three>
                    <a_lag_four>
                        <element5>098</element5>
                        <element6>231</element6>
                    </a_lag_four>
                </a_list_three>
            </a_lag_two>
        </a_list_two>
        <a_list_four>
            <a_lag_four>
                <element5>654</element5>
                <element6>7665</element6>
            </a_lag_four>
        </a_list_four>
    </a>
    <b>
        <b_list_one>
            <b_lag_one>
                <element8>123</element8>
                <element9>456</element9>
            </b_lag_one>
        </b_list_one>
    </b>
    <b>
        <b_list_one>
            <b_lag_one>
                <element8>789</element8>
                <element9>678</element9>
            </b_lag_one>            
        </b_list_one>
    </b>
</xml>

The desired output is:

<xml>
    <a>
        <element0>987</element0>
        <a_list_one>
            <a_lag_one>
                <element1>123</element1>
                <element2>456</element2>
            </a_lag_one>
            <a_lag_one>
                <element1>789</element1>
                <element2>678</element2>
            </a_lag_one>
        </a_list_one>
        <a_list_two>
            <a_lag_two>
                <a_list_three>
                    <a_lag_three>
                        <element3>570</element3>
                        <element4>678</element4>
                    </a_lag_three>
                    <a_lag_three>
                        <element3>989</element3>
                        <element4>231</element4>
                    </a_lag_three>
                </a_list_three>
            </a_lag_two>
            <a_lag_two>
                <a_list_three>
                    <a_lag_three>
                        <element3>570</element3>
                        <element4>678</element4>
                    </a_lag_three>
                    <a_lag_three>
                        <element3>9873</element3>
                        <element4>278</element4>
                    </a_lag_three>
                    <a_lag_four>
                        <element5>098</element5>
                        <element6>231</element6>
                    </a_lag_four>
                </a_list_three>
                <a_list_four>
                    <a_lag_four>
                        <element5>9121</element5>
                        <element6>9879</element6>
                    </a_lag_four>
                </a_list_four>
            </a_lag_two>
        </a_list_two>
        <a_list_four>
            <a_lag_four>
                <element5>654</element5>
                <element6>7665</element6>
            </a_lag_four>
        </a_list_four>      
    </a>
    <b>
        <b_list_one>
            <b_lag_one>
                <element8>123</element8>
                <element9>456</element9>
            </b_lag_one>
            <b_lag_one>
                <element8>789</element8>
                <element9>678</element9>
            </b_lag_one>            
        </b_list_one>
    </b>
</xml>

I am looking for an XSLT stylesheet that performs this conversion. Sibling nodes that share the same name and whose names contain "_list" should be merged together. However, this logic should apply only within the first "_list" node and should not apply to inner nodes. Secondly, nodes at the root level should also be merged: in this example, there should be only one "a" element and one "b" element. Kindly help.

Tim C
  • Please post your attempted XSLT. Thanks. – John Ernst Oct 11 '18 at 16:08
  • You said, "this logic should happen only within the first "_LIST" node and should not apply to inner nodes". However, you are combining a_list_three in the desired results. It also looks like you have other inconsistencies in your desired results. In the input, element2 is not in a_lag_one, but it is in a_lag_one in the desired results. You may want to clean this up. – John Ernst Oct 11 '18 at 18:42
  • Hello Bluewood56, thanks for asking. The desired XML is correct. What I meant was that the a_list_four element appearing inside a_lag_two, which in turn appears in a_list_three, should not be merged with the a_list_four element appearing outside a_list_three: even though they share the same name, they are different because they do not belong to the same list (a_list_three). The element2 issue was a typo, which I have corrected; element2 belongs in a_lag_one as well. – Varun Vemuganti Oct 12 '18 at 06:49

2 Answers


I think in XQuery 3 you can solve this using two nested for .. group by expressions:

/*/element { node-name(.) } {
    for $child-element at $pos in *
    group by $element-name := node-name($child-element)
    order by $pos[1]
    return
        element { $element-name } {
            for $grand-child at $pos in $child-element/*
            let $grand-child-name := node-name($grand-child)
            group by $key := $grand-child-name, $handle := contains(string($grand-child-name), '_list')
            order by $pos[1]
            return
                if ($handle)
                then
                    element { $key } {
                        $grand-child/*
                    }
                else $grand-child
        }
}

https://xqueryfiddle.liberty-development.net/pPgCcor

For XSLT 1, I would use keys like the already suggested solution, but I think it is then easier to use two different match patterns for each key: one for the first item in a group established by the key, which makes a copy and processes the child nodes of the whole group, and a second, empty one that suppresses processing of the duplicated element names of a group:

<xsl:stylesheet
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    version="1.0">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <xsl:key name="child-group" match="/*/*" use="name()"/>
  <xsl:key name="grand-child-group" match="/*/*/*[contains(local-name(), '_list')]" use="name()"/>

  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <xsl:template match="/*/*[generate-id() = generate-id(key('child-group', name())[1])]">
      <xsl:copy>
          <xsl:apply-templates select="key('child-group', name())/node()"/>
      </xsl:copy>
  </xsl:template>

  <xsl:template match="/*/*[not(generate-id() = generate-id(key('child-group', name())[1]))]"/>

  <xsl:template match="/*/*/*[contains(local-name(), '_list')][generate-id() = generate-id(key('grand-child-group', name())[1])]">
      <xsl:copy>
          <xsl:apply-templates select="key('grand-child-group', name())/node()"/>
      </xsl:copy>
  </xsl:template>

  <xsl:template match="/*/*/*[contains(local-name(), '_list')][not(generate-id() = generate-id(key('grand-child-group', name())[1]))]"/>  

</xsl:stylesheet>

https://xsltfiddle.liberty-development.net/jyH9rN5

Based on your comment I have also tried to make the XQuery 3 solution recursive:

declare function local:group($elements as element()*) as element()*
{
  for $child-element at $pos in $elements
  let $child-name := node-name($child-element)
  group by $name-group := $child-name, $match := contains(string($child-name), '_list')
  order by $pos[1]
  return
      if ($match)
      then element { $name-group } {
          local:group($child-element/*)
      }
      else if (not($child-element/*))
      then $child-element
      else $child-element/element {$name-group} { local:group(*) }
};

/*/element { node-name(.) } {
    for $child-element at $pos in *
    group by $element-name := node-name($child-element)
    order by $pos[1]
    return element { $element-name } {
         local:group($child-element/*)
    }

}

https://xqueryfiddle.liberty-development.net/pPgCcor/1
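For comparison, the same recursive grouping could be expressed in XSLT 2.0 with xsl:for-each-group. This is an untested sketch, not code from either answer, and it assumes one particular reading of the requirement: children of the root element are merged by name regardless of "_list"; at every deeper level, only sibling elements whose local name contains "_list" are merged, in order of first appearance; every other element is kept on its own; and elements contain either child elements or text, not mixed content. The function name local:merge and its namespace URI are hypothetical. It needs an XSLT 2.0 processor such as Saxon.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:local="http://example.com/local"
    exclude-result-prefixes="local">

  <xsl:output indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- Root element: merge its children by name (all "a" into one, all "b" into one). -->
  <xsl:template match="/*">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:for-each-group select="*" group-by="node-name(.)">
        <!-- The context item is the first member of the current group. -->
        <xsl:copy>
          <xsl:copy-of select="current-group()/@*"/>
          <xsl:sequence select="local:merge(current-group()/*)"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>

  <!-- Hypothetical helper: merges sibling elements whose local name contains
       '_list'; every other element gets a unique key and so stays on its own.
       Groups are emitted in order of first appearance. -->
  <xsl:function name="local:merge" as="element()*">
    <xsl:param name="siblings" as="element()*"/>
    <xsl:for-each-group select="$siblings"
        group-by="if (contains(local-name(), '_list'))
                  then concat('list:', local-name())
                  else concat('id:', generate-id())">
      <xsl:copy>
        <xsl:copy-of select="current-group()/@*"/>
        <xsl:choose>
          <!-- Recurse into the combined element children of the whole group. -->
          <xsl:when test="current-group()/*">
            <xsl:sequence select="local:merge(current-group()/*)"/>
          </xsl:when>
          <!-- Leaf element: copy its text content unchanged. -->
          <xsl:otherwise>
            <xsl:copy-of select="current-group()/node()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:copy>
    </xsl:for-each-group>
  </xsl:function>

</xsl:stylesheet>
```

Under that reading, the a_list_three siblings inside each a_lag_two are merged while the two a_lag_two elements stay separate, which is what the desired output shows. The a_list_four inside a_lag_two is never compared with the outer a_list_four, because grouping only ever looks at one set of siblings at a time.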

Martin Honnen
  • Hello Martin, in your solution, it did not merge the "a_list_three" into one. That means the logic did not percolate to inner "_list" nodes. – Varun Vemuganti Oct 12 '18 at 22:58
  • I really liked the XQuery solution as well; however, it has the same issue. The inner _list nodes were not merged into one. – Varun Vemuganti Oct 12 '18 at 22:59
  • I understood your requirement "this logic [..] should not apply to inner nodes" as a requirement not to merge elements at a deeper level; both the XQuery and the XSLT 1 solution simply apply the merging/grouping to the child elements of the root element (for any element name) and to the grandchildren whose names contain "_list". So you will need to explain your requirements in more detail: whether this is supposed to be a recursive algorithm, or what criteria exactly determine what to merge and what not. – Martin Honnen Oct 13 '18 at 05:27
  • @VarunVemuganti, I have added a refinement for the XQuery solution that tries to solve the merging/grouping at deeper levels with a recursive function. – Martin Honnen Oct 13 '18 at 06:05
  • Hi Martin, I tried doing this in XQuery 1 using distinct-values() instead of group by, but it seems pretty difficult to achieve in XQuery 1. – Varun Vemuganti Oct 13 '18 at 08:22
  • @VarunVemuganti, it seems the other answer with the XSLT 1 solution helped you to solve that so let's leave this as it is, I won't try to transcribe the posted XQuery 3 into XQuery 1. – Martin Honnen Oct 13 '18 at 10:10

Here is a solution for XSLT 1.0:

  <xsl:stylesheet version="1.0"
  xmlns:msxml="urn:schemas-microsoft-com:xslt"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes" omit-xml-declaration="yes"/>

    <xsl:key name="xmlChildren" match="xml/*" use="local-name()"/>
    <xsl:key name="list" match="*[contains(local-name(),'_list')]" use="generate-id(..)"/>

    <!-- Select the child nodes of the xml node. -->
    <xsl:template match="xml/*">
      <!-- Get the name of the current node. -->
      <xsl:variable name="localName" select="local-name()"/>
      <!-- Is this the first child of the xml node with this name? -->
      <xsl:if test="generate-id(.) = generate-id(key('xmlChildren', $localName)[1])">
        <xsl:copy>
          <!-- Output all of the xml grandchild nodes of any xml child node with same name as the current node. -->
          <xsl:apply-templates select="key('xmlChildren', $localName)/*">
              <xsl:with-param name="parentName" select="$localName"/>
          </xsl:apply-templates>
        </xsl:copy>
      </xsl:if>
    </xsl:template>

    <!-- Select the nodes with a local name that contains '_list'. -->
    <xsl:template match="*[contains(local-name(),'_list')]">
      <xsl:param name="parentName"/>

      <xsl:variable name="parentID" select="generate-id(..)"/>

      <!-- Get the name of the current node. -->
      <xsl:variable name="localName" select="local-name()"/>

      <xsl:choose>
        <!-- Is this list a first generation grandchild of xml? -->
        <xsl:when test="parent::*/parent::xml">
          <!-- Is this the first instance of this list? -->
          <xsl:if test="generate-id(.) = generate-id(key('xmlChildren', $parentName)/*[local-name()=$localName][1])">
            <xsl:copy>
              <xsl:apply-templates select="key('xmlChildren', $parentName)/*[local-name()=$localName]/*"/>
            </xsl:copy>
          </xsl:if> 
        </xsl:when>
        <xsl:otherwise>
          <!-- Is this the first instance of this list? -->
          <xsl:if test="generate-id(.) = generate-id(key('list', $parentID)[local-name()=$localName][1])">
            <xsl:copy>
              <xsl:apply-templates select="key('list', $parentID)[local-name() = $localName]/*"/>
            </xsl:copy>
          </xsl:if>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:template>

    <xsl:template match="node()|@*">
      <xsl:copy>
        <xsl:apply-templates select="node()|@*"/>
      </xsl:copy>
    </xsl:template>  

  </xsl:stylesheet>
John Ernst
  • Martin Honnen's 1.0 solution is basically a copy of this solution. I considered a second template for suppressing empty lists. There is no substantive difference in his answer. So, since he copied mine, if you pick a 1.0 solution, I would hope you pick this one. – John Ernst Oct 12 '18 at 18:41
  • Thank you, Bluewood66. Your answer is nearly perfect, but there is one small issue I have observed. I introduced a new_element within the "a" element. However, new_element came out as-is under the "a" element, between the duplicate "a" elements; it did not get merged into the parent "a" element. If you look, "element0" was merged nicely, and I expect the same for new_element. Please see this - https://xsltfiddle.liberty-development.net/jyH9rN5/1 – Varun Vemuganti Oct 12 '18 at 22:56
  • You may want to pretty print (format) your output, so it's easier to see what the XSLT produced. Okay, you added the new "a" tag as a child of an "a" element. The program did exactly what it was written to do: it only merges the children of the xml node and sibling lists. At this point, I don't know for sure what your requirement is, but it is now different from what it started out to be. That said, I made a sincere effort to help you out. Hopefully, you can adjust the code to do what you want. Otherwise, if you want me to do the work for you on a contract basis, let me know. – John Ernst Oct 13 '18 at 03:03
  • Thank you, Bluewood66. I made a small amendment to the code for this purpose. Thank you very much for the help. Warm regards. – Varun Vemuganti Oct 13 '18 at 07:26