
I am trying to transform this document, but I am fairly new to XSLT and having tons of fun trying to get it right. The core node (truncated for simplicity) looks like this:

<Product prod_id="6352">
    <brandId>221</brandId>
    <brand>Oscar Mayer</brand>
    <images>
        <smallimage>text</smallimage>
        <medimage>text</medimage>
        <largeimage>text</largeimage>
    </images>
    <nutrition>
        <nutritionShow>Y</nutritionShow>
        <servingSize>1 SLICE</servingSize>
        <servingsPerContainer>12</servingsPerContainer>
        <totalCalories>60</totalCalories>
        <fatCalories>35</fatCalories>
        <totalFat>4</totalFat>
        <totalFatPercent>6</totalFatPercent>
        <totalFatUnit>g</totalFatUnit>
        <saturatedFat>1.5</saturatedFat>
        <saturatedFatPercent>8</saturatedFatPercent>
        <saturatedFatUnit>g</saturatedFatUnit>
        <transFat>0</transFat>
        <transFatUnit>g</transFatUnit>
        <cholesterolUnit>mg</cholesterolUnit>
    </nutrition>
    <prodId>6352</prodId>
</Product>

In the end, I want sub-nodes that are logically grouped to be merged into a single node with appropriate attribute names.

The end result should look like this:

<Product prod_id="6352">
    <brandId>221</brandId>
    <brand>Oscar Mayer</brand>
    <images>
        <smallimage>text</smallimage>
        <medimage>text</medimage>
        <largeimage>text</largeimage>
    </images>
    <nutrition>
        <nutritionShow>Y</nutritionShow>
        <servingSize>1 SLICE</servingSize>
        <servingsPerContainer>12</servingsPerContainer>
        <totalCalories>60</totalCalories>
        <fatCalories>35</fatCalories>
        <totalFat amount="4" percent="6" unit="g"/>
        <saturatedFat amount="1.5" percent="8" unit="g"/>
        <transFat amount="0" unit="g"/>
    </nutrition>
    <prodId>6352</prodId>
</Product>

Some key features are:

  1. Group the similar elements (notice that saturatedFat and transFat differ slightly). I have a discrete list of these sets. You could use a list, or something more dynamic based on the name relationships, but note the variance.
  2. Leave other (non-groupable) elements as they are.
  3. Ignore groups that lack the amount element and only have a unit element (notice cholesterol).

Thanks in advance for helping me understand this fairly complex transformation.

  • "*I have a discreet list of these sets.*" Could you post this list? If it's not too *discreet*, that is... Also, please state whether you are using XSLT 1.0 or 2.0. – michael.hor257k Feb 11 '15 at 02:04
  • Ba dum cha! I see what you did there. Yes, autocorrect got me. Luckily I did not tag this with grammar, but thanks anyway :). I am going to choose to be discreet, since the relationships I wanted to express are clear from the example list of elements that can be grouped, and after all I want knowledge, not my work done. My intent was that a solution might make use of a list of attributes, but I did not want to steer people toward the way I was seeing the solution... as it happened, I might have ended up with something better and learned something along the way. – Frank Swanson Feb 11 '15 at 14:53
  • "*My intent was that a solution might make use of a list of attributes but I did not want to guide people to the way i was seeing the solution ...*" Actually, that was my thought too, with no guidance from you. I just wanted to see how many you have, and how varied they are. I always tend to be as much explicit as possible with XSLT - even if verbose - and avoid awkward and inefficient expressions of the `*[name() = ...]` type. – michael.hor257k Feb 11 '15 at 18:40
  • The example shows each of the 3 types of attributes. The others are as one would expect... you see saturated fat, so you could also expect unsaturated, monounsaturated and polyunsaturated. There are 5-12 in each category. The categories are: 1. amount, unit and percent; 2. amount and unit; 3. standalone. – Frank Swanson Feb 11 '15 at 20:37
  • "*The others are as one would expect ...*" LOL, I wouldn't expect anything - I know bupkis about nutrition... Anyway, I have added my suggestion. BTW, in which category is *cholesterol* in your example? – michael.hor257k Feb 11 '15 at 23:05

2 Answers


One possible solution is the following XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything unchanged by default -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- handle every child element of nutrition -->
  <xsl:template match="nutrition/*">
    <xsl:variable name="cName" select="name()"/>
    <xsl:choose>
      <!-- an amount element: a sibling named <cName>Unit follows it -->
      <xsl:when test="following-sibling::node()[name()=concat($cName,'Unit')]">
        <xsl:copy>
          <xsl:attribute name="amount">
            <xsl:value-of select="."/>
          </xsl:attribute>
          <!-- percent is optional (e.g. transFat has none) -->
          <xsl:if test="following-sibling::node()[name()=concat($cName,'Percent')]">
            <xsl:attribute name="percent">
              <xsl:value-of select="following-sibling::node()[name()=concat($cName,'Percent')]"/>
            </xsl:attribute>
          </xsl:if>
          <xsl:attribute name="unit">
            <xsl:value-of select="following-sibling::node()[name()=concat($cName,'Unit')]"/>
          </xsl:attribute>
        </xsl:copy>
      </xsl:when>
      <!-- Unit/Percent elements: already merged above (or orphaned, like cholesterolUnit), so drop them -->
      <xsl:when test="contains(name(), 'Unit') or contains(name(), 'Percent')"/>
      <!-- everything else: copy unchanged -->
      <xsl:otherwise>
        <xsl:copy>
          <xsl:apply-templates/>
        </xsl:copy>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>

When applied to your input XML, it produces the output:

<Product prod_id="6352">
  <brandId>221</brandId>
  <brand>Oscar Mayer</brand>
  <images>
    <smallimage>text</smallimage>
    <medimage>text</medimage>
    <largeimage>text</largeimage>
  </images>
  <nutrition>
    <nutritionShow>Y</nutritionShow>
    <servingSize>1 SLICE</servingSize>
    <servingsPerContainer>12</servingsPerContainer>
    <totalCalories>60</totalCalories>
    <fatCalories>35</fatCalories>
    <totalFat amount="4" percent="6" unit="g"></totalFat>
    <saturatedFat amount="1.5" percent="8" unit="g"></saturatedFat>
    <transFat amount="0" unit="g"></transFat>
  </nutrition>
  <prodId>6352</prodId>
</Product>

The first template is an identity transform that copies all nodes and attributes unchanged.
The second template matches all child elements of nutrition.
If the current element has a following sibling whose name is the current element's name followed by Unit

<xsl:when test="following-sibling::node()[name()=concat($cName,'Unit')]">

then the current element must be the one holding the amount.
Its value is written as the value of the amount attribute:

<xsl:attribute name="amount">
    <xsl:value-of select="."/>
</xsl:attribute>

and, if a following sibling with the matching Percent name exists,

<xsl:if test="following-sibling::node()[name()=concat($cName,'Percent')]">

the percent attribute is written accordingly:

<xsl:attribute name="percent">
    <xsl:value-of select="following-sibling::node()[name()=concat($cName,'Percent')]"/>
</xsl:attribute>

The same applies to the unit attribute, except that no extra check is needed: the enclosing xsl:when test already guarantees that a matching Unit sibling exists.
The empty

<xsl:when test="contains(name(), 'Unit') or contains(name(), 'Percent')"/>

removes the Unit and Percent elements that have already been written as attributes, as well as the orphaned cholesterolUnit.
Finally, all other non-groupable nutrition elements are simply copied:

<xsl:otherwise>
  <xsl:copy>
    <xsl:apply-templates/>
  </xsl:copy>
</xsl:otherwise> 
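The empty xsl:when above drops any nutrition child whose name contains Unit or Percent anywhere, not only at the end. If that is too broad for the full data set, a stricter test can compare just the trailing characters of the name. XSLT 1.0 has no ends-with() function, so this sketch uses substring() with string-length(): 'Unit' has 4 characters, so the comparison starts at string-length(name()) - 3, and 'Percent' has 7, so it starts at string-length(name()) - 6. It is shown in isolation with the identity transform and is not part of the original answer.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" encoding="UTF-8" indent="yes"/>
  <xsl:strip-space elements="*"/>

  <!-- identity transform: copy everything not handled below -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- suppress nutrition children whose name ENDS with 'Unit' or 'Percent';
       XPath 1.0 has no ends-with(), so compare the trailing substring instead -->
  <xsl:template match="nutrition/*[substring(name(), string-length(name()) - 3) = 'Unit']
                     | nutrition/*[substring(name(), string-length(name()) - 6) = 'Percent']"/>
</xsl:stylesheet>
```

The same condition, written as substring(name(), string-length(name()) - 3) = 'Unit' or substring(name(), string-length(name()) - 6) = 'Percent', could replace the contains() test in the second xsl:when. In XSLT 2.0, ends-with(name(), 'Unit') expresses the same check directly.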
matthias_h
  • WOW, that is some true badassery right there... Respect. Thank you so much for the thorough explanation. That really helps people learn! Not the way I was imagining a solution, but this is very smooth. – Frank Swanson Feb 11 '15 at 14:44

Continuing from the comments...

The example shows each of the 3 types of attributes. The others are as one would expect... you see saturated fat, so you could also expect unsaturated, monounsaturated and polyunsaturated. There are 5-12 in each category. The categories are: 1. amount, unit and percent; 2. amount and unit; 3. standalone.

Personally, I prefer to spell things out as far as they are known, so for the given example:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>


<!-- category #1: amount, unit and percent -->
<xsl:template match="totalFat">
    <totalFat amount="{.}" percent="{../totalFatPercent}" unit="{../totalFatUnit}" />
</xsl:template>

<xsl:template match="saturatedFat">
    <saturatedFat amount="{.}" percent="{../saturatedFatPercent}" unit="{../saturatedFatUnit}" />
</xsl:template>


<!-- category #2: amount and unit -->
<xsl:template match="transFat">
    <transFat amount="{.}" unit="{../transFatUnit}" />
</xsl:template>


<!-- suppress all units and percents -->
<xsl:template match="totalFatPercent | totalFatUnit | saturatedFatPercent | saturatedFatUnit | transFatUnit | cholesterolUnit | cholesterolPercent"/>

</xsl:stylesheet>

Note that Category #3 is handled by the identity transform template and requires no exception.


Note also that items that are known to appear in every product do not require a template of their own; you could just write them out as literal result elements within a template matching nutrition and add their names to the suppressing empty template.
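A minimal sketch of that suggestion, assuming (hypothetically) that totalFat, saturatedFat and transFat are the items present in every product: the nutrition template copies its remaining children and then writes the three grouped elements as literal result elements, while the original amount elements are added to the suppressing empty template. Because the literal elements come after the xsl:apply-templates call, they always appear after the other copied children; whether that ordering is acceptable depends on your data.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:strip-space elements="*"/>

<!-- identity transform -->
<xsl:template match="@*|node()">
    <xsl:copy>
        <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
</xsl:template>

<!-- ASSUMPTION: totalFat, saturatedFat and transFat appear in every product -->
<xsl:template match="nutrition">
    <xsl:copy>
        <!-- copy the other children; the suppressed ones below produce nothing -->
        <xsl:apply-templates select="@*|node()"/>
        <!-- write the known groups as literal result elements -->
        <totalFat amount="{totalFat}" percent="{totalFatPercent}" unit="{totalFatUnit}"/>
        <saturatedFat amount="{saturatedFat}" percent="{saturatedFatPercent}" unit="{saturatedFatUnit}"/>
        <transFat amount="{transFat}" unit="{transFatUnit}"/>
    </xsl:copy>
</xsl:template>

<!-- suppress the original amount elements as well as all units and percents -->
<xsl:template match="totalFat | totalFatPercent | totalFatUnit | saturatedFat | saturatedFatPercent | saturatedFatUnit | transFat | transFatUnit | cholesterolUnit | cholesterolPercent"/>

</xsl:stylesheet>
```

This trades the per-item templates for a single nutrition template, which keeps every known group in one place; items that are not present in every product still need their own templates as in the answer above, otherwise an empty amount attribute would be written for them.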

michael.hor257k
  • This is a good solution too, and much more in line with the approach I was targeting. I like your coding style; thanks for sharing. I will give this some testing from a performance perspective. You mentioned some efficiency benefits to this solution; I would be grateful if you could expand on that a little. Thanks. I will follow up with the results of my tests of the 2 approaches this weekend. – Frank Swanson Feb 13 '15 at 16:50
  • It's dangerous to speak of efficiency unless you know how your specific processor was designed (or at least how it behaves, based on actual performance tests). Still, I believe that an explicit reference to a node by name, e.g. `select="mynode"`, will be faster than an implicit one, e.g. `select="*[name()='mynode']"` or, even worse, `select="*[name()=concat('my', 'node')]"` etc. And performance aside, the explicit code is more readable and more manageable, IMHO. – michael.hor257k Feb 14 '15 at 00:58