I have the following XML:

<smses>
  <sms address="87654321" type="1" body="Some text" readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
  <sms address="87654321" type="2" body="Some text" readable_date="3/09/2011 2:36:41 PM" contact_name="Person1" />
  <sms address="87654321" type="1" body="Some text" readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
  <sms address="123" type="2" body="Some text" readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
  <sms address="123" type="1" body="Some text" readable_date="3/09/2011 10:57:52 AM" contact_name="Person2" />
  <sms address="123" type="2" body="Some text" readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
  <sms address="12345678" type="1" body="Some text" readable_date="3/09/2011 11:21:16 AM" contact_name="Person3" />
  <sms address="12345678" type="2" body="Some text" readable_date="3/09/2011 11:37:21 AM" contact_name="Person3" />

  <sms address="12345" type="2" body="Some text" readable_date="28/01/2011 7:24:50 PM" contact_name="(Unknown)" />
  <sms address="233" type="1" body="Some text" readable_date="30/12/2010 1:13:41 PM" contact_name="(Unknown)" />
</smses>

I am trying to get output like this (e.g. as XML):

<sms contact_name="person1">
    <message type="1">{@body}</message>
    <message type="2">{@body}</message>
    <message type="1">{@body}</message>
</sms>
<sms contact_name="person2">
    <message type="2">{@body}</message>
    <message type="1">{@body}</message>
</sms>
<sms contact_name="person3">
    <message type="2">{@body}</message>
    <message type="1">{@body}</message>
</sms>
<sms contact_name="(Unknown)">
    <message type="2">{@body}</message>
    <message type="1">{@body}</message>
</sms>
<sms contact_name="(Unknown)">
    <message type="2">{@body}</message>   
</sms>

Or, e.g., as HTML:

<div>
  <h1>Person: @contact_name (@address)</h1>
  <p>message @type: @body</p>
</div>

I have managed to do this with the following XSLT code (please excuse that the code below does not reflect the HTML exactly; the grouping it produces is the desired result):

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="yes" />
    <xsl:key name="txt" match="sms" use="@contact_name" />
    <xsl:template match="smses">
        <xsl:apply-templates select="sms[generate-id(.)=generate-id(key('txt', @contact_name)[1])]">
            <xsl:sort select="@address" order="ascending" />
        </xsl:apply-templates>
    </xsl:template>
    <xsl:template match="sms">
        <h4><xsl:value-of select="@contact_name"  /></h4>
            <xsl:for-each select="key('txt', @contact_name)">
                    <br />
                    <xsl:value-of select="@body" />
            </xsl:for-each>
    </xsl:template>

</xsl:stylesheet>

My question is this: I have sms elements whose @contact_name attribute is "(Unknown)" but whose @address values differ, i.e. they should not be grouped together, because the messages came from different numbers/people (the contact name being the same is irrelevant). Should I be trying to reorder/change the XML data, or is there a way to get XSLT to also check whether the @address is different when the @contact_name is the same?

Edit:

I failed to mention (or rather forgot) that, while there are some sms messages with the same @contact_name and a unique @address, there are also cases where the @address values differ only slightly because one of them lacks the country code in front of the number, e.g.

<sms contact_name="jared" address="12345" />
<sms contact_name="jared" address="+64112345" />

But they are meant to be grouped because they are from the same person/number.

Edit:

In my situation the only discrepancies would be a 3-character country code (e.g. +64) plus a 2-digit network code (e.g. 21). Basically the outcome should be: if @contact_name is the same and @address is completely different, i.e.

 <sms contact_name="jared" address="12345" />
 <sms contact_name="jared" address="5433467" />

then they should be separate elements, as they are from different people/number(s).

If @contact_name is the same and @address differs only by the country and network codes, i.e.

 <sms contact_name="jared" address="02112345" />
 <sms contact_name="jared" address="+642112345" />

then they should be grouped, as they are from the same person/number.

Edit:

country codes: +64 (3 characters)

network codes: 021 (3 characters, usually last character changes depending on network)

Numbers (@address) get saved per <sms> either as +64-21-12345 (excluding dashes) or as 021-12345 (excluding the dash).

Jared
  • Good question, +1. Now you will be able to learn and apply Muenchian grouping using composite keys. – Dimitre Novatchev Sep 14 '11 at 03:34
  • @Jared: You need to explain (better by editing the question) the rules for prefixing with a country code: is it 2 digits only, or three digits, or a varying number of digits? If it is the latter, then the solution would need to be given a list of all possible country codes. – Dimitre Novatchev Sep 14 '11 at 21:32
  • @Dimitre - Apologies, hope I've made it more clear now. I was so close to getting this working on my own until I hit this barrier. Much appreciate your help! – Jared Sep 14 '11 at 22:11
  • Can you provide a list of country/network codes or tell us at least the number of digits a code is composed of? – Emiliano Poggi Sep 14 '11 at 23:15
  • @Empo Did I not make that clear? Perhaps not, I'll add a list if that helps. Thanks. – Jared Sep 14 '11 at 23:23
  • @Jared: Please see my solution to your latest-updated problem. – Dimitre Novatchev Sep 15 '11 at 02:07

1 Answer


This transformation uses Muenchian grouping with composite keys:

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kContactByNameAddress" match="sms"
          use="concat(@contact_name,'+',@address)"/>

 <xsl:template match=
    "sms[generate-id()
        =
         generate-id(key('kContactByNameAddress',
                         concat(@contact_name,'+',@address)
                        )
                         [1]
                     )
        ]
    ">
     <sms contact_name="{@contact_name}">
       <xsl:apply-templates mode="inGroup"
       select="key('kContactByNameAddress',
                 concat(@contact_name,'+',@address)
                )"/>
     </sms>
 </xsl:template>

 <xsl:template match="sms" mode="inGroup">
       <message type="{@type}">
         <xsl:value-of select="@body"/>
       </message>
 </xsl:template>
 <xsl:template match="sms"/>
</xsl:stylesheet>

When applied to the provided XML document:

<smses>
    <sms address="87654321" type="1" body="Some text"
    readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
    <sms address="87654321" type="2" body="Some text"
    readable_date="3/09/2011 2:36:41 PM" contact_name="Person1" />
    <sms address="87654321" type="1" body="Some text"
    readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
    <sms address="123" type="2" body="Some text"
    readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
    <sms address="123" type="1" body="Some text"
    readable_date="3/09/2011 10:57:52 AM" contact_name="Person2" />
    <sms address="123" type="2" body="Some text"
    readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
    <sms address="12345678" type="1" body="Some text"
    readable_date="3/09/2011 11:21:16 AM" contact_name="Person3" />
    <sms address="12345678" type="2" body="Some text"
    readable_date="3/09/2011 11:37:21 AM" contact_name="Person3" />
    <sms address="12345" type="2" body="Some text"
    readable_date="28/01/2011 7:24:50 PM" contact_name="(Unknown)" />
    <sms address="233" type="1" body="Some text"
    readable_date="30/12/2010 1:13:41 PM" contact_name="(Unknown)" />
</smses>

the wanted, correct result is produced:

<sms contact_name="Person1">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
   <message type="1">Some text</message>
</sms>
<sms contact_name="Person2">
   <message type="2">Some text</message>
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="Person3">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="2">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="1">Some text</message>
</sms>

Update: The OP has edited the question and posted a new requirement: the address attribute may or may not start with a country code. Two addresses, one with a country code and the other without, are "the same" if the substring after the country code is equal to the other address. In this case the two elements should be grouped together.

Here is the solution (it would be trivial to write in XSLT 2.0, but doing it in a single pass in XSLT 1.0 is quite tricky. A multipass solution is easier, but it would generally require the xxx:node-set() extension function and would thus lose portability):

<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:key name="kContactByNameAddress" match="sms"
  use="concat(@contact_name,'+',
              concat(substring(@address,
                               4 div starts-with(@address,'+')),
                     substring(@address,
                               1 div not(starts-with(@address,'+'))
                              )
                     )
              )"/>

 <xsl:template match=
    "sms[generate-id()
        =
         generate-id(key('kContactByNameAddress',
                         concat(@contact_name,'+',
                                concat(substring(@address,
                                                 4 div starts-with(@address,'+')),
                                       substring(@address,
                                                 1 div not(starts-with(@address,'+'))
                                                 )
                                       )
                                 )
                         )
                         [1]
                     )
        ]
    ">
     <sms contact_name="{@contact_name}">
       <xsl:apply-templates mode="inGroup"
       select="key('kContactByNameAddress',
                 concat(@contact_name,'+',
                        concat(substring(@address,
                                         4 div starts-with(@address,'+')),
                               substring(@address,
                                         1 div not(starts-with(@address,'+'))
                                         )
                                )
                        )
                  )
      "/>
     </sms>
 </xsl:template>

 <xsl:template match="sms" mode="inGroup">
       <message type="{@type}">
         <xsl:value-of select="@body"/>
       </message>
 </xsl:template>
 <xsl:template match="sms"/>
</xsl:stylesheet>

When this transformation is applied to the following XML document (the previous one plus three added sms elements with contact_name="jared", two of which have "identical" addresses according to the newly posted rules):

<smses>
    <sms address="87654321" type="1" body="Some text"
        readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
    <sms address="87654321" type="2" body="Some text"
        readable_date="3/09/2011 2:36:41 PM" contact_name="Person1" />
    <sms address="87654321" type="1" body="Some text"
        readable_date="3/09/2011 2:16:52 PM" contact_name="Person1" />
    <sms address="123" type="2" body="Some text"
        readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
    <sms address="123" type="1" body="Some text"
        readable_date="3/09/2011 10:57:52 AM" contact_name="Person2" />
    <sms address="123" type="2" body="Some text"
        readable_date="3/09/2011 10:56:24 AM" contact_name="Person2" />
    <sms address="12345678" type="1" body="Some text"
        readable_date="3/09/2011 11:21:16 AM" contact_name="Person3" />
  <sms contact_name="jared" address="12345" type="2" body="Some text"/>
  <sms contact_name="jared" address="56789" type="1" body="Some text"/>
  <sms contact_name="jared" address="+6412345" type="2" body="Some text"/>
    <sms address="12345678" type="2" body="Some text"
        readable_date="3/09/2011 11:37:21 AM" contact_name="Person3" />
    <sms address="12345" type="2" body="Some text"
        readable_date="28/01/2011 7:24:50 PM" contact_name="(Unknown)" />
    <sms address="233" type="1" body="Some text"
        readable_date="30/12/2010 1:13:41 PM" contact_name="(Unknown)" />
</smses>

the wanted, correct result is produced:

<sms contact_name="Person1">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
   <message type="1">Some text</message>
</sms>
<sms contact_name="Person2">
   <message type="2">Some text</message>
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="Person3">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="jared">
   <message type="2">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="jared">
   <message type="1">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="2">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="1">Some text</message>
</sms>

Detailed explanation:

The main difficulty in this problem arises from the fact that there is no "if ... then ... else" operator in XPath 1.0. However, we must specify a single XPath expression in the use attribute of xsl:key that selects either the address attribute (when it doesn't start with "+") or its substring after the country code (when its string value starts with "+").

Here I am using this poor man's implementation of

if($condition)
  then $string1
  else $string2

The following XPath expression, when evaluated, is equivalent to the above:

concat(substring($string1, 1 div $condition),
       substring($string2, 1 div not($condition))
      )

This equivalence follows from the fact that 1 div true() is the same as 1 div 1 and this is 1, while 1 div false() is the same as 1 div 0 and that is the number (positive) Infinity.

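Also, for any string $s, the value of substring($s, Infinity) is just the empty string. And, of course, for any string $s the value of substring($s, 1) is just the string $s itself. In the key above, the first branch uses 4 div starts-with(@address,'+') rather than 1 div ..., so when the condition is true the substring starts at the fourth character, which skips a three-character country code such as +64.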
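
The idiom can be checked in isolation with a small XSLT 1.0 stylesheet that prints, for every sms element, the raw address next to the address part of the grouping key. This is only a minimal sketch: the variable name vHasPlus and the text output layout are illustrative choices, not part of the solution above.

```xml
<xsl:stylesheet version="1.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="text"/>

 <xsl:template match="/">
  <xsl:for-each select="//sms">
   <!-- vHasPlus (hypothetical name) is true() when a country code is present -->
   <xsl:variable name="vHasPlus" select="starts-with(@address,'+')"/>

   <xsl:value-of select="@address"/>
   <xsl:text> -&gt; </xsl:text>

   <!-- First substring(): start position 4 when vHasPlus is true, Infinity otherwise.
        Second substring(): start position 1 when vHasPlus is false, Infinity otherwise.
        Exactly one of the two is non-empty, so concat() returns the chosen branch. -->
   <xsl:value-of select=
    "concat(substring(@address, 4 div $vHasPlus),
            substring(@address, 1 div not($vHasPlus))
           )"/>
   <xsl:text>&#10;</xsl:text>
  </xsl:for-each>
 </xsl:template>
</xsl:stylesheet>
```

Applied to the XML document above, the lines for the three jared elements should read 12345 -> 12345, 56789 -> 56789 and +6412345 -> 12345, which is why the first and third end up in the same group.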

II. XSLT 2.0 solution:

<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <xsl:for-each-group select="sms" group-by=
   "concat(@contact_name,'+',
           if(starts-with(@address,'+'))
             then substring(@address, 4)
             else @address
           )">
     <sms contact_name="{@contact_name}">
      <xsl:apply-templates select="current-group()"/>
     </sms>

  </xsl:for-each-group>
 </xsl:template>

 <xsl:template match="sms">
       <message type="{@type}">
         <xsl:value-of select="@body"/>
       </message>
 </xsl:template>
</xsl:stylesheet>

When this (much simpler!) XSLT 2.0 transformation is applied to the same XML document (above), the same correct output is produced:

<sms contact_name="Person1">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
   <message type="1">Some text</message>
</sms>
<sms contact_name="Person2">
   <message type="2">Some text</message>
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="Person3">
   <message type="1">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="jared">
   <message type="2">Some text</message>
   <message type="2">Some text</message>
</sms>
<sms contact_name="jared">
   <message type="1">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="2">Some text</message>
</sms>
<sms contact_name="(Unknown)">
   <message type="1">Some text</message>
</sms>
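
For the number format described in the question's last edit (+64 followed by the network code, versus a national form with a leading 0), one possible variation of the XSLT 2.0 grouping key is sketched below. It rests on two assumptions taken from the question's examples and not verified against real data: +64 is the only country code that occurs, and removing a single leading 0 from the national form leaves the same digits that follow +64.

```xml
<xsl:stylesheet version="2.0"
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output omit-xml-declaration="yes" indent="yes"/>
 <xsl:strip-space elements="*"/>

 <xsl:template match="/*">
  <!-- Key = contact name plus a normalized address:
       "+642112345" loses "+64", "02112345" loses the leading "0",
       anything else (e.g. a short code such as "233") is kept as-is. -->
  <xsl:for-each-group select="sms" group-by=
   "concat(@contact_name, '+',
           if (starts-with(@address, '+64'))
             then substring(@address, 4)
             else if (starts-with(@address, '0'))
               then substring(@address, 2)
               else @address
          )">
   <sms contact_name="{@contact_name}">
    <xsl:apply-templates select="current-group()"/>
   </sms>
  </xsl:for-each-group>
 </xsl:template>

 <xsl:template match="sms">
  <message type="{@type}">
   <xsl:value-of select="@body"/>
  </message>
 </xsl:template>
</xsl:stylesheet>
```

With this key, address="02112345" and address="+642112345" both normalize to 2112345 and fall into one group for the same contact_name, while address="12345" and address="5433467" stay in separate groups.
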
Dimitre Novatchev
  • You are very knowledgeable! Thanks so much for answering. Alas I think I have blundered my examples, most apologies for that! I have edited my question to update this. – Jared Sep 14 '11 at 20:31
  • Dude, that was awesome, and the explanation of what it's doing was epic. Thanks so much, I think I can take it from here :) – Jared Sep 15 '11 at 03:03
  • @Jared: You are welcome. The solution can be significantly simpler if there is additional information in the problem -- for example if it is known that all phone numbers (without the country code) have the same, fixed length. If this is so in your case, please, confirm and I'd be glad to post the simpler solution. – Dimitre Novatchev Sep 15 '11 at 03:11
  • Unfortunately, most numbers could range from 7 (not including country or network codes) to 8 digits, and then you get network-specific messages or competition numbers, which are usually about 3-4 characters. What I ended up doing was changing the else condition of the substring to 2 div, which picks up the '021' and the 4 picks up the '+642'. I think I'm good now though, I put my proper XML data in and it came out lovely. Would be nice to use XSLT 2.0, will have to investigate that myself. Thanks for your help, once again! – Jared Sep 15 '11 at 03:54