5

Is it possible to define a custom format for <xsl:number>?

I have the case where a standard alpha-based format is desired, but certain characters in the alphabet are forbidden (strange requirement, but it is what the client requires). For example, the letter i cannot be used, so when using <xsl:number> I should get the sequence: a, b, c, d, e, f, g, h, j, k, ..., aa, ab, ..., ah, aj, ...

The project is using XSLT 2.0 and Saxon, so if a solution exists that is specific to Saxon, that is okay.

Does XSLT 2.0 provide the capability to define a custom format sequence? Does Saxon provide a capability to register a custom sequence for use with <xsl:number>?

ewh

4 Answers

3

XSLT 2.0 provides the format attribute for xsl:number, which accepts a format token such as aa. The number is computed from the expression in the value attribute and is then formatted according to format.

Given this, you can first compute the correct sequence of numbers, leaving out those that would map to a forbidden letter.

For instance, the following instruction:

  <xsl:number value="$sequence" format="aa"/>

will print (notice i excluded):

 a.b.c.d.e.f.g.h.j.k.l.m

if $sequence evaluates to (notice 9 skipped):

1, 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 13

Notice that if you have 12 elements, your expression must skip the unwanted number (9 for i) and increase every following number by one. The last element, at position 12, should get the number 13.

So all you need is the algorithm that computes the wanted sequence, which depends on your input document.
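
A minimal sketch of such a mapping, covering only the single-letter range. It assumes fewer than 26 labels and that only i is excluded; the hard-coded count of 12 and the `main` template name are illustrative and not part of the answer, and the standard format token a is used:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/" name="main">
    <!-- Positions 1..12 (assumed count). Every position at or past 9,
         the slot of the letter i, is shifted up by one. -->
    <xsl:variable name="sequence"
        select="for $p in 1 to 12
                return if ($p ge 9) then $p + 1 else $p"/>
    <!-- A sequence of numbers is formatted item by item, separated by '.' -->
    <xsl:number value="$sequence" format="a"/>
  </xsl:template>

</xsl:stylesheet>
```

Once the labels go past z, this simple shift is no longer enough, because every letter position then has to skip the forbidden characters independently.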

References: XSLT 2.0 Rec.

Emiliano Poggi
  • I like the basic approach, but will have to think about how to generate the sequence correctly. The letters to exclude are independent of the input. For example, the letters `i`, `l`, and `o` cannot occur in the resulting `<xsl:number>` output, including when the number goes into double characters. Not sure how simple the algorithm is to map the raw sequence number to the effective sequence number. – ewh Jul 06 '11 at 06:37
  • For example, to exclude `i` in double letters also, you should skip number 34 (9 + 26). You can post a sample of the XML to be processed and some expert here will like to help you. – Emiliano Poggi Jul 06 '11 at 06:57
  • The sequence is general: x number of items that need to be labeled like an ordered list, but with the labels excluding specific alpha characters. The "skipping" is not as simple as you think. `z` represents item 23. Item 34 would have the label `am`. The missing letters increase the shift every 23 items, but the real shift varies between cycles since there are 3 letters that I need to exclude. See my answer to my general problem, but it does not use `<xsl:number>`. – ewh Jul 06 '11 at 07:04
1

EDIT: An alternate, more general, solution exists and is posted as a separate answer. I'm leaving this answer since it still may be of value to some.

I like @empo's thinking (I mod'ed it up), but I think it may be hard to get a working solution. A clever algorithm or equation is required to derive the correct sequence number from the raw sequence number, so that the resulting label never contains a forbidden character. At this time, such an algorithm escapes me.

One method I came up with is to create my own function, and not use <xsl:number>. In essence, we are dealing with a base 23 set, the letters a to z, but excluding the characters i, l, and o. The function I came up with only goes up to zz, but that should be sufficient for what is needed (provides labelling up to 552 items).

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ewh="http://www.earlhood.com/XSL/Transform"
                exclude-result-prefixes="#all">

<xsl:output method="xml" indent="yes"/>

<xsl:variable name="letters" select="'abcdefghjkmnpqrstuvwxyz'"/>
<xsl:variable name="lbase" select="23"/>

<xsl:function name="ewh:get-alpha-label" as="xs:string">
  <xsl:param name="number" as="xs:integer"/>
  <xsl:variable name="quotient" select="$number idiv $lbase"/>
  <xsl:variable name="remainder" select="$number mod $lbase"/>
  <xsl:variable name="p1">
    <xsl:choose>
      <xsl:when test="($quotient gt 0) and ($remainder = 0)">
        <xsl:value-of select="substring($letters,($quotient - 1),1)"/>
      </xsl:when>
      <xsl:when test="($quotient gt 0) and ($remainder gt 0)">
        <xsl:value-of select="substring($letters,$quotient,1)"/>
      </xsl:when>
      <xsl:otherwise/>
    </xsl:choose>
  </xsl:variable>
  <xsl:variable name="p0">
    <xsl:choose>
      <xsl:when test="$remainder = 0">
        <xsl:value-of select="substring($letters,$lbase,1)"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:value-of select="substring($letters,$remainder,1)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:value-of select="concat($p1,$p0)"/>
</xsl:function>

<xsl:template match="/">
  <result>
    <value n="9"><xsl:value-of select="ewh:get-alpha-label(9)"/></value>
    <value n="12"><xsl:value-of select="ewh:get-alpha-label(12)"/></value>
    <value n="15"><xsl:value-of select="ewh:get-alpha-label(15)"/></value>
    <value n="23"><xsl:value-of select="ewh:get-alpha-label(23)"/></value>
    <value n="26"><xsl:value-of select="ewh:get-alpha-label(26)"/></value>
    <value n="33"><xsl:value-of select="ewh:get-alpha-label(33)"/></value>
    <value n="46"><xsl:value-of select="ewh:get-alpha-label(46)"/></value>
    <value n="69"><xsl:value-of select="ewh:get-alpha-label(69)"/></value>
    <value n="70"><xsl:value-of select="ewh:get-alpha-label(70)"/></value>
    <value n="200"><xsl:value-of select="ewh:get-alpha-label(200)"/></value>
    <value n="552"><xsl:value-of select="ewh:get-alpha-label(552)"/></value>
  </result>
</xsl:template>

</xsl:stylesheet>

When I execute the above, I get the following output:

<result>
   <value n="9">j</value>
   <value n="12">n</value>
   <value n="15">r</value>
   <value n="23">z</value>
   <value n="26">ac</value>
   <value n="33">ak</value>
   <value n="46">az</value>
   <value n="69">bz</value>
   <value n="70">ca</value>
   <value n="200">hs</value>
   <value n="552">zz</value>
</result>

It would be nice if XSLT provided the capability to define a custom character sequence for use with <xsl:number>. Such a capability would generalize <xsl:number> without relying on custom extensions, and I do not know whether any XSLT engine provides such extensions for <xsl:number>.

ewh
  • I've found a numerical solution applicable to any number of items and easily integrated with `xsl:number` (as indicated in my answer). Are you still interested in this kind of solution? – Emiliano Poggi Jul 09 '11 at 16:19
  • yes I'm interested. I have actually come up with a different solution that has no numerical limit. Took a little time for me to handle the boundary cases properly, especially for larger numbers. I can hold on posting it until you have posted yours in case we came up with similar general solution. – ewh Jul 10 '11 at 03:02
  • ok, I'll post it as another answer when I have 5 minutes free. – Emiliano Poggi Jul 10 '11 at 06:45
  • Since it has been a little while, I posted my generalized solution to the problem. @empo, feel free to post your solution whenever you have time, especially if it is different and/or better than what I came up with. – ewh Jul 14 '11 at 12:20
  • I underestimated the complexity of this problem. I've tried to get a pure numerical solution without success. I would need to dedicate much more time to achieve a good result, I think. Your generalized answer is certainly good enough if you are sure that it computes the correct labels (that's also not easy to check :)). However, I would like to post the numerical alternative in the future and also see someone else post one. – Emiliano Poggi Jul 14 '11 at 12:28
1

You can customize the output of xsl:number in Saxon by writing an implementation of the interface net.sf.saxon.lib.Numberer: probably you will want to make this a subclass of net.sf.saxon.expr.number.Numberer_en. You'll need to study the source code and work out what needs overriding.

In Saxon PE/EE you can register the Numberer to be used for a given language in the Saxon configuration file. For Saxon HE it requires a bit more work: you have to implement the interface LocalizerFactory and register your LocalizerFactory with the Configuration.

Michael Kay
1

I came up with the following, more generalized solution, after posting my original solution to the problem. The solution is pure XSLT and at the base, still uses <xsl:number>, so should be applicable to any format type.

<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ewh="http://www.earlhood.com/XSL/Transform"
                exclude-result-prefixes="#all">

<!-- Description: XSLT to generate an alpha formatted sequence
     label (via <xsl:number>), but disallowing specific characters
     from being used.
  -->

<!-- Algorithm: Given the index value of the item to generate
     a label for via <xsl:number>, we adjust the value so the resulting
     label avoids the use of the forbidden characters.

     This is achieved by converting the index value into a baseX
     number, with X the number of allowed characters.

     The baseX number will be converted into a reverse sequence
     of numbers for each ^E place.  For example, the number 12167
     converted to base23 will generate the following reverse sequence:

       Place:    (23^0, 23^1, 23^2, 23^3)
       Sequence: (   0,    0,    0,    1)   // 1000 in base23

     Having it in right-to-left order makes processing easier.

     Each item in the sequence will be a number from 0 to baseX-1.

     With the sequence, we can then just call <xsl:number> on
     each item and reverse concatenate the result.

     NOTE: Since <xsl:number> does not like 0 as a given value,
     the sequence must be processed so each item is within the
     range of 1-to-baseX.  For example, the above base23 example
     will be translated to the following:

       (23, 22, 22)
  -->

<xsl:output method="xml" indent="yes"/>

<!-- Number of allowed characters: This should be total number of chars of
     format-type desired minus the chars that should be skipped. -->
<xsl:variable name="lbase" select="23"/>
<!-- Sequence of character positions not allowed, with 1=>a to 26=>z -->
<xsl:variable name="lexcs" select="(9,12,15)"/> <!-- i,l,o -->

<!-- Helper Function:
     Convert integer to sequence of number of given base.
     The sequence of numbers is in reverse order: ^0,^1,^2,...^N.
  -->
<xsl:function name="ewh:get_base_digits" as="item()*">
  <xsl:param name="number" as="xs:integer"/>
  <xsl:param name="to"     as="xs:integer"/>
  <xsl:variable name="Q" select="$number idiv $to"/>
  <xsl:variable name="R" select="$number mod $to"/>
  <xsl:sequence select="$R"/>
  <xsl:if test="$Q gt 0">
    <xsl:sequence select="ewh:get_base_digits($Q,$to)"/>
  </xsl:if>
</xsl:function>

<!-- Helper Function:
     Compute carry-overs in reverse-base digit sequence.  XSLT starts
     numbering at 1, so we cannot have any 0s.
  -->
<xsl:function name="ewh:compute_carry_overs" as="item()*">
  <xsl:param name="digits" as="item()*"/>
  <xsl:variable name="d" select="subsequence($digits, 1, 1)"/>
  <xsl:choose>
    <xsl:when test="($d le 0) and (count($digits) = 1)">
      <!-- 0 at end of list, nothing to do -->
    </xsl:when>
    <xsl:when test="$d le 0">
      <!-- If digit <=0, need to perform carry-over operation -->
      <xsl:variable name="next" select="subsequence($digits, 2, 1)"/>
      <xsl:choose>
        <xsl:when test="count($digits) le 2">
          <xsl:sequence select="$lbase + $d"/>
          <xsl:sequence select="ewh:compute_carry_overs($next - 1)"/>
        </xsl:when>
        <xsl:otherwise>
          <xsl:sequence select="$lbase + $d"/>
          <xsl:sequence select="ewh:compute_carry_overs(($next - 1,
              subsequence($digits, 3)))"/>
        </xsl:otherwise>
      </xsl:choose>
    </xsl:when>
    <xsl:when test="count($digits) le 1">
      <xsl:sequence select="$d"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:sequence select="$d"/>
      <xsl:sequence select="ewh:compute_carry_overs(subsequence($digits, 2))"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

<!-- Helper Function:
     Given a number in the base range, determine number for
     purposes of <xsl:number>.  We loop thru the exclusion
     list and add 1 for each exclusion letter that has
     been passed.  The $digit parameter should be a number
     in the range [1..$lbase].
  -->
<xsl:function name="ewh:compute_digit_offset" as="xs:integer">
  <xsl:param name="digit"      as="xs:integer"/>
  <xsl:param name="excludes"   as="item()*"/>
  <xsl:variable name="l" select="subsequence($excludes, 1, 1)"/>
  <xsl:variable name="result">
    <xsl:choose>
      <xsl:when test="$digit lt $l">
        <xsl:value-of select="0"/>
      </xsl:when>
      <xsl:when test="count($excludes) = 1">
        <xsl:value-of select="1"/>
      </xsl:when>
      <xsl:otherwise>
        <xsl:variable name="rest">
          <xsl:value-of select="ewh:compute_digit_offset($digit+1,
              subsequence($excludes,2))"/>
        </xsl:variable>
        <xsl:value-of select="1 + $rest"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:variable>
  <xsl:value-of select="$result"/>
</xsl:function>

<!-- Retrieve alpha sequence label.
     This is the main function to call.
  -->
<xsl:function name="ewh:get-alpha-label" as="xs:string">
  <xsl:param name="number" as="xs:integer"/>
  <xsl:variable name="basedigits"
                select="ewh:get_base_digits($number,$lbase)"/>
  <xsl:variable name="digits"
                select="ewh:compute_carry_overs($basedigits)"/>
  <xsl:variable name="result" as="item()*">
    <xsl:for-each select="$digits">
      <xsl:variable name="digit" select="."/>
      <!-- Should not have any 0 values.  If some reason we do,
           we ignore assuming they are trailing items. -->
      <xsl:if test="$digit != 0">
        <xsl:variable name="value">
          <xsl:value-of select="$digit +
              ewh:compute_digit_offset($digit,$lexcs)"/>
        </xsl:variable>
        <xsl:variable name="number">
          <xsl:number value="$value" format="a"/>
        </xsl:variable>
        <xsl:sequence select="$number"/>
      </xsl:if>
    </xsl:for-each>
  </xsl:variable>
  <xsl:value-of select="string-join(reverse($result),'')"/>
</xsl:function>

<!-- For testing -->
<xsl:template match="/">
  <result>
    <xsl:for-each select="(1 to 1000,12166,12167,12168,279840,279841,279842)">
      <value n="{.}"><xsl:value-of select="ewh:get-alpha-label(.)"/></value>
    </xsl:for-each>
  </result>
</xsl:template>

</xsl:stylesheet>
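
If `<xsl:number>` itself is not required, the same labels can probably be produced more compactly by treating the item number as a bijective base-23 numeral and looking each digit up in the string of allowed letters. The following is a sketch under that assumption, not a replacement for the stylesheet above; the `ex` namespace, the `ex:label` function and the `main` template name are hypothetical:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="urn:example:labels"
                exclude-result-prefixes="#all">

<xsl:output method="xml" indent="yes"/>

<!-- Allowed letters in order (i, l and o removed). -->
<xsl:variable name="letters" select="'abcdefghjkmnpqrstuvwxyz'"/>
<xsl:variable name="lbase" select="string-length($letters)"/>

<!-- Hypothetical helper: label for the 1-based item number $n.
     Subtracting 1 before idiv/mod gives digits in 1..$lbase,
     so no separate carry-over pass is needed. -->
<xsl:function name="ex:label" as="xs:string">
  <xsl:param name="n" as="xs:integer"/>
  <xsl:sequence select="
      if ($n le 0) then ''
      else concat(ex:label(($n - 1) idiv $lbase),
                  substring($letters, ($n - 1) mod $lbase + 1, 1))"/>
</xsl:function>

<!-- For testing -->
<xsl:template match="/" name="main">
  <result>
    <xsl:for-each select="(9, 23, 24, 552, 553, 12167)">
      <value n="{.}"><xsl:value-of select="ex:label(.)"/></value>
    </xsl:for-each>
  </result>
</xsl:template>

</xsl:stylesheet>
```

Worked by hand, 23 gives z, 552 gives zz and 12167 gives yyz, which matches the (23, 22, 22) digit sequence described in the algorithm comment above.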
ewh
  • I too tried to come up with a generalized mathematical solution, but failed. I could come up with something that may work for a while, but for larger numbers, the results were wrong. Trying to figure out how to deal with extra shifting at each 23^N place escapes me at this time, hence the current solution. – ewh Jul 15 '11 at 15:44
  • The closest mathematical solution I found was recursive, and given a big number failed by overflowing. Maybe it was not right, or maybe recursion is not the way to go. – Emiliano Poggi Jul 15 '11 at 19:13
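
One way the numerical mapping discussed in these comments might look: convert the item number to bijective base-N digits (N being the number of allowed letters), translate each digit to its real position in the 26-letter alphabet, and recombine the digits in base 26 so that `<xsl:number format="a">` produces the label. This sketch assumes the processor's a format is the plain 26-letter English sequence (a..z, aa, ab, ...); `ex:effective-number`, the `ex` namespace and the `main` template name are hypothetical:

```xml
<xsl:stylesheet version="2.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
                xmlns:xs="http://www.w3.org/2001/XMLSchema"
                xmlns:ex="urn:example:labels"
                exclude-result-prefixes="#all">

<xsl:output method="xml" indent="yes"/>

<!-- Alphabet positions that may be used: 1..26 without i, l, o. -->
<xsl:variable name="allowed" as="xs:integer*"
              select="(1 to 26)[not(. = (9, 12, 15))]"/>

<!-- Hypothetical helper: map the raw item number to the number that
     the 'a' format renders without any excluded letter. -->
<xsl:function name="ex:effective-number" as="xs:integer">
  <xsl:param name="n" as="xs:integer"/>
  <xsl:choose>
    <xsl:when test="$n le 0">
      <xsl:sequence select="0"/>
    </xsl:when>
    <xsl:otherwise>
      <xsl:variable name="base" as="xs:integer" select="count($allowed)"/>
      <!-- Lowest digit in 1..$base, and the remaining higher digits. -->
      <xsl:variable name="d" as="xs:integer"
                    select="($n - 1) mod $base + 1"/>
      <xsl:variable name="q" as="xs:integer"
                    select="($n - 1) idiv $base"/>
      <xsl:sequence select="26 * ex:effective-number($q) + $allowed[$d]"/>
    </xsl:otherwise>
  </xsl:choose>
</xsl:function>

<!-- For testing -->
<xsl:template match="/" name="main">
  <result>
    <xsl:for-each select="(9, 23, 24, 34, 552, 12167)">
      <value n="{.}">
        <xsl:number value="ex:effective-number(.)" format="a"/>
      </value>
    </xsl:for-each>
  </result>
</xsl:template>

</xsl:stylesheet>
```

The recursion depth grows with the number of letters in the label, not with the item number itself. For example, item 34 maps to 39, which the a format renders as am.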