
I use XSLT 3.0, Saxon-PE 9.7.

I need to sort orth elements according to Ugaritic alphabetical order; the language is close to Hebrew but has additional characters.

I have tried:

 <xsl:sort select="orth" data-type="text" order="ascending" lang="uga"/>

But the resulting order is not correct, so I think I need to describe the Ugaritic alphabetical order myself. How can I do this?

In advance, thank you very much.

Rob
Vanessa
  • I think the section http://saxonica.com/html/documentation9.7/extensibility/config-extend/collation/implementing-collation.html in the Saxon 9.7 documentation is relevant. – Martin Honnen Jan 24 '18 at 14:05
  • Thanks @Martin. I tried to look at `CollationURIResolver`. I suppose you are talking about `startsWith` (http://saxonica.com/html/documentation9.7/javadoc/net/sf/saxon/lib/SubstringMatcher.html#startsWith(java.lang.String,%20java.lang.String)). I did a test for one letter (``), but it doesn't work. – Vanessa Jan 24 '18 at 17:28
  • I don't think you have understood what that configuration is about. You would need to declare your ordering rules in a Java class implementing java.util.Comparator or in a Saxon configuration file. Once you have done that, you can use the `collation` attribute on `xsl:sort` with e.g. `collation="http://saxon.sf.net/collation?class=yourFullClassHere"`; the `select` attribute would remain as `select="orth"`. So the task is to be solved outside of the XSLT code, by writing up the collation rules for that alphabet. – Martin Honnen Jan 24 '18 at 18:41
  • So which characters compose that language, is that https://en.wikipedia.org/wiki/Ugaritic_alphabet? Are those characters not ordered by their Unicode code point? – Martin Honnen Jan 24 '18 at 18:43
  • Thanks @Martin. Regarding Ugaritic, I'm using the transcription, not the cuneiform signs. Regarding the Saxon link, sorry if I didn't understand; it's not really easy for a neophyte, especially when English is not one's first language. I did look at https://www.oxygenxml.com/InstData/Editor/SDK/javadoc/ro/sync/contentcompletion/xml/CIElement.html and at https://www.oxygenxml.com/doc/versions/19.1/ug-editor/search.html?searchQuery=saxon+configuration+file but this is really too complicated for me. I don't know `java`. Is there no other way to do it in `XSLT`? – Vanessa Jan 24 '18 at 19:14
  • So what determines the order? If we use `translate(orth, 'list of transcription characters', 'list of cuneiform signs')`, would that sort fine based on the Unicode code points of the cuneiform signs (i.e. the order ``)? – Martin Honnen Jan 24 '18 at 19:57
  • Thanks @Martin for your suggestion, it looks very nice :) but as I wrote, I'm not using cuneiform signs, but rather the transcription (ʾ, B, G, D, Ḏ, H, W, Z, Ḍ, Ḫ, Ṭ, Ẓ, Y, K, L, M, N, S, Ś, ʿ, Ġ, P, Ṣ, Q, R, Š, T, Ṯ). Of course, if I add `lang='uga'`, it doesn't work because the input is a transcription. I tried other ISO codes for Northwest Semitic languages, but they don't work either. I suppose it is because I use a transcription of the cuneiform signs. – Vanessa Jan 24 '18 at 21:26
  • What determines the sort order you want? You haven't explained that anywhere so far and I am afraid people here don't know, nor is it likely that an XSLT processor has some predefined collation. – Martin Honnen Jan 24 '18 at 21:32
  • @Martin. I wrote in my first message that I was looking to sort in Ugaritic alphabetic order (as we traditionally use). However, I forgot to write that it was the transcription and not the cuneiform. – Vanessa Jan 24 '18 at 21:36
  • Let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/163831/discussion-between-vanessa-and-martin-honnen). – Vanessa Jan 24 '18 at 21:37

2 Answers


Saxon allows you to define your own collation in its configuration file. You basically have to set up a configuration file with a section like

 <collations>
      <collation uri="http://example.com/uga-trans"
      rules="&lt; ʾa &lt; b &lt; g &lt; ḫ &lt; d &lt; h &lt; w &lt; z &lt; ḥ &lt; ṭ &lt; y &lt; k &lt; š &lt; l &lt; m &lt; ḏ &lt; n &lt; ẓ &lt; s &lt; ʿ &lt; p &lt; ṣ &lt; q &lt; r &lt; ṯ &lt; ġ &lt; t &lt; ʾi &lt; ʾu &lt; s2"/>
 </collations>

where the uri attribute defines a URI as the name for your collation that you can then use in the collation attribute of an xsl:sort:

            <xsl:perform-sort select="$input-seq">
                <xsl:sort select="string()" collation="http://example.com/uga-trans"/>
            </xsl:perform-sort> 

The syntax to be used in the rules attribute is the one defined for the Java class RuleBasedCollator (https://docs.oracle.com/javase/7/docs/api/java/text/RuleBasedCollator.html), which includes an example for Norwegian. The only caveat is that the Java syntax is plain text while the Saxon configuration is XML, so the < that defines the ordering has to be escaped in the rules attribute as &lt;.

I have set up above a rule based on the transcription sequence presented in the Wikipedia article https://en.wikipedia.org/wiki/Ugaritic_alphabet. Whether that is the one you are looking for I am not sure.
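
The rule string can also be tried out in plain Java before it goes into the Saxon configuration. The following is only a minimal sketch under a few assumptions: the class name `UgariticCollationDemo`, the methods `newCollator` and `sortWords`, and the sample words are made up for illustration and are not part of Saxon. The rules are the same as in the configuration above, written with a literal `<` and with Unicode escapes for the non-ASCII letters so that the source file stays ASCII-only.

```java
import java.text.ParseException;
import java.text.RuleBasedCollator;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class; not part of Saxon or of the configuration above.
class UgariticCollationDemo {

    // Same ordering as the rules attribute of the configuration file.
    // In Java the '<' is written literally (no XML escaping), and the
    // non-ASCII transcription letters are given as Unicode escapes.
    static final String RULES =
          "< \u02BEa < b < g < \u1E2B < d < h < w < z < \u1E25 < \u1E6D"
        + " < y < k < \u0161 < l < m < \u1E0F < n < \u1E93 < s < \u02BF"
        + " < p < \u1E63 < q < r < \u1E6F < \u0121 < t"
        + " < \u02BEi < \u02BEu < s2";

    // Builds a collator from the rules; a syntax error in the rule string
    // surfaces here as an unchecked exception.
    static RuleBasedCollator newCollator() {
        try {
            return new RuleBasedCollator(RULES);
        } catch (ParseException e) {
            throw new IllegalArgumentException("Invalid collation rules", e);
        }
    }

    // Returns a sorted copy of the given words, leaving the input untouched.
    static List<String> sortWords(List<String> words) {
        List<String> sorted = new ArrayList<>(words);
        sorted.sort(newCollator());
        return sorted;
    }

    public static void main(String[] args) {
        // Sample words are arbitrary letter combinations, not real vocabulary.
        System.out.println(sortWords(Arrays.asList("kl", "dn", "wn", "gb", "bt")));
    }
}
```

If the rule string has a syntax problem, the constructor reports it here rather than inside the transformation, which makes the rules easier to debug. With these rules, a word starting with g should sort before one starting with d, which differs from plain code point order.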

You can run Saxon from the command line with -config:yourconfigurationfile.xml to use such a configuration; oXygen has a field in the Saxon-specific transformation scenario dialog to select a configuration file.

Martin Honnen
  • Thanks so much @Martin! A few additional details for neophytes like me, especially those who work with Oxygen 19.1: the configuration file is called 'Saxon configuration.xml'. The default setting is `Edition=EE`; since I'm using Saxon PE, I changed it to `Edition=PE`. Afterwards, in the transformation scenario editor, I added the configuration file, which can be selected from the "Transformer" window (above "Parameters"). And then in the XSLT file: ` ` – Vanessa Jan 25 '18 at 14:24
  • Thanks Martin for this answer. I would just add a couple of observations, mainly for anyone else coming here later. Saxon actually gives you two ways to select an Ugaritic collation: on `xsl:sort` you can specify either `lang='uga'`, or `collation='http://www.w3.org/2013/collation/UCA?lang=uga'`. Currently the first selects the JDK Locale for Ugaritic while the second selects the ICU-J version. I don't know if they're different. However, neither of course copes with transliterated input. – Michael Kay Jan 26 '18 at 00:12

I'm not sure this is the best solution, but it's the one I know.

The code you are searching for is:

      <xsl:sort select="(number(orth='character1') * 1) + (number(orth='character2') * 2) + (number(orth='character3') * 3) ..." data-type="number" order="ascending"/>

You need to do this for every character of the alphabet. The lower the multiplier, the earlier the value appears in the result. Basically, you are defining your own order for specified values.
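
The underlying idea, giving every letter an explicit rank, can also be sketched outside XSLT. The block below is not the sort key above but a different technique: a per-character comparator in Java, the kind of class the comments on the question mention. Everything in it is an assumption for illustration: the class name `AlphabetOrderComparator`, the sample words, and the alphabet string, which takes only the single-character letters from the rule string in the other answer. Multi-character letters such as ʾa are left out, so this is a simplification, not a full Ugaritic collation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical class for illustration; not taken from Saxon or from this answer.
class AlphabetOrderComparator implements Comparator<String> {

    // Assumed alphabet: the single-character letters of the transcription,
    // in the order b g h-breve-below d h w z h-dot-below t-dot-below y k
    // s-caron l m d-line-below n z-dot-below s ayin p s-dot-below q r
    // t-line-below g-dot-above t. Written as Unicode escapes (ASCII-only source).
    static final String DEFAULT_ALPHABET =
        "bg\u1E2Bdhwz\u1E25\u1E6Dyk\u0161lm\u1E0Fn\u1E93s\u02BFp\u1E63qr\u1E6F\u0121t";

    private final String alphabet;

    AlphabetOrderComparator() {
        this(DEFAULT_ALPHABET);
    }

    AlphabetOrderComparator(String alphabet) {
        this.alphabet = alphabet;
    }

    // Rank of a character: its position in the alphabet. Characters that are
    // not listed sort after all listed ones, by code unit value.
    private int rank(char c) {
        int index = alphabet.indexOf(c);
        return index >= 0 ? index : alphabet.length() + c;
    }

    // Compares character by character using the ranks; if one word is a
    // prefix of the other, the shorter word comes first.
    @Override
    public int compare(String a, String b) {
        int common = Math.min(a.length(), b.length());
        for (int i = 0; i < common; i++) {
            int diff = rank(a.charAt(i)) - rank(b.charAt(i));
            if (diff != 0) {
                return diff;
            }
        }
        return a.length() - b.length();
    }

    public static void main(String[] args) {
        // Sample words are arbitrary letter combinations, not real vocabulary.
        List<String> words = new ArrayList<>(Arrays.asList("kl", "dn", "wn", "gb", "bt"));
        words.sort(new AlphabetOrderComparator());
        System.out.println(words);
    }
}
```

The sketch assumes precomposed, lower-case input with one character per letter. Input that uses combining marks or upper-case letters would need normalising first, which a rule-based collation handles more gracefully.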

Christian Mosz
  • Thanks @Christian. I tried but I have the following message "Arithmetic operator is not defined for arguments of types (xs:boolean, xs:integer)." So I tried to define `param`: `` and replace ` by `` but it doesn't work either. – Vanessa Jan 24 '18 at 17:16
  • This is not the right approach. I will come back to produce an answer to the question myself, but the right approach is either to find the right collation in the ICU library that does the job, or to implement a new collation that does it. It may be necessary to use the collation argument of xsl:sort instead of the lang argument. – Michael Kay Jan 25 '18 at 08:26
  • Michael is probably right. As I wrote, I'm pretty sure there is a better solution. This worked for me for the task of getting a custom ordering for an alphabet. – Christian Mosz Jan 25 '18 at 13:09