
I would like to align multiple translations of a TEI-encoded text and transform it via XSLT into HTML.

The xml (adapted from https://www.tei-c.org/release/doc/tei-p5-doc/en/html/ref-linkGrp.html) looks like this:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="mini.xsl"?>

<TEI>
<linkGrp type="translation">
 <link target="#CCS1 #SW1"/>
 <link target="#CCS2 #SW2"/>
 <link target="#CCS #SW"/>
</linkGrp>
<div type="volume" xml:id="CCS"
 xml:lang="fr">
 <p>
  <s xml:id="CCS1">Longtemps, je me suis couché de bonne heure.</s>
  <s xml:id="CCS2">Parfois, à peine ma bougie éteinte, mes yeux se fermaient si vite que je n'avais pas le temps de me dire : "Je m'endors."</s>
 </p>
<!-- ... -->
</div>
<div type="volume" xml:id="SW" xml:lang="en">
 <p>
  <s xml:id="SW1">For a long time I used to go to bed early.</s>
  <s xml:id="SW2">Sometimes, when I had put out my candle, my eyes would close so quickly that I had not even time to say "I'm going to sleep."</s>
 </p>
<!-- ... -->
</div>
</TEI>

The linkGrp element contains the alignment info. I would like to select the s elements within the div elements according to this alignment info.

With the following XSL file I can output the attribute values themselves, but I have no idea how to grab and output the corresponding lines:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:for-each select="TEI/linkGrp">
      <xsl:apply-templates select="link"/>
    </xsl:for-each>
  </xsl:template>

  <xsl:template match="link">
    <xsl:value-of select="@target"/>
  </xsl:template>
</xsl:stylesheet>

What I am trying to get is a simple HTML table that has the #CCS lines on one side and the #SW lines on the other, that is:

<table>
<tr>
<td>Longtemps, je me suis couché de bonne heure.</td>
<td>For a long time I used to go to bed early.</td>
</tr>
<tr>
<td>Parfois, à peine ma bougie éteinte, mes yeux se fermaient si vite que je n'avais pas le temps de me dire : "Je m'endors."</td>
<td>Sometimes, when I had put out my candle, my eyes would close so quickly that I had not even time to say "I'm going to sleep."</td>
</tr>
</table>

Any help will be appreciated!

Alex W.
  • What would be your desired outcome HTML? – zx485 Mar 05 '20 at 23:34
  • It is not clear to me why there is a simple list of `link` elements that seems to reference elements that are nested. When processing the `link` elements, how do you decide that you only want to process the first two and not the last one with `<link target="#CCS #SW"/>`? And are you restricted to using XSLT 1? Breaking up your target attribute values is much easier with the string functions like `tokenize` that XSLT/XPath 2 and later have to offer. – Martin Honnen Mar 05 '20 at 23:40
  • The example XML is from the official TEI documentation (cf. link) and I'm not exactly sure why they aligned the divs containing the s elements. I would simply like to address the s elements with a given id. The steps would be: 1) parse the `<linkGrp>` section and store the two attributes of every `link` element. 2) parse the two divs and select the elements by the IDs stored in step 1. As to the XSLT version, I'd prefer, if possible, v. 1.0. The tokenize function would be, of course, a strong argument in favour of XSLT 2.0 – Alex W. Mar 06 '20 at 08:01
  • @MartinHonnen : I had provided a sample html at the end of my question. The desired output is a simple table that aligns one line of the first div with the corresponding line from the second – Alex W. Mar 06 '20 at 08:03
  • Is the list of links complete? That is, does it reference every paragraph that you want to output, or do you have to interpolate paragraphs that aren't linked explicitly? – Michael Kay Mar 06 '20 at 08:20
  • The list is complete, that is, I would like to output exclusively the lines referred to in the `<linkGrp>` element. In Python, I would simply parse the `<linkGrp>`, store the attributes as a list of tuples, then parse the `div` elements into a dictionary with the attributes as keys and finally loop over the list of tuples (the items of which would be identical with the dict keys) and output the corresponding dict values. But I guess this is not how XSLT works. – Alex W. Mar 06 '20 at 08:28

1 Answer

If I understand the required logic correctly (?), you could do something like:

XSLT 1.0

<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>

<xsl:key name="s" match="s" use="concat('#', @xml:id)" />

<xsl:template match="/TEI">
    <table>
        <xsl:for-each select="linkGrp/link">
            <tr>
                <td>
                    <xsl:value-of select="key('s', substring-before(@target, ' '))"/>
                </td>
                <td>
                    <xsl:value-of select="key('s', substring-after(@target, ' '))"/>
                </td>
            </tr>
        </xsl:for-each>
    </table>
</xsl:template>

</xsl:stylesheet>

Applied to your input example, this will produce:

Result

<table>
  <tr>
    <td>Longtemps, je me suis couché de bonne heure.</td>
    <td>For a long time I used to go to bed early.</td>
  </tr>
  <tr>
    <td>Parfois, à peine ma bougie éteinte, mes yeux se fermaient si vite que je n'avais pas le temps de me dire : "Je m'endors."</td>
    <td>Sometimes, when I had put out my candle, my eyes would close so quickly that I had not even time to say "I'm going to sleep."</td>
  </tr>
  <tr>
    <td/>
    <td/>
  </tr>
</table>

Note that the last row's cells are empty. I am not sure what the correct result should be.
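If the empty row is not wanted, one possible variation is to filter the links with a predicate before building the rows. This is only a sketch, under the assumption that a link should be skipped whenever its first target does not resolve to an `s` element (as is the case for the `#CCS #SW` link, which points at the two `div` elements); everything else is unchanged from the stylesheet above.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>

<!-- Index every s element by its id, prefixed with '#' to match the link targets -->
<xsl:key name="s" match="s" use="concat('#', @xml:id)" />

<xsl:template match="/TEI">
    <table>
        <!-- Assumption: keep a link only if its first target resolves to an s element -->
        <xsl:for-each select="linkGrp/link[key('s', substring-before(@target, ' '))]">
            <tr>
                <td>
                    <xsl:value-of select="key('s', substring-before(@target, ' '))"/>
                </td>
                <td>
                    <xsl:value-of select="key('s', substring-after(@target, ' '))"/>
                </td>
            </tr>
        </xsl:for-each>
    </table>
</xsl:template>

</xsl:stylesheet>
```

With the input example, only the two sentence-level links should pass the predicate, so the table would contain just the two rows shown in the question.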

michael.hor257k
  • This looks like a very viable solution, thanks! As I have, in my use case, more than two whitespace-separated values in the `target` attributes, I will have to find some work-around for the `substring-before` and `substring-after` functions. – Alex W. Mar 06 '20 at 13:15
  • Which XSLT processor do you plan to use? – michael.hor257k Mar 06 '20 at 15:27
  • I'm quite new to XSLT, so I'm not entirely sure. As especially the suggested solution with the key function seems to be too expensive for a client-side transformation on the fly, I might as well go for the (server-side) transformation into static html via XSLT 2.0. – Alex W. Mar 06 '20 at 15:57
  • I don't know why you think the use of key is expensive: it's considered the most efficient. If you have multiple values in a string, you need to tokenize it. In XSLT 1.0 this can be done using an extension function, if the processor supports it. Otherwise you need to use a recursive template. See here how to identify your processor: https://stackoverflow.com/a/25245033/3016153 – michael.hor257k Mar 06 '20 at 16:03
  • Also, if some links can have more ids than others, you will end up with an irregular table, with some rows having more cells than others. Unless you spend extra effort to produce empty cells for the "missing" ids. – michael.hor257k Mar 06 '20 at 16:10
  • As to the last comment, wouldn't the `value-of` node (as shown in the code you've provided in your answer) just result in an empty `<td/>` if there is no corresponding key and wouldn't the number of cells be always the same? – Alex W. Mar 06 '20 at 16:23
  • If the 1st link contains say `#CCS1 #SW1` and the 2nd link has `#CCS2 #SW2 #AB2`, then the 1st row will have 2 cells and the 2nd row will have 3. – michael.hor257k Mar 06 '20 at 16:29
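
The recursive-template approach mentioned in the comments could be sketched roughly as follows for `target` values holding any number of whitespace-separated ids. This is a minimal XSLT 1.0 sketch, not a definitive implementation: the template name `cells` and its parameter `ids` are hypothetical names chosen for illustration, and the key is the same one used in the answer.

```xml
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" omit-xml-declaration="yes" version="1.0" encoding="utf-8" indent="yes"/>

<xsl:key name="s" match="s" use="concat('#', @xml:id)" />

<xsl:template match="/TEI">
    <table>
        <xsl:for-each select="linkGrp/link">
            <tr>
                <!-- normalize-space trims the value and collapses runs of whitespace to single spaces -->
                <xsl:call-template name="cells">
                    <xsl:with-param name="ids" select="normalize-space(@target)"/>
                </xsl:call-template>
            </tr>
        </xsl:for-each>
    </table>
</xsl:template>

<!-- Hypothetical helper: emits one td per id, then recurses on the remaining ids -->
<xsl:template name="cells">
    <xsl:param name="ids"/>
    <xsl:if test="$ids != ''">
        <!-- Appending a space ensures substring-before also finds a delimiter after the last id -->
        <xsl:variable name="first" select="substring-before(concat($ids, ' '), ' ')"/>
        <td>
            <xsl:value-of select="key('s', $first)"/>
        </td>
        <xsl:call-template name="cells">
            <xsl:with-param name="ids" select="substring-after($ids, ' ')"/>
        </xsl:call-template>
    </xsl:if>
</xsl:template>

</xsl:stylesheet>
```

As pointed out above, rows produced this way have as many cells as their link has ids, so links with differing numbers of ids still yield an irregular table unless empty cells are added separately. With an XSLT 2.0 processor, the recursion could be replaced by iterating over `tokenize(normalize-space(@target), ' ')`.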