I have some XML that I'm importing into InDesign. The issue arises with multi-level bulleted lists, which do not format correctly in the document.

The problem is easier to visualise in InDesign. The two screenshots show the same imported XML in different views. The highlighted red areas are the parts I want to remove with XSLT (caused by unwanted Unicode paragraph separators, &#8233;).

InDesign story editor view

InDesign layout view

Here is the imported XML:

<?xml version="1.0" encoding="UTF-8"?>
<STORY StoryCode="454789" DatePublished="18/06/2019 07:50">
   <Story_text>
      <!--?xml version="1.0" encoding="UTF-8" standalone="yes"?-->
      <h2>List 1</h2>
      <ul>
         <li>
            level 1
            <ul>
               <li>level 2</li>
               <li>level 2</li>
            </ul>
         </li>
         <li>level 1</li>
         <li>
            level 1
            <ul>
               <li>
                  level 2
                  <ul>
                     <li>level 3</li>
                  </ul>
               </li>
            </ul>
         </li>
         <li>level 1</li>
         <li>level 1</li>

      </ul>
   </Story_text>
</STORY>

And here is my current XSLT:

<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="no" />

    <!-- #1 copy entire template -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- #2 assigns each level of bullet to li1, li2, li3 etc. -->
    <xsl:template match="ul/li">
        <xsl:element name="li{count(ancestor::li) + 1}">
            <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
    </xsl:template>

    <!-- #3 insert paragraph separator after ul elements with li parent -->
    <xsl:template match="li/ul">
        <xsl:copy><xsl:text>&#8233;</xsl:text><xsl:apply-templates/></xsl:copy>
    </xsl:template>
</xsl:stylesheet>

Template #2 is necessary so that each bullet level can be mapped to a specific InDesign paragraph style (e.g. to bullet 1 style, to bullet 2 style etc.). Shoutout to user Tim C for his help here.

Template #3 is necessary to fix a quirk where the first bullet of a level 2 or deeper list does not appear on a new line. Because InDesign only allows one paragraph style per paragraph, without this template bullets of different levels end up on the same line and get mapped to the same paragraph style (image of InDesign result without template #3).

I have tried the following XSLT:

<xsl:template match="ul/li/ul/text()">
    <xsl:value-of select="translate(., '&#8233;', '')" />
    <xsl:apply-templates/>
</xsl:template>

[This gets close but ultimately fails if there are several consecutive bullets of the same level (see InDesign story editor image). I guess because all paragraph separators contained within the <ul> ... </ul> get stripped rather than just the ones I want removed.]

And

<xsl:template match="ul/li/ul/text()">
   <xsl:value-of select="substring(., 1, string-length(.)-X)" />
</xsl:template>

[Where X ≤2 nothing changes, when X>2 the result is the same as the above method]

I think what I want to achieve is if a <ul> ... </ul> (with at least one parent <ul>) contains ≥2 &#8233;, delete the final &#8233;, but I can't figure out how to translate this into XSLT.

I'd massively appreciate any help/pointers in the right direction.

EDIT

I realise that the imported XML example above is somewhat misleading in terms of line breaks, so here is a more accurate depiction (including the &#8233; paragraph separator):

<?xml version="1.0" encoding="utf-8" standalone="yes"?><STORY StoryCode="454789" DatePublished="18/06/2019 07:50"><Headline>Bullet XML test SO</Headline>&#8233;
<Standfirst><!--?xml version="1.0" encoding="UTF-8" standalone="yes"?--><p>placeholder</p></Standfirst>&#8233;
<Story_text><!--?xml version="1.0" encoding="UTF-8" standalone="yes"?--><h2>List 1</h2>&#8233;
<ul><li>level 1<ul><li>level 2</li>&#8233;
<li>level 2</li>&#8233;
</ul></li>&#8233;
<li>level 1</li>&#8233;
<li>level 1<ul><li>level 2<ul><li>level 3</li>&#8233;
</ul></li>&#8233;
</ul></li>&#8233;
<li>level 1</li>&#8233;
<li>level 1</li>&#8233;
</ul></Story_text>&#8233;
</STORY>

Here is the XML after being transformed with my current XSLT. I've marked the paragraph separators I want to remove:

<?xml version="1.0" encoding="utf-8" standalone="yes"?><STORY StoryCode="454789" DatePublished="18/06/2019 07:50"><Headline>Bullet XML test SO</Headline>&#8233;
<Standfirst><!--?xml version="1.0" encoding="UTF-8" standalone="yes"?--><p>placeholder</p></Standfirst>&#8233;
<Story_text><!--?xml version="1.0" encoding="UTF-8" standalone="yes"?--><h2>List 1</h2>&#8233;
<ul><li1>level 1<ul>&#8233;
<li2>level 2</li2>&#8233;
<li2>level 2</li2>&#8233;
</ul></li1>&#8233;    [TO DELETE]
<li1>level 1</li1>&#8233;
<li1>level 1<ul>&#8233;
<li2>level 2<ul>&#8233;
<li3>level 3</li3>&#8233;
</ul></li2>&#8233;    [TO DELETE]
</ul></li1>&#8233;    [TO DELETE]
<li1>level 1</li1>&#8233;
<li1>level 1</li1>&#8233;
</ul></Story_text>&#8233;
</STORY>
  • Will the final `&#8233;` always be the last character in a text node, or can it be followed by other characters? – Tim C Jun 18 '19 at 15:07
  • What would an XML file look like that `contains ≥2 &#8233;`? – wp78de Jun 18 '19 at 19:31
  • `last()` and regex `replace` might be helpful, e.g. something along these lines: `replace((//ul/text())[last()], '[&#8233;]+', '&#8233;')` – wp78de Jun 18 '19 at 19:54
  • I believe InDesign only supports XSLT 1.0, which would prevent `replace` being used, as that requires XSLT 2.0. – Tim C Jun 19 '19 at 08:52
  • @TimC @wp78de I've added an edit to my original post which will hopefully add a bit of clarity. I believe `&#8233;` will always be the last character in a `` text node. (Am I correct in thinking that the text content of the second `` is `&#8233;&#8233;&#8233;` despite having child elements interspersed?) – lightworks Jun 21 '19 at 08:08
  • The problem you have is that the `&#8233;` are in separate text nodes, not a single one. So you need to target the last `text()` node. I won't put this as an answer just yet, because I am not 100% sure of your logic, but does this template match do it.... `` (See it in action at http://xsltfiddle.liberty-development.net/jyRYYj2). – Tim C Jun 21 '19 at 08:26
  • This works perfectly! I've tested it on some overly-complex bulleted lists and they format correctly every time. Thanks so much! – lightworks Jun 21 '19 at 09:25
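
The template from the comment above did not survive in this page, so what follows is only a minimal sketch of the idea it describes (target the last `text()` node), not a reconstruction of Tim C's actual code. It keeps templates #1 to #3 from the question unchanged and adds one hypothetical template, #4, which strips `&#8233;` from the last text-node child of every nested `ul`. It assumes XSLT 1.0 and that each nested `ul` ends with a text node holding the trailing separator, as in the EDIT sample.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" indent="no"/>

    <!-- #1 identity transform: copy everything not matched below -->
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>

    <!-- #2 rename each li to li1, li2, li3 ... according to nesting depth -->
    <xsl:template match="ul/li">
        <xsl:element name="li{count(ancestor::li) + 1}">
            <xsl:apply-templates select="@*|node()"/>
        </xsl:element>
    </xsl:template>

    <!-- #3 start every nested ul with a paragraph separator -->
    <xsl:template match="li/ul">
        <xsl:copy><xsl:text>&#8233;</xsl:text><xsl:apply-templates/></xsl:copy>
    </xsl:template>

    <!-- #4 (assumption, not from the original thread): in a nested ul,
         remove U+2029 from the last text-node child only, so separators
         between sibling bullets of the same level are kept -->
    <xsl:template match="li/ul/text()[last()]">
        <xsl:value-of select="translate(., '&#8233;', '')"/>
    </xsl:template>
</xsl:stylesheet>
```

Unlike the `ul/li/ul/text()` attempt in the question, the `[last()]` predicate leaves the separators between consecutive bullets of the same level alone and only touches the one just before each nested `</ul>`. The separator that follows the enclosing `</li>` then closes the paragraph. One caveat: if a nested `ul` has no text after its final `li`, the last text node is the one between the last two bullets, and this sketch would remove a separator that is still needed.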