0

I'm trying to come up with a clean Xpath 1.0 expression to return all nodes in between specific nodes. The requirements are:

  • All nodes after the very last numeric node;
  • but before the very first node that starts with a opening paranthesis.

To test what I'm doing you could use:

<t>
    <s>ABC</s>
    <s>123</s>
    <s>DEF</s>
    <s>456</s>
    <s>GHI</s>
    <s>JKL</s>
    <s>(M)</s>
    <s>NOP</s>
    <s>(Q)</s>
</t>

To only end up with both 'GHI' and 'JKL' I came up with:

//s[position()<count(//s[starts-with(., '(')][1]/preceding::*)+1][position()>count(//s[.*0=0][last()]/preceding::*)+1]

It works just fine, however I can't help but feeling this can be done much smoother. To do so I though there should be a way to use the retrieved nodes in prior Xpath, e.g:

  • To get all nodes after the last numeric node: [.*0=0][last()]/following::s.
  • To get the 1st nodes that starts with a opening paranthesis: [starts-with(., '(')][1].
  • But I fail to combine the statements into a valid syntax as I think something along the lines of: //s[.*0=0][last()]/following::s[position()<[starts-with(.,'(')][1]] which obviously is incorrect.

Any ideas, or am I stuck with what I got in the first place? I'm using this in the Excel function FILTERXML().

JvdV
  • 70,606
  • 8
  • 39
  • 70

1 Answers1

1

It's very difficult to do this with xpath 1.0 (though it's very easy with xpath 2.0), but since you are using Excel and FILTERXML(), the following is a possible workaround:

Assuming your xml is in A1, then in B1 user FILTERXML() for the first set:

=FILTERXML(A1,"//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s")

Credit here goes to the "double translate" method.

And again in the C1:

=FILTERXML(A1,"//s[starts-with(., '(')][1]/preceding::s")

Then it's back to Excel. Either:

=FILTER(B:B,COUNTIF(C:C,B:B))

or

=IF(ISERROR(MATCH(B:B,C:C,0)),"",B:B)

This should output 'GHI' and 'JKL'.

Per OP's comment below, there is also a way to do that with one xpath 1.0 expression using the Kayessian method, which in this case would take the form of:

//s[translate(., translate(.,'0123456789',''), '')][last()]//following::s[count(. | //s[starts-with(., '(')][1]/preceding::s) = count(//s[starts-with(., '(')][1]/preceding::s)]

and, strangely enough FILTERXML() understands it!

Jack Fleeting
  • 24,385
  • 6
  • 23
  • 45
  • Thanks for this, but I'm more interested in a single xpath expression that will return the required array of nodes in a smoother way than the long roundabout route I took. Nonetheless, cheers for the input! – JvdV Nov 19 '21 at 12:40
  • Got you, see edit (though I'm not sure it necessarily qualifies as "smoother"...) – Jack Fleeting Nov 19 '21 at 13:08