I'm trying to come up with a clean XPath 1.0 expression for Excel's FILTERXML()
function that returns the nodes meeting the following requirement:
- The node must have a following sibling that starts with the same three characters as the node itself.
The point is to find out if there is some similarity to a certain degree in the data. Imagine the following sample data:
<t>
<s>ABCDEF</s>
<s>GHIJKL</s>
<s>MNOPQR</s>
<s>GHISTU</s>
<s>ABVWXY</s>
</t>
From here, I'd like to return GHIJKL, since its first three characters, 'GHI', are found at the start of the second-to-last node.
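To pin down the expected result independently of XPath, here is a minimal Python sketch of the rule as I understand it. It is only a reference check, not an XPath 1.0 solution, and it won't run inside FILTERXML(). The function name nodes_with_matching_follower and the prefix_len parameter are made up for illustration, and skipping nodes shorter than three characters is my own assumption.

```python
import xml.etree.ElementTree as ET

# Sample data from the question, collapsed onto one line.
SAMPLE = "<t><s>ABCDEF</s><s>GHIJKL</s><s>MNOPQR</s><s>GHISTU</s><s>ABVWXY</s></t>"


def nodes_with_matching_follower(xml_text, prefix_len=3):
    """Return the text of each <s> node whose first `prefix_len` characters
    also start at least one later <s> sibling (hypothetical helper)."""
    values = [s.text or "" for s in ET.fromstring(xml_text).findall("s")]
    result = []
    for i, value in enumerate(values):
        prefix = value[:prefix_len]
        # Assumption: nodes shorter than prefix_len never qualify.
        if len(prefix) < prefix_len:
            continue
        # Only siblings that come after the current node are compared.
        if any(later.startswith(prefix) for later in values[i + 1:]):
            result.append(value)
    return result


print(nodes_with_matching_follower(SAMPLE))  # ['GHIJKL']
```

ABCDEF is not returned because ABVWXY only shares its first two characters, and GHISTU is not returned because the matching node comes before it, not after.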
I've been trying to piece together functions like starts-with(), substring() and count(), but have not been able to get it right. My (obviously wrong) attempt:
//s[count(following::*[starts-with(., substring(<placeholder>, 1, 3))])>0]
I'm unsure whether this is possible at all. I don't know what to write in place of the placeholder, or how to rework the query so that it takes the three leftmost characters of each node and tests whether any of the following nodes starts with them.