String not analyzed correctly in MarkLogic

Question

My goal is to analyze a postal code and to identify the separate parts using a regular expression and the analyze-string function.

I use MarkLogic 10. Matching the example below against the regex validates it correctly. However, when I use the same regex to analyze the string, the groups are not identified correctly:

(: analyze Dutch postal code :)
let $regex := "^[1-9]\d{3}([A-Z]{2}(\d+(\S+)?)?)?$"
return fn:analyze-string("1234AA11bis", $regex)

It returns the following:

<s:analyze-string-result xmlns:s="http://www.w3.org/2005/xpath-functions">
<s:match>1234<s:group nr="1">AA<s:group nr="2">1<s:group nr="3">1bis</s:group></s:group></s:group>
</s:match>
</s:analyze-string-result>

I expect '11' as the text of group nr 2 and 'bis' as the content of group nr 3.

The online regex analyzers I tried return the result I expect. Am I missing a flag, or is this a bug in MarkLogic?

What are you trying to achieve? Could you describe the goal of your regex in plain English? Often that helps to spot issues. – Christian Baumann, Sep 30 '20 at 13:22
I would say it is a bug: `AA` is correctly collected as matching `[A-Z]{2}`, and the following `\d+` should collect both digits `11`, not only the first one. – Martin Honnen, Sep 30 '20 at 13:54

score 0 · Accepted Answer · answered Sep 30 '20 at 13:56

I am not sure what the specs say about nested greedy patterns, but there is an easy fix: exclude digits from the last group, so that `\d+` has to consume all of them:

let $regex := "^[1-9]\d{3}([A-Z]{2}(\d+([^\d\s]+)?)?)?$"
return fn:analyze-string("1234AA11bis", $regex)
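
To pick the separate parts out of the result, something along these lines might work. It is only a sketch: the output element names (`postal-code`, `digits`, `letters`, `number`, `suffix`) are made up for illustration, and it assumes the result has the nested `s:group` structure shown in the question.

```xquery
xquery version "1.0-ml";
declare namespace s = "http://www.w3.org/2005/xpath-functions";

(: Regex from this answer: the last group excludes digits and whitespace :)
let $regex := "^[1-9]\d{3}([A-Z]{2}(\d+([^\d\s]+)?)?)?$"
let $result := fn:analyze-string("1234AA11bis", $regex)
let $match := $result/s:match
return
  (: Element names below are hypothetical, chosen only for this example :)
  <postal-code>
    <!-- text directly inside s:match, i.e. the part before group 1 -->
    <digits>{ fn:string-join($match/text(), "") }</digits>
    <!-- text directly inside group 1, excluding nested groups -->
    <letters>{ fn:string-join($match/s:group[@nr = "1"]/text(), "") }</letters>
    <!-- text directly inside group 2, excluding nested group 3 -->
    <number>{ fn:string-join($match//s:group[@nr = "2"]/text(), "") }</number>
    <!-- full string value of group 3 -->
    <suffix>{ fn:string($match//s:group[@nr = "3"]) }</suffix>
  </postal-code>
```

Because the groups are nested, the string value of group 2 also contains group 3, which is why the sketch reads only the direct text nodes for `letters` and `number`. If the engine splits the input as intended, the four values should come out as 1234, AA, 11 and bis.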

HTH!

grtjn

Thanks for the fix, that helps. The regex mentioned is an official one provided by the government. Still wondering why the output of ML differs from other engines. – Marcel de Kleine Sep 30 '20 at 14:11
Indeed, it's always best to make the regex unambiguous, rather than relying on greediness or non-greediness. – Michael Kay Sep 30 '20 at 14:11
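
To illustrate the ambiguity: with `\S+` as the last group, "11bis" can be split as "1" + "1bis" or as "11" + "bis", and both splits satisfy the pattern, so the outcome depends on how the engine handles greediness. With `[^\d\s]+` only "11" + "bis" is possible. A small sketch for comparing both patterns on the same engine (the `comparison` wrapper element is a made-up name for this example):

```xquery
xquery version "1.0-ml";

let $input := "1234AA11bis"
for $regex in (
  (: original pattern: \S+ may also match digits, so the split is ambiguous :)
  "^[1-9]\d{3}([A-Z]{2}(\d+(\S+)?)?)?$",
  (: adjusted pattern: the last group cannot match digits :)
  "^[1-9]\d{3}([A-Z]{2}(\d+([^\d\s]+)?)?)?$"
)
return
  (: hypothetical wrapper element, only there to label each result :)
  <comparison regex="{ $regex }">{ fn:analyze-string($input, $regex) }</comparison>
```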
