
I want to match Roman numerals using Groovy regular expressions (I have not tried this in Java, but it should be the same). I found an answer on this website in which someone suggested the following regex:

/M{0,4}(CM|CD|D?C{0,3})(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})/

The problem is that an expression like /V?I{0,3}/ is not greedy in Groovy. So for a string like "Book number VII", the matcher /V?I{0,3}/ returns "V" and not "VII" as desired.

Obviously, if we use the pattern /VI+/ then we DO get the match "VII", but this solution is not valid if the string is something like "Book number V", as we will get no matches.

I tried to force the longest possible match by using a possessive quantifier, /VI{0,3}+/ or even /VI*+/, but I still get the match "V" instead of "VII".

Any ideas?

kapa
Alex
  • It's certainly greedy in Java (running against "VIII" gives "VIII" in group 3, not "V". Also testing just the regex `"V?I{0,3}"` also gives greedy results). Are you sure you're observing this behaviour in Groovy? Seems kind of surprising that Groovy would use a different regex engine. – Mark Peters Oct 18 '10 at 14:24
  • Huh? So, `?` *is* greedy, but `{n,m}` is not? – Bart Kiers Oct 18 '10 at 14:26
  • Not exactly on topic, perhaps, but the regex seems over-specified. It will pass invalid Roman numerals so it can't be used for error checking. Yet it's a PITA apparently. How about something like `[MCDXLIV]+` instead? – Tony Ennis Oct 18 '10 at 14:32
  • I just ran this in Java and the results are correct. So Java returns greedy results, but Groovy does not. I am running a script in the Groovy console for Groovy 1.7.0, and {0,3} is NOT greedy. The strange thing is that {n,m} IS greedy, but ONLY when n!=0. Likewise, .* is greedy only when a match with 0 characters does not satisfy the pattern. – Alex Oct 18 '10 at 15:52

2 Answers


Why not just use (IX|IV|V?I{1,3}|V)?

Chochos
  • Thanks. That works since V?I{1,3} gets a higher priority than just V. I'm just confused since V?I{0,3} should work ok by itself (and it does in Java but not in Groovy) – Alex Oct 18 '10 at 17:24
  • So.. how about picking this answer as "the one"? – Chochos Oct 19 '10 at 14:41

I found my mistake. Patterns like /V?I{0,3}/ or /V?I*/ are also satisfied by the EMPTY string, so for a string like "Book VII" the matcher returns the following matches:

Result[0] --> ''
Result[1] --> '' 
Result[2] --> ''
Result[3] --> '' 
Result[4] --> '' 
Result[5] --> 'VII'
Result[6] --> '' 

The greedy result is there all right (Result[5]). My problem was that I was always picking the first match (Result[0]), and that is only valid if the pattern cannot match the empty string.
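For anyone who wants to reproduce this, here is a minimal Java sketch (Groovy's matcher is a java.util.regex.Matcher, so the engine is the same). The class name `EmptyMatches` and the helper `allMatches` are made up for illustration; only `Pattern`, `Matcher`, `find()` and `group()` are real API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class EmptyMatches {

    // Hypothetical helper: collects every match find() reports, including empty ones.
    static List<String> allMatches(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        // Every position before "VII" yields an empty match, because both
        // V? and I{0,3} are allowed to match nothing.
        List<String> matches = allMatches("V?I{0,3}", "Book VII");
        for (int i = 0; i < matches.size(); i++) {
            System.out.println("Result[" + i + "] --> '" + matches.get(i) + "'");
        }
    }
}
```

The quantifiers are greedy at the position where they actually have something to consume; the empty matches simply come first because the scan starts at the beginning of the string.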

For instance, the suggested pattern /V?I{1,3}|V/ returns only one match, so picking the first match is OK:

Result[0] --> 'VII'

This is because the pattern cannot be satisfied by the empty string.
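If the original pattern has to stay as it is, another option is to skip the empty matches instead of taking the first one. A rough Java sketch of both approaches follows; `FirstNonEmptyMatch` and `firstNonEmpty` are hypothetical names chosen for this example, not part of any library.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FirstNonEmptyMatch {

    // Hypothetical helper: returns the first match that is not the empty string, if any.
    static Optional<String> firstNonEmpty(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            if (!m.group().isEmpty()) {
                return Optional.of(m.group());
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        // Original pattern, empty matches skipped.
        System.out.println(firstNonEmpty("V?I{0,3}", "Book VII"));   // Optional[VII]
        // Suggested pattern: it cannot match the empty string at all.
        System.out.println(firstNonEmpty("V?I{1,3}|V", "Book VII")); // Optional[VII]
        System.out.println(firstNonEmpty("V?I{1,3}|V", "Book V"));   // Optional[V]
    }
}
```

Either way the point is the same: make sure the match you pick is not one of the zero-length ones.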

Hope this helps others

Alex