Can someone please help me with this?

I'm trying to match Roman numerals that are followed by a ".", then a space, and then a capital letter. For example:

I. And here is a line.

II. And here is another line.

X. Here is again another line.

So, the regex should match "I. A", "II. A" and "X. H".

I wrote "^(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}){1,4}\.\s[A-Z]", but the problem is that this regex also matches ". A", and I don't want that.

In summary, it should match at least one Roman numeral, followed by a ".", then a space and a capital letter.

Santiago

1 Answer

You need a (?=[LXVI]) lookahead right after ^ that requires at least one Roman numeral letter at the start of the string:

^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
# ^^^^^^^^^

See the regex demo. I am not sure why you used {1,4}; I suggest removing it, since it lets the units group repeat and so accepts invalid sequences such as "IXIV".
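As a rough illustration of the lookahead fix, here is a minimal Python sketch. The choice of Python's re module and the names ORIGINAL, FIXED and numeral_prefix are assumptions made for this example only; the question does not say which regex engine is in use.

```python
import re

# Pattern from the question: every part before "\." may match the empty string.
ORIGINAL = re.compile(r"^(XC|XL|L?X{0,3})(IX|IV|V?I{0,3}){1,4}\.\s[A-Z]")

# Pattern from the answer: the lookahead demands a Roman numeral letter first,
# and the {1,4} quantifier is dropped.
FIXED = re.compile(r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]")


def numeral_prefix(pattern, line):
    """Return the matched prefix of `line`, or None when there is no match."""
    m = pattern.match(line)
    return m.group(0) if m else None


samples = [
    "I. And here is a line.",
    "II. And here is another line.",
    "X. Here is again another line.",
    ". A line without a numeral",
]
for line in samples:
    print(repr(line), "->", numeral_prefix(FIXED, line))
```

With the sample lines from the question, FIXED returns "I. A", "II. A" and "X. H", and returns None for the line that starts with ". A", while ORIGINAL still matches ". A" there.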

Another workaround here would be to use a word boundary right after ^:

^\b(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]
#^^

This disallows a match where . is the first character: \b at the start of the string only matches when the next char is a word char, and since . is not a word char, that char must be a Roman numeral letter here.
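The word boundary variant can be checked the same way. This is only a sketch, again assuming Python's re module; WORD_BOUNDARY and starts_with_numeral are hypothetical names introduced for the example.

```python
import re

# "^\b" succeeds only when the first character of the string is a word char,
# so a leading "." can no longer start a match.
WORD_BOUNDARY = re.compile(r"^\b(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})\.\s[A-Z]")


def starts_with_numeral(line):
    """True when `line` begins with a Roman numeral, ".", whitespace, capital."""
    return WORD_BOUNDARY.match(line) is not None


print(starts_with_numeral("IV. Fourth line"))  # numeral present
print(starts_with_numeral(". A"))              # no numeral before the dot
```

The first call reports a match and the second does not, because there is no word boundary between the start of the string and ".".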

Regarding \.\s[A-Z], you may enhance it by adding + or * after \s. If you ever need to require this part without including it in the match, turn it into a positive lookahead, (?=\.\s+[A-Z]) or (?=\.\s*[A-Z]).
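To show what the lookahead form changes, here is a small sketch that returns only the numeral while still requiring the ".", whitespace and capital letter after it. It assumes Python's re module; NUMERAL_ONLY and numeral_of are names made up for the example.

```python
import re

# The trailing part is a lookahead, so it is checked but not consumed;
# "\s+" also tolerates more than one whitespace character after the dot.
NUMERAL_ONLY = re.compile(
    r"^(?=[LXVI])(XC|XL|L?X{0,3})(IX|IV|V?I{0,3})(?=\.\s+[A-Z])"
)


def numeral_of(line):
    """Return just the leading Roman numeral, or None when the line does not qualify."""
    m = NUMERAL_ONLY.match(line)
    return m.group(0) if m else None


print(numeral_of("XIV.   Several spaces"))
print(numeral_of("XIV.text"))
```

The first call yields "XIV" without the dot or the following letter, and the second yields None because no whitespace and capital letter follow the dot.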

Wiktor Stribiżew