
I'm starting a new question because my other question solved a different issue with this regex.

Here's my regex:

(?i)\\d{1,4}(?<!v(?:ol)?\\.?\\s?)(?![^\\(]*\\))

Regex split up for clarity:

(?i) - case insensitive

\\d{1,4} - a number with 1-4 digits

(?<!v(?:ol)?\\.?\\s?) - the number cannot be preceded by 'v', 'v.', 'vol' or 'vol.', optionally followed by a single whitespace character.

(?![^\\(]*\\)) - the number cannot be inside parentheses.

It all works except for the 'vol.' part:

@"Words words 342 words (2342) (words 2 words) (words).ext" result 342 - correct.

@"Words - words words (2010) (words 2 words) (words).ext" result nil - correct.

@"words words v34 35.ext" result 34 - incorrect.

@"Words vol.342 343 (1234) (3 words) (desc).ext" result 342 - incorrect.
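To reproduce these results outside Xcode, the four test strings can be run through Java's `java.util.regex`, used here as a stand-in for ICU, the engine RegexKitLite wraps. This is a sketch under that assumption: the two engines have closely related syntax for lookbehind and lookahead, but they are not identical. The class and method names (`OriginalRegexCheck`, `firstMatch`) are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OriginalRegexCheck {
    // The pattern from the question. Java string literals escape backslashes
    // the same way as Objective-C @"..." literals, so it is copied unchanged.
    public static final String ORIGINAL =
        "(?i)\\d{1,4}(?<!v(?:ol)?\\.?\\s?)(?![^\\(]*\\))";

    // Returns the first match, or null when there is none (the "nil" case).
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Words words 342 words (2342) (words 2 words) (words).ext",
            "Words - words words (2010) (words 2 words) (words).ext",
            "words words v34 35.ext",
            "Words vol.342 343 (1234) (3 words) (desc).ext",
        };
        // Prints 342, null, 34, 342: the same results as reported above.
        for (String s : inputs) {
            System.out.println(firstMatch(ORIGINAL, s));
        }
    }
}
```

Because the lookbehind runs only after `\d{1,4}` has consumed the digits, the text it inspects always ends in a digit, so it never sees the `v` or `vol.` in front of the number.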

What am I doing wrong with my 'vol.' section?

Nick Locking

1 Answer


You need to put the lookbehind before the number. You also need to disallow digits inside the lookbehind; otherwise the 4 in v34 will match. Try

(?i)(?<!v(?:ol)?\\.?\\s*\\d*)\\d{1,4}(?![^(]*\\))

This assumes (edit: wrongly, as it turns out) that RegexKitLite supports unbounded repetition inside lookbehind, which few regex flavors do.

A look at the docs shows that it does support finite (but variable-length) repetition inside lookbehind. So, as long as you accept that the following only works if there is at most one space between vol. and the number, you could try

(?i)(?<!v(?:ol)?\\.?\\s?)(?<!\\d)\\d{1,4}(?![^(]*\\))
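A minimal check of this regex against the question's test strings, again assuming Java's `java.util.regex` behaves like ICU for these constructs (the class name `FixedRegexCheck` and the helper `firstMatch` are hypothetical). The fifth input illustrates the stated limitation: with two spaces after `vol.`, the lookbehind no longer applies.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FixedRegexCheck {
    // Both lookbehinds now sit *before* \d{1,4}. The first rejects v, v., vol
    // and vol. (plus at most one whitespace char); the second rejects a
    // preceding digit so a match cannot start in the middle of "34" or "342".
    public static final String FIXED =
        "(?i)(?<!v(?:ol)?\\.?\\s?)(?<!\\d)\\d{1,4}(?![^(]*\\))";

    // Returns the first match, or null when there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Words words 342 words (2342) (words 2 words) (words).ext",
            "Words - words words (2010) (words 2 words) (words).ext",
            "words words v34 35.ext",
            "Words vol.342 343 (1234) (3 words) (desc).ext",
            "Words vol.  342.ext", // two spaces: outside what \s? can cover
        };
        // Prints 342, null, 35, 343, 342.
        for (String s : inputs) {
            System.out.println(firstMatch(FIXED, s));
        }
    }
}
```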
Tim Pietzcker
  • Doesn't work, I'm afraid. Also, on the last question I raised, someone said that lookbehinds should come after the number. Now I don't know what to think! – Nick Locking Nov 24 '10 at 18:13
  • Well, a lookbehind assertion looks backwards from the *current position* within the string, so if you put it after the number, it will look at the number and see that it isn't `vol.` or anything like it. This regex works in RegexBuddy, but it may well be that RegexKitLite does not support infinite or even variable repetition inside lookbehind assertions (if it supports them at all; some languages, like JavaScript, don't support lookbehind...). – Tim Pietzcker Nov 24 '10 at 21:38
  • I just took a look at the [docs](http://regexkit.sourceforge.net/RegexKitLite/#ICUSyntax_ICURegularExpressionSyntax) and according to what I find there, the second regex should work. OH! I'm just realizing that I forgot to escape the backslashes! Will edit. – Tim Pietzcker Nov 24 '10 at 21:56
  • `(?i)(?<!v(?:ol)?\\.?\\s?)(?<!\\d)\\d{1,4}(?![^\\(]*\\))` is the final regex - but it still fails. If I take out the look-ahead, it picks up the non-"vol." stuff properly. If I take out the look-behinds, the not-inside-parentheses bit works. Rage rising! – Nick Locking Nov 24 '10 at 22:46
  • @Nick: Oh darn. I think I've spotted another problem. When I edited my answer yesterday, it was rather late, so I overlooked the unescaped backslash in the lookahead. You don't have to escape the parenthesis inside the character class, though. So now I've (hopefully) finalized the regex by editing my answer yet again. Sorry for the mess. – Tim Pietzcker Nov 25 '10 at 07:14
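The point about where the lookbehind goes can be checked with three small patterns on the hypothetical input `v34 35`. This sketch uses Java's `java.util.regex` as a stand-in for ICU (an assumption), and `LookbehindPlacement` / `firstMatch` are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LookbehindPlacement {
    // Returns the first match, or null when there is none.
    public static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "v34 35";
        // Lookbehind after the number: it is tested at the position after
        // "34", where the preceding character is '4', so the 'v' is never seen.
        System.out.println(firstMatch("\\d{1,4}(?<!v)", input));         // 34
        // Lookbehind before the number: the start at '3' is rejected, but the
        // engine retries one character later and matches "4".
        System.out.println(firstMatch("(?<!v)\\d{1,4}", input));         // 4
        // Also forbidding a preceding digit stops that mid-number restart.
        System.out.println(firstMatch("(?<!v)(?<!\\d)\\d{1,4}", input)); // 35
    }
}
```

The middle case is why the answer adds `(?<!\\d)` alongside the `v`/`vol.` lookbehind rather than relying on the lookbehind's position alone.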