
Using positive lookbehinds, two regexes each match their respective strings. When combined, they don't. When one of them is changed by removing its positive lookbehind, the combination matches. I don't understand why and would like to know so that I can fix it, because I don't want the opening quote consumed as part of the match.

value = /  # match digits or something wrapped in quotes
        (?<value>
          \d+         # Match one or more digits
            |         # or
          (?:         # Match something wrapped in double quotes
            (?<=")  # <- this is the point of contention
            [^"]+     # captures anything not a "   
            (?=")
          )
            |         # or
          (?:         # Match something wrapped in single quotes
            (?<=')  # <- or this one
            [^']+     # captures anything not a '
            (?=')
          )
        )
      /x

value.match %q!'filename.rb'!
# => #<MatchData "filename.rb" value:"filename.rb">
value.match %q!"filename.rb"!
# => #<MatchData "filename.rb" value:"filename.rb">
value.match %q!66!
# => #<MatchData "66" value:"66">

So it matches any run of digits or anything wrapped in matching quotes.

long_option = /
        (?<long_option>
          (?<!
            (?:\-\-no\-)   # don't match --no-long
              |
            (?:\-\-\-)     # don't match ---long
          )
          (?<=\-\-)        # it must begin with --
          (?<key_name>
            [a-zA-Z]\w+    # capture the key name
          )
          \b
          (?!
            \-             # make sure it's not part of a longer key
          )
        )
      /x

long_option.match "--long"
# => #<MatchData "long" long_option:"long" key_name:"long">
long_option.match "---long"
# => nil
long_option.match "--long-"
# => nil
long_option.match "--no-long"
# => nil

This also matches nicely.

Now combined, the problems begin:

/#{long_option} #{value}/.match "--long 'filename.rb'"
# => nil

but if value is redefined without the positive lookbehind for the first quote, it matches:

value2 = /
        (?<value>
          \d+
            |
          (?:
            "       # <- no positive lookbehind
            [^"]+
            (?=")
          )
            |
          (?:
            '       # <- for single quote too
            [^']+
            (?=')
          )
        )
      /x

/#{long_option} #{value2}/.match "--long 'filename.rb'"
# => #<MatchData
 "long 'filename.rb"
 long_option:"long"
 key_name:"long"
 value:"'filename.rb">

I've tried combining long_option with simpler patterns and it works, so I'm inclined to think that long_option is not the source of the problem, hence my question. For example:

/#{long_option} abc/.match "--long abc"
# => #<MatchData "long abc" long_option:"long" key_name:"long">

I've gone through the match in my head and tried slightly different patterns by trial and error, and I'm really stuck. Any help or insight would be much appreciated.

Ruby version is 1.9.

ian

1 Answer


`/#{long_option} #{value}/` will not match a string in double or single quotes as you intended, because each quoted alternative in `value` begins with a positive lookbehind, which succeeds only when the character immediately before that position is a double or single quote. In `/#{long_option} #{value}/`, the character immediately before `value` is always the literal space, and a space cannot also be a quote, so those alternatives never succeed.
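
To make this concrete, here is a minimal sketch that reduces the problem to the single-quote branch alone. The constant names are made up for illustration and do not come from the question; the patterns are stripped-down versions of the single-quote alternative in `value`.

```ruby
# A lookbehind is zero-width: it inspects the character just before the
# current position and consumes nothing.
STANDALONE = /(?<=')[^']+(?=')/

# Used alone, the engine is free to start the match right after the quote.
standalone_match = STANDALONE.match("'filename.rb'")
STANDALONE_TEXT  = standalone_match && standalone_match[0]

# After a literal space, the lookbehind is tested at the position right
# after that space, so the preceding character is the space, never a quote.
AFTER_SPACE       = / (?<=')[^']+(?=')/
AFTER_SPACE_MATCH = AFTER_SPACE.match("--long 'filename.rb'")

# Consuming the quote first puts the lookbehind where it can succeed.
AFTER_QUOTE      = / '(?<=')[^']+(?=')/
after_quote_match = AFTER_QUOTE.match("--long 'filename.rb'")
AFTER_QUOTE_TEXT  = after_quote_match && after_quote_match[0]

p STANDALONE_TEXT    # "filename.rb"
p AFTER_SPACE_MATCH  # nil
p AFTER_QUOTE_TEXT   # " 'filename.rb"
```

The second pattern fails for the same reason the combined regex in the question does: nothing between the space and the lookbehind consumes the quote.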

sawa
  • Thanks for taking a look. If so, then `value.match " 'filename.rb'"` should fail, and it doesn't: `# => #<MatchData "filename.rb" value:"filename.rb">` – ian Apr 14 '13 at 15:54
  • It should not fail. When `value` is used by itself, there is no problem with lookbehind. – sawa Apr 14 '13 at 16:00
  • Ok, I see now. If there is something preceding that needs to match then it will fail _before_ it gets the chance to try the backmatch, so `/abc#{value}/.match "abc'filename.rb'"` will fail. I think I need to add in more possible backmatches and I'll see what happens. – ian Apr 14 '13 at 16:01
  • I think you are confusing the character `"`, which takes up one character of length, with the lookbehind `(?<=")`, which takes up zero. `/ "/` means a space followed by a double quote. A match for `/ (?<=")/` would be "a sequence such that the character at index 0 is a space and the character counting back from index 1 (which is the character at index 0) is a double quote", which means "a sequence such that the character at index 0 is both a space and a double quote". – sawa Apr 14 '13 at 16:06
  • I understand that it's a zero-length assertion, but as I understand it the regex engine will move forward a character at a time and not use an index count. The lookbehind will occur once it has moved forwards and attempted a match with what follows the lookbehind assertion (as described [here](http://www.regular-expressions.info/lookaround.html)). It may amount to the same result but it gives a different explanation, and may allow me to find a way around this. – ian Apr 14 '13 at 16:28
  • I used index only to illustrate it to you. It is not meant to be expressing how a regex works internally. – sawa Apr 14 '13 at 16:30
  • Either way, I'm grateful. I've also found a pattern that works and that I can test: `/#{long_option} (?:"|')?#{value}/.match "--long 'filename.rb'"` `# => #<MatchData "long 'filename.rb" long_option:"long" key_name:"long" value:"filename.rb">`. – ian Apr 14 '13 at 16:31
  • That would work, but I wonder whether you need lookbehind in the first place. You could just have `"[^"]+"`, or if you wanted to capture only what is inside, then you could do `"([^"]+)"`. – sawa Apr 14 '13 at 16:39
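
The suggestion in the last comment, consuming the quotes and capturing only what is inside them, could look roughly like the sketch below. Everything named here is an assumption made for illustration: `OPTION` is a simplified stand-in for the question's `long_option` (it consumes the `--` and omits the `--no-` and `---` exclusions), and `parse_option` and the three group names are hypothetical.

```ruby
# Hypothetical, simplified stand-in for the question's long_option pattern.
# It consumes the leading "--" and does not exclude "--no-" or "---".
OPTION = /--(?<key_name>[a-zA-Z]\w+)/

# Consume the quotes themselves and name only the text between them.
# Each branch gets its own group name so the lookup below is unambiguous.
VALUE = /
  (?<number>\d+)                # one or more digits
    |
  "(?<double_quoted>[^"]+)"     # text inside double quotes
    |
  '(?<single_quoted>[^']+)'     # text inside single quotes
/x

# The space between the two interpolations is a literal space here,
# because COMBINED itself is not compiled with the x flag.
COMBINED = /#{OPTION} #{VALUE}/

# Returns [key, value] on a match, or nil otherwise.
def parse_option(text)
  m = COMBINED.match(text)
  return nil unless m
  [m[:key_name], m[:number] || m[:double_quoted] || m[:single_quoted]]
end

p parse_option(%q!--long 'filename.rb'!)  # ["long", "filename.rb"]
p parse_option(%q!--long "filename.rb"!)  # ["long", "filename.rb"]
p parse_option("--long 66")               # ["long", "66"]
```

Because the quotes are consumed outside the named groups, the captured value excludes them without any lookbehind, at the cost of the overall match (`m[0]`) including the quotes.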