
I have a feeling this will get closed as a duplicate as it seems like this would be a common ask...but in my defense, I have searched SO as well as Google and could not find anything.

I'm trying to search SQL files using ripgrep, but I want to exclude matches that are part of a single line SQL comment.

  • I do not need to worry about multi-line /* foobar */ comments, only single-line -- foobar comments.
  • I do not need to capture anything.
  • I do not need to worry about the possibility of the string being part of a text literal, like SELECT '-- foobar' or SELECT '--', foobar. I'm okay with those false exclusions.

Match examples:

  • Match: SELECT foobar
  • Match: , foobar
  • Exclude: SELECT baz -- foobar
  • Exclude: -- foobar
  • Exclude: ---foobar
  • Exclude: -- blah foobar blah

In other words, search for foobar but ignore the match if -- occurs at any point before it on that line.

I have tried things like negative lookbehinds and other methods, but I can't seem to get them to work.

This answer seemed like it would get me there: https://stackoverflow.com/a/13873003/3474677

That answer says to use (?<!grad\()vec to find matches of vec that are not preceded by grad(. Translated to my use case, that gives (?<!--)foobar. This works, but only for excluding lines that contain --foobar; it does not handle the other scenarios.
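
To illustrate why the lookbehind falls short, here is a minimal sketch that runs it over the example lines from above. It uses Python's re module purely as a stand-in for a lookaround-capable engine (an assumption for demonstration; it is not ripgrep itself), and the names `lookbehind` and `results` are made up for this example.

```python
import re

# The lookbehind only inspects the two characters immediately before "foobar".
lookbehind = re.compile(r"(?<!--)foobar")

lines = [
    "SELECT foobar",
    ", foobar",
    "SELECT baz -- foobar",
    "-- foobar",
    "---foobar",
    "-- blah foobar blah",
]

# True means the line is reported as a match.
results = {line: lookbehind.search(line) is not None for line in lines}

for line, hit in results.items():
    print("match  " if hit else "exclude", "|", line)
```

Only ---foobar is excluded, because it is the only line where -- sits directly against foobar. Every other commented line still matches, since at least one other character separates -- from foobar.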

Worst case scenario, I can just pipe the results from ripgrep into another filter and exclude lines that match --.*foobar but I'm hoping to find an all-in-one solution.
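
For reference, the two-step fallback could look roughly like the sketch below. It is written in Python rather than as a shell pipeline, and `search_outside_comments` is a hypothetical helper name, not part of ripgrep or any library.

```python
import re


def search_outside_comments(lines, term="foobar"):
    """Keep lines containing `term`, then drop lines where `--` precedes it."""
    wanted = re.compile(re.escape(term))
    # Second filter: "--" followed by anything and then the term.
    commented = re.compile(r"--.*" + re.escape(term))
    return [
        line
        for line in lines
        if wanted.search(line) and not commented.search(line)
    ]


sample = [
    "SELECT foobar",
    ", foobar",
    "SELECT baz -- foobar",
    "-- foobar",
    "---foobar",
    "-- blah foobar blah",
]

kept = search_outside_comments(sample)
print(kept)
```

One caveat of this approach: a line such as foobar -- foobar would be dropped entirely, even though the first foobar sits outside the comment.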

Chad Baldwin
  • I think ripgrep [does not support](https://docs.rs/regex/1.5.4/regex/#syntax) lookarounds, or else you might use it like this `^(?:(?!--).)*\bfoobar\b` https://regex101.com/r/XLjEFM/1 – The fourth bird Nov 24 '21 at 22:30
  • @Thefourthbird that did it!! I just had to enable `--pcre2` mode. Post that up as an answer and I'll mark it, thanks! If you're able to explain why it works, it would be appreciated, but if not, I'm more than happy to step through it on regex101 and figure it out. – Chad Baldwin Nov 25 '21 at 00:13
  • @Thefourthbird Took me a while, but I think I get it now...it's capturing zero or more characters that are not followed by `--`, _then_ it walks back to look for `foobar`. That's super clever, thank you! – Chad Baldwin Nov 25 '21 at 01:13

1 Answer


As noted in the comments, with ripgrep's --pcre2 option enabled you can use:

^(?:(?!--).)*\bfoobar\b
  • ^ Start of the string (ripgrep searches line by line, so this is the start of each line)
  • (?: Non-capturing group
    • (?!--). Negative lookahead: assert that, from the current position, there is not -- directly to the right. If that assertion is true, match any character except a newline
  • )* Close the non-capturing group and repeat it zero or more times
  • \bfoobar\b Match the word foobar between word boundaries

See a regex demo.
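
As a quick check of the breakdown above, the pattern can be run against the example lines from the question. This is a minimal sketch using Python's re module, which is assumed here to behave like PCRE2 for this particular pattern; it is not ripgrep itself, and `pattern`, `examples`, and `results` are names made up for the example.

```python
import re

# Each "." is consumed only if "--" does not start at that position, so the
# match can never cross a comment marker before it reaches "foobar".
pattern = re.compile(r"^(?:(?!--).)*\bfoobar\b")

# Example lines from the question, mapped to whether they should match.
examples = {
    "SELECT foobar": True,
    ", foobar": True,
    "SELECT baz -- foobar": False,
    "-- foobar": False,
    "---foobar": False,
    "-- blah foobar blah": False,
}

results = {line: pattern.search(line) is not None for line in examples}

for line, expected in examples.items():
    status = "ok" if results[line] == expected else "UNEXPECTED"
    print(status, "|", "match  " if results[line] else "exclude", "|", line)
```

With ripgrep, the equivalent invocation would be along the lines of `rg --pcre2 '^(?:(?!--).)*\bfoobar\b'`.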

The fourth bird
  • It's worth noting that making the quantifier lazy cut the number of steps roughly in half (on average) and still got me the same results. Thanks for the tip @zikato – Chad Baldwin Nov 25 '21 at 18:09
  • @ChadBaldwin For the example data that is the case, but that can differ if the content to match is at the end of the string. – The fourth bird Nov 25 '21 at 18:16