I'm finally learning regexps and training with ack. I believe this uses Perl regexp.

I want to match all lines where the first non-blank characters are if (<word> !, with any number of spaces in between the elements.

This is what I came up with:

^[ \t]*if *\(\w+ *!

It almost worked. I think ^[ \t]* is wrong, since it seems to match zero or one [space or tab]. What I want is to match any leading run made up only of spaces or tabs (or nothing).

For example these should not match:

// if (asdf != 0)
else if (asdf != 1)

How can I modify my regexp for that?


EDIT adding command line

ack -i --group -a '^\s*if *\(\w+ *!' c:/work/proj/proj 

Note the single quotes, I'm not so sure about them anymore.

My search base is a larger code base. The results do include genuine matches (quite a few), but also lines such as:

274:                }else if (y != 0) 

which should not match, yet appears in the results of the above command.


EDIT adding the result of mobrule's test

Mobrule, thanks for providing me a text to test on. I'll copy here what I get on my prompt:

C:\Temp\regex>more ack.test
# ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

C:\Temp\regex>ack '^[ \t]*if *\(\w+ *!' ack.test

C:\Temp\regex>"C:\Program\git\bin\perl.exe" C:\bat\ack.pl '[ \t]*if *\(\w+ *!' ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

The problem is in my call to my ack.bat!

ack.bat contains:

"C:\Program\git\bin\perl.exe" C:\bat\ack.pl %*

Although I pass the caret on the command line, it is stripped when the bat file is called!

Escaping the caret with ^^ does not work.

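Quoting the regex with " " instead of ' ' works: cmd.exe treats ^ as its escape character outside double quotes, and single quotes have no quoting effect there. My problem was a DOS/Windows problem, sorry for bothering you all with it.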
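
The symptom is consistent with the caret never reaching ack. A minimal sketch of the effect, using Python's re module as a stand-in for ack (an assumption: these particular constructs behave the same in Perl regexps); the sample lines are adapted from the question, and the helper name hits is made up for illustration:

```python
import re

# The pattern as intended, and as it arrives once the caret has been lost.
ANCHORED = re.compile(r'^\s*if *\(\w+ *!')
UNANCHORED = re.compile(r'\s*if *\(\w+ *!')

lines = [
    "    if (asdf != 0)",
    "                }else if (y != 0) ",
    "// if (asdf != 0)",
]

def hits(pattern, candidates):
    """Return the candidates in which the pattern matches anywhere in the line."""
    return [line for line in candidates if pattern.search(line)]

anchored_hits = hits(ANCHORED, lines)      # only the properly indented "if"
unanchored_hits = hits(UNANCHORED, lines)  # every line, including "}else if"

print(anchored_hits)
print(unanchored_hits)
```

Without the anchor, the pattern is free to start matching in the middle of a line, which is why `}else if (y != 0)` showed up in the results.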

brian d foy
Gauthier
  • As mobrule said in his answer, your regex looks okay for the desired behavior you've described. Can you edit your question to include sample data, the ack command line you're using, and your desired output? – Michael Carman Apr 14 '10 at 16:58
  • If you haven't already, read perlrequick (man perlrequick, perldoc perlrequick, or find it online). After that, if you want some more advanced capabilities, read perlre. In any case, I highly recommend Mastering Regular Expressions from O'Reilly. – kbenson Apr 14 '10 at 17:06

3 Answers

^\s*if\s*\(\S+\s*!
  • Use \S for non-whitespace. \w will not match special characters, so if ($word will not match. Maybe that's OK with your specs, in which case \w (alphanumeric plus "_") is fine.
$ perl5.8 -e '{$s="else if (asdf \!= 1)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
NO MATCH
$ perl5.8 -e '{$s="// if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
NO MATCH
$ perl5.8 -e '{$s=" if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'  
|asdf|
$ perl5.8 -e '{$s="if (asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}' 
|asdf|
$ perl5.8 -e '{$s="if (\$asdf \!= 0)"; if ($s =~ /^\s*if\s*\((\S+)\s*\!/) { print "|$1|\n";} else { print "NO MATCH\n";}}'
|$asdf|
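
The same \w versus \S comparison can be sketched in Python, whose re module handles these constructs the same way as Perl (an assumption worth checking for anything more exotic). The helper name captured is made up for illustration:

```python
import re

WORD = re.compile(r'^\s*if\s*\((\w+)\s*!')       # \w: letters, digits, underscore
NON_SPACE = re.compile(r'^\s*if\s*\((\S+)\s*!')  # \S: any non-whitespace character

def captured(pattern, line):
    """Return the captured operand, or None when the line does not match."""
    m = pattern.search(line)
    return m.group(1) if m else None

plain = "if (asdf != 0)"
sigil = "if ($asdf != 0)"

print(captured(WORD, plain))       # asdf
print(captured(WORD, sigil))       # None, because "$" is not a word character
print(captured(NON_SPACE, sigil))  # $asdf
```

Which class is appropriate depends on whether identifiers in the searched code can start with characters such as "$".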
DVK
  • Please ignore the backslash in front of "!" in my examples - that's for the benefit of Unix shell. Internally in Perl it'd just be "!" – DVK Apr 14 '10 at 15:39
  • And yes, I had ack.bat somewhere, and it seems to remove the `^` character! (see my edited post). – Gauthier Apr 15 '10 at 07:47

In both ack and grep, * matches zero or more, not zero or one. So I think you already have the right solution. What test cases aren't giving you the results you want?

# ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
    if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
        if (asdf != 0) # multiple tab - ok
    if (asdf != 0) # spaces + tab ok
     if (asdf != 0) # tab + space ok
     if (asdf != 0) # space + tab + space ok
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok

Results:

$ ack '^[ \t]*if *\(\w+ *!' ack.test
if (asdf != 0)    # no spaces - ok
 if (asdf != 0)   # single space - ok
        if (asdf != 0) # single tab - ok
   if (asdf != 0) # multiple space - ok
                if (asdf != 0) # multiple tab - ok
        if (asdf != 0) # spaces + tab ok
         if (asdf != 0) # tab + space ok
         if (asdf != 0) # space + tab + space ok

$ ack -v '^[ \t]*if *\(\w+ *!' ack.test
// if (asdf != 0)  # not ok
} else if (asdf != 0) # not ok
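
For readers without ack at hand, the same check can be sketched with Python's re module, assuming it treats these constructs like Perl does. The tabs are written as \t here because they are not visible in the listing above:

```python
import re

PATTERN = re.compile(r'^[ \t]*if *\(\w+ *!')

should_match = [
    "if (asdf != 0)",       # no indentation
    " if (asdf != 0)",      # single space
    "\tif (asdf != 0)",     # single tab
    "   if (asdf != 0)",    # multiple spaces
    "\t\tif (asdf != 0)",   # multiple tabs
    "  \tif (asdf != 0)",   # spaces + tab
    "\t if (asdf != 0)",    # tab + space
    " \t if (asdf != 0)",   # space + tab + space
]
should_not_match = [
    "// if (asdf != 0)",
    "} else if (asdf != 0)",
]

# [ \t]* consumes any run of spaces and tabs, including an empty one.
matched = [line for line in should_match + should_not_match
           if PATTERN.search(line)]

print(matched == should_match)  # True
```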
mob
  • @mobrule - both ack home page and the OP question stated that ack uses Perl regex, not grep variety. – DVK Apr 14 '10 at 15:57
  • `}else if (y != 0) {` is in my results. Note that I use single quote around the expression in the command line, could it be related to my problem? – Gauthier Apr 15 '10 at 07:25

You can try:

^(?:\t*| *)if *\(\w+ *!

Here \t*| * matches zero or more tabs or zero or more spaces, but not a mix of spaces and tabs.
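
A small sketch of how the alternation differs from the character class in the question, using Python's re module as a stand-in for Perl regexps. It assumes a leading ^ so that the indentation must begin the line:

```python
import re

# Assumption: both patterns are anchored with ^ at the start of the line.
EITHER = re.compile(r'^(?:\t*| *)if *\(\w+ *!')  # all tabs OR all spaces
MIXED = re.compile(r'^[ \t]*if *\(\w+ *!')       # any mix of tabs and spaces

samples = {
    "tabs_only": "\t\tif (asdf != 0)",
    "spaces_only": "   if (asdf != 0)",
    "mixed": " \t if (asdf != 0)",
}

# For each sample: (matched by the alternation, matched by the character class)
results = {
    name: (bool(EITHER.search(line)), bool(MIXED.search(line)))
    for name, line in samples.items()
}

for name, outcome in results.items():
    print(name, outcome)
```

Only the mixed indentation separates the two: the alternation rejects it, while the character class accepts it.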

codaddict