I want to find all of the lines in lsof with "Google" in them, so I tried the following:

lsof |  awk '/.*google.*/ { print $1 "," $2 "," $3} ' > new_file.csv

which correctly yields output with rows starting with the word "google".

But then I tried this, and the CSV contains nothing:

lsof |  awk '/\s*google.*/ { print $1 "," $2 "," $3} ' > new_file.csv

But I thought that \s* means any number of spaces. Is there any reason for this behavior? Thank you.

makansij

1 Answer

\s does mean spaces and \s* does mean zero-or-more spaces, but not in awk.

awk uses a different (older) regex engine.

For awk you want [[:space:]]* to match zero-or-more spaces. (That's the character class [:space:] inside a character list [].)
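As a minimal sketch of the [[:space:]]* form, here is the question's command run against made-up input instead of real lsof output. The process names, PIDs, and user names are hypothetical, and the pattern is anchored with ^ only to make the leading-whitespace handling visible:

```shell
# Hypothetical sample lines standing in for lsof output; the second
# line starts with spaces, as some Cygwin tools emit.
sample='google 101 alice
   google 102 bob
firefox 103 carol'

# ^[[:space:]]*google matches lines that begin with optional
# whitespace followed by "google". [[:space:]] is a POSIX character
# class, so it should work in any POSIX-compliant awk.
matched=$(printf '%s\n' "$sample" |
  awk '/^[[:space:]]*google/ { print $1 "," $2 "," $3 }')

printf '%s\n' "$matched"
```

Both google lines are printed, and the firefox line is skipped. The leading spaces do not show up in the output because awk's default field splitting ignores leading whitespace when assigning $1.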

That being said, if you just care about google being in the output, then you just need /google/.

If you want a word-anchored google then you want /\<google\>/ (the \< and \> word-boundary operators are GNU awk extensions).

As Ed Morton points out, GNU Awk version 4.0+ added support for the \s metacharacter as well.
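To illustrate why the unanchored /google/ is usually enough, here is a small sketch on hypothetical sample lines (again, not real lsof output). It compares the unanchored pattern with a ^-anchored one:

```shell
# Hypothetical sample lines; names and PIDs are made up.
sample='google 101 alice
   google 102 bob
com.google.Chrome 104 dave
firefox 103 carol'

# Unanchored: matches "google" anywhere in the line, with or
# without leading spaces.
anywhere=$(printf '%s\n' "$sample" |
  awk '/google/ { print $1 "," $2 "," $3 }')

# Anchored with ^: only lines whose very first characters are "google".
at_start=$(printf '%s\n' "$sample" | awk '/^google/ { print $2 }')

printf '%s\n' "$anywhere"
printf '%s\n' "$at_start"
```

The unanchored pattern picks up all three lines containing "google", including the indented one. The anchored pattern matches only the first line, because the indented line begins with spaces rather than with "google".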

Etan Reisner
  • `\s` does mean any space char in GNU awk. – Ed Morton Jun 29 '15 at 22:45
  • I tried `lsof | gawk '/^\s*oogle/ { print $1 ", " $2 }'` but it still returns nothing. I looked at the man pages for `gawk` and it says that the `^` means "beginning of string". – makansij Jun 30 '15 at 18:46
  • What version of gawk? Do you actually care about the leading spaces? Does the line with "google" in it really start with spaces? `/^\s*oogle/` will *not* match a line with "google" in it. It will match a line that starts with zero-or-more spaces and then contains "oogle" immediately after that. – Etan Reisner Jun 30 '15 at 18:49
  • gawk version 4.1.3. The leading spaces are because I want this to run on both the Mac terminal and Cygwin. Cygwin sometimes puts weird spaces before lines of output (for example, netstat puts spaces before its output). Plus, I'd like to understand why it doesn't work, for long-term benefit. – makansij Jun 30 '15 at 18:51
  • If your pattern doesn't look for spaces then it doesn't care about spaces. The pattern `/google/` will match *any* line that contains "google" in it *anywhere*, spaces or not. Don't over-specify your match. If all you want is a line that says "google" on it *somewhere* use `/google/`, or `/\<google\>/` if you want "google" as a word by itself. Just as I explained in the answer itself. – Etan Reisner Jun 30 '15 at 18:53