Case insensitive string matching in awk

Question

Assume a multi-line text file named file in which some lines start with whitespace.

$ cat file
foo Baz
  baz QUX
    QUx Quux
BaZ Qux
BazaaR

Further assume that I wish to convert to lowercase all lines that start with a keyword (e.g. "baz"), irrespective of whether (a) the keyword itself is written in lowercase, uppercase, or any combination thereof, and (b) the keyword is preceded by whitespace.

$ cat file | sought_command
foo Baz        # not to lowercase (line does not start with keyword)
  baz qux      # to lowercase
    QUx Quux
baz qux        # to lowercase
BazaaR         # not to lowercase (line does not start with keyword, but merely with a word containing the keyword)

I believe that awk is the tool to do it, but I am uncertain how to implement the case-insensitivity for the keyword matching.

$ cat file | awk '{ if($1 ~ /^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
BaZ Qux       # ERROR HERE: was not replaced, b/c keyword not recognized.
BazaaR

EDIT 1: Adding IGNORECASE=1 appears to resolve the case-insensitivity, but now incorrectly converts the last line to lowercase.

$ cat file | awk '{IGNORECASE=1; if($1~/^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
baz qux
bazaar       # ERROR HERE: should not be converted to lowercase, as keyword not present (emphasis on word!).

I don't know whether awk does support case-insensitive match (as some other regex dialects do). But this should work: `/^ *[bB][aA][zZ]/`. — Scheff's Cat, Jul 05 '17 at 16:09

score 10 · Accepted Answer · answered Jul 05 '17 at 16:59

You already know about tolower(), so just use it again in the comparison and test for an exact string match instead of a partial regexp match:

awk 'tolower($1)=="baz"{$0=tolower($0)}1'
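
As a minimal sketch, the one-liner can be checked against the sample input from the question. The variable name result is only illustrative, and the input is piped in rather than read from a file to keep the example self-contained. Nothing here is gawk-specific, so it should behave the same in any POSIX awk.

```shell
# Feed the question's sample lines to the one-liner.
# tolower($1)=="baz" compares the first field as a whole word, case-insensitively;
# the trailing 1 is an always-true pattern that prints every record.
result=$(printf '%s\n' 'foo Baz' '  baz QUX' '    QUx Quux' 'BaZ Qux' 'BazaaR' |
    awk 'tolower($1)=="baz"{$0=tolower($0)}1')

printf '%s\n' "$result"
# Output:
# foo Baz
#   baz qux
#     QUx Quux
# baz qux
# BazaaR
```

Because $1 is split on whitespace by default, leading blanks never reach the comparison, and "BazaaR" fails the equality test because the whole field must equal "baz".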

Ed Morton

Sundeep · Answer 2 · 2017-07-06T01:44:18.713

8

Add a word boundary after the search string:

$ awk '{IGNORECASE=1; if($1~/^ *baz\>/) print tolower($0); else print $0}' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR

Can be re-written as:

awk 'BEGIN{IGNORECASE=1} /^ *baz\>/{$0=tolower($0)} 1' ip.txt

Since the pattern is anchored at the start of the line, there is no need to match against $1. The 1 at the end prints the record, including any changes made.

IGNORECASE and \> are gawk-specific features. \y can also be used to match a word boundary.
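
For awk implementations without these extensions, the same regex-based idea can be approximated portably. The following is a sketch under assumptions, not an exact equivalent: the function name lower_keyword_lines is hypothetical, tolower($0) stands in for IGNORECASE, and the group ([^a-z0-9_]|$) stands in for \> by requiring a non-word character or the end of the line after the keyword (ASCII word characters only).

```shell
# Portable approximation of: gawk 'BEGIN{IGNORECASE=1} /^ *baz\>/{$0=tolower($0)} 1'
lower_keyword_lines() {
    # Match against a lowercased copy of the record so the keyword's case is ignored.
    # [ \t]* allows leading spaces or tabs; ([^a-z0-9_]|$) emulates the word end.
    awk 'tolower($0) ~ /^[ \t]*baz([^a-z0-9_]|$)/ { $0 = tolower($0) } 1'
}

result=$(printf '%s\n' 'foo Baz' '  baz QUX' '    QUx Quux' 'BaZ Qux' 'BazaaR' |
    lower_keyword_lines)

printf '%s\n' "$result"
# Output:
# foo Baz
#   baz qux
#     QUx Quux
# baz qux
# BazaaR
```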

With GNU sed:

$ sed 's/^[[:blank:]]*baz\b.*/\L&/I' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR

[[:blank:]] matches space or tab characters
\L& lowercases the matched text (here, the whole line)
\b is a word boundary
the I flag makes the match case-insensitive

edited Jul 06 '17 at 01:44

answered Jul 05 '17 at 16:10


I'm impressed (especially about `IGNORECASE`). Regarding your awk sample: `\>` is word boundary? – Scheff's Cat Jul 05 '17 at 16:13
actually OP mentioned about IGNORECASE in edit... and yeah `\>` matches ending position of word – Sundeep Jul 05 '17 at 16:14
His edit appeared just in the same minute like your answer. Thus, I was not sure where to direct my impression to. You won because of the `\>`... – Scheff's Cat Jul 05 '17 at 16:28
@Sundeep Thanks for the detailed answer. As a follow-up regarding your **awk** solution: How can I enforce the conversion to lowercase letters given the explained keyword matching if (and only if) the line is terminated by a semi-colon. For example: `BaZ Qux;` is converted to `baz qux;`, whereas `BaZ Qux` is not. – Michael Gruenstaeudl Jul 05 '17 at 16:49
@MichaelGruenstaeudl you can combine conditions logically... so `/^ *baz\>/ && /;$/`.. – Sundeep Jul 06 '17 at 01:37
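
The combined condition from the last comment can be sketched as follows. To keep the example runnable in any awk, the gawk-only pattern /^ *baz\>/ is swapped here for the accepted answer's tolower($1)=="baz" test; the sample lines and the variable name result are made up for illustration.

```shell
# Lowercase a line only if its first field is "baz" (any case) AND it ends in ";".
result=$(printf '%s\n' 'BaZ Qux;' 'BaZ Qux' '  BAZ Quux;' 'BazaaR;' |
    awk 'tolower($1)=="baz" && /;$/ { $0 = tolower($0) } 1')

printf '%s\n' "$result"
# Output:
# baz qux;
# BaZ Qux
#   baz quux;
# BazaaR;
```

One limitation of this substitution: a line consisting only of "baz;" is not converted, because the first field is then "baz;" rather than "baz". The regex form from the comment does not have that restriction.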
