10

Assume a multi-line text file named file in which some lines start with whitespace.

$ cat file
foo Baz
  baz QUX
    QUx Quux
BaZ Qux
BazaaR

Further assume that I wish to convert to lowercase all lines that start with a keyword (e.g. "baz"), irrespective of whether (a) the keyword itself is written in lowercase, uppercase, or any combination thereof, and (b) the keyword is preceded by whitespace.

$ cat file | sought_command
foo Baz        # not to lowercase (line does not start with keyword)
  baz qux      # to lowercase
    QUx Quux
baz qux        # to lowercase
BazaaR         # not to lowercase (line does not start with keyword, but merely with a word containing the keyword)

I believe that awk is the tool to do it, but I am uncertain how to implement the case-insensitivity for the keyword matching.

$ cat file | awk '{ if($1 ~ /^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
BaZ Qux       # ERROR HERE: was not replaced, b/c keyword not recognized.
BazaaR

EDIT 1: Adding IGNORECASE=1 appears to resolve the case-insensitivity, but now incorrectly converts the last line to lowercase.

$ cat file | awk '{IGNORECASE=1; if($1~/^ *baz/) print tolower($0); else print $0}'
foo Baz
  baz qux
    QUx Quux
baz qux
bazaar       # ERROR HERE: should not be converted to lowercase, as keyword not present (emphasis on word!).
codeforester
Michael Gruenstaeudl
  • 1
    I don't know whether awk does support case-insensitive match (as some other regex dialects do). But this should work: `/^ *[bB][aA][zZ]/`. – Scheff's Cat Jul 05 '17 at 16:09

2 Answers

10

You already know about tolower(), so just use it again in the comparison and test for an exact string match instead of a partial regexp match:

awk 'tolower($1)=="baz"{$0=tolower($0)}1'
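
A minimal sketch of how this one-liner behaves on the sample input from the question, assuming any POSIX awk. The function name `lowercase_keyword_lines` is hypothetical and only wraps the command for illustration. Because `$1` is produced by whitespace splitting, leading blanks never reach the comparison, and assigning to `$0` keeps the original spacing of the line.

```shell
# Hypothetical wrapper around the one-liner; reads stdin, writes stdout.
lowercase_keyword_lines() {
  # tolower($1) == "baz": exact, case-insensitive match on the first field.
  # The trailing 1 is an always-true pattern that prints every record.
  awk 'tolower($1) == "baz" { $0 = tolower($0) } 1'
}

# Sample input from the question.
printf '%s\n' 'foo Baz' '  baz QUX' '    QUx Quux' 'BaZ Qux' 'BazaaR' |
  lowercase_keyword_lines
```

On this input, only the second and fourth lines should come out lowercased; `BazaaR` is left alone because its first field is not exactly "baz".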
Ed Morton
8

Add a word boundary after the search string:

$ awk '{IGNORECASE=1; if($1~/^ *baz\>/) print tolower($0); else print $0}' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR

Can be re-written as:

awk 'BEGIN{IGNORECASE=1} /^ *baz\>/{$0=tolower($0)} 1' ip.txt 

Since the regexp is anchored to the start of the line, there is no need to match against $1. The 1 at the end prints each record, including any changes made to it.

IGNORECASE and \> are gawk-specific features. \y can also be used to match a word boundary.
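
For an awk without these gawk extensions, one possible workaround (my assumption, not part of the gawk solution above) is to lowercase the record before matching and to spell out the end-of-word test by hand. The sketch below treats any character other than an ASCII lowercase letter, digit, or underscore as ending the word, so it does not handle non-ASCII letters. The function name `lowercase_if_keyword` is hypothetical.

```shell
# Hypothetical portable variant: no IGNORECASE, no \> or \y.
lowercase_if_keyword() {
  # Match against a lowercased copy of the record, so the regexp itself
  # can stay lowercase. After "baz" there must be either a non-word
  # character or the end of the line.
  awk 'tolower($0) ~ /^[ \t]*baz([^a-z0-9_]|$)/ { $0 = tolower($0) } 1'
}

printf '%s\n' 'BaZ Qux' 'BazaaR' '  BAZ' | lowercase_if_keyword
```

Unlike a comparison on `$1`, this pattern also accepts a keyword that is directly followed by punctuation, such as `Baz;`.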


With GNU sed

$ sed 's/^[[:blank:]]*baz\b.*/\L&/I' ip.txt 
foo Baz
  baz qux
    QUx Quux
baz qux
BazaaR
  • [[:blank:]] will match space or tab characters
  • \L& will lowercase the line
  • \b is word boundary
  • I flag to match case-insensitively
Sundeep
  • 1
    I'm impressed (especially about `IGNORECASE`). Regarding your awk sample: `\>` is word boundary? – Scheff's Cat Jul 05 '17 at 16:13
  • actually OP mentioned about IGNORECASE in edit... and yeah `\>` matches ending position of word – Sundeep Jul 05 '17 at 16:14
  • His edit appeared just in the same minute like your answer. Thus, I was not sure where to direct my impression to. You won because of the `\>`... – Scheff's Cat Jul 05 '17 at 16:28
  • @Sundeep Thanks for the detailed answer. As a follow-up regarding your **awk** solution: How can I enforce the conversion to lowercase letters given the explained keyword matching if (and only if) the line is terminated by a semi-colon. For example: `BaZ Qux;` is converted to `baz qux;`, whereas `BaZ Qux` is not. – Michael Gruenstaeudl Jul 05 '17 at 16:49
  • @MichaelGruenstaeudl you can combine conditions logically... so `/^ *baz\>/ && /;$/`.. – Sundeep Jul 06 '17 at 01:37
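
The combined condition from the last comment can also be sketched without gawk extensions. This version swaps the `\>` regexp for an exact comparison on the first field, so it assumes the keyword is followed by whitespace, and that the semicolon is the very last character of the line. The function name `lowercase_terminated_lines` is hypothetical.

```shell
# Hypothetical wrapper: lowercase a line only if it starts with the
# keyword AND ends with a semicolon.
lowercase_terminated_lines() {
  # Both conditions must hold; /;$/ is tested against the whole record.
  awk 'tolower($1) == "baz" && /;$/ { $0 = tolower($0) } 1'
}

printf '%s\n' 'BaZ Qux;' 'BaZ Qux' '  BAZ Quux;' 'BazaaR;' |
  lowercase_terminated_lines
```

Here `BaZ Qux` stays unchanged because it lacks the terminating semicolon, and `BazaaR;` stays unchanged because its first field is not the keyword.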