Count the number of replacements of GNU awk gensub

Question

I have the following bash code using gawk gsub:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '{ num += gsub( OLD, NEW ); print $0 > FILE_TMP; } END { print num }' "$FILE")

It replaces all instances of OLD with NEW and writes the result to FILE_TMP. The number of replacements is captured in the bash variable replacedCount.

Is it possible to achieve the same results using gawk gensub?

$FILE is 182 lines long.
There are 8 occurrences of $OLD that are to be replaced with $NEW.

I've tried several approaches. Most of them give 182, which I guess means I am counting lines (occurrences of $0) rather than replacements.

The closest I have got is this:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '{ num[$0=gensub( OLD, NEW, "G" )]++; print $0 > FILE_TMP; } END { for (i in num) print num[i] }' "$FILE")

This does write FILE_TMP correctly. However, replacedCount is:

replacedCount='8
1
1
1
1
1
1
8
1
8
8
1
1
1
8
1
1
1
1
1
1
1
1
8
8
1
1
1
8
1
1
8
1
1
1
1
1
1
1
1
8
8
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
8
1
8
1
1
1
8
1
1
8
8
1'

Given that `gensub` doesn't return that information I don't see how you could without counting matches yourself in some other way. — Etan Reisner, Oct 28 '15 at 15:57
I don't understand why you want `gsub()` behaviour in `gensub()` if it was working fine with the former : ) — fedorqui, Oct 28 '15 at 16:17
Tell us why you have to use `gensub()`. What else do you want to achieve? It smells like an XY problem. — Kent, Oct 28 '15 at 16:19

Michael Back · Accepted Answer · 2015-11-17T00:58:46.627

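The following uses a match on OLD as a gate for calling gensub() and incrementing the num counter. Note that num is incremented once per matching line, so it counts changed lines rather than individual replacements: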

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; $0 ~ OLD { $0=gensub(OLD,NEW,"G"); num++ }; { print > FILE_TMP }; END { print num }' "$FILE")

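If a count for each match is wanted (there may be several within a line), we need to drop the "G" flag in gensub() and put the increment and the gensub() call inside a while loop. Be aware that this loop never terminates if NEW itself matches OLD, because every replacement then produces a new match: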

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; { while ($0 ~ OLD) { $0=gensub(OLD,NEW,1); num++ } }; { print > FILE_TMP }; END { print num }' "$FILE")

gensub() is primarily useful for replacing only the Nth match or for leaving the original string untouched. In this problem it seems perfectly reasonable and natural to modify $0, so it is more succinct and clearer to use gsub() and sub(), as in the following:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; $0 ~ OLD { gsub(OLD,NEW); num++ }; { print > FILE_TMP }; END { print num }' "$FILE")

... or the "increment for each match" version...

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; { while ($0 ~ OLD) { sub(OLD,NEW); num++ } }; { print > FILE_TMP }; END { print num }' "$FILE")

Please don't post code-only answers. Add an explanation please. — Jonathan Lam, Nov 16 '15 at 00:06
Thank you very much! (Sorry for the delay, better late than never!) — Turtle, Dec 30 '15 at 15:31
