I have the following bash code using gawk gsub:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '{ num += gsub( OLD, NEW ); print $0 > FILE_TMP; } END { print num }' "$FILE")

It replaces all instances of OLD with NEW and writes the result to FILE_TMP. The number of replacements is captured in the bash variable replacedCount.

Is it possible to achieve the same results using gawk gensub?

  1. $FILE is 182 lines long.
  2. There are 8 occurrences of $OLD that are to be replaced with $NEW

I've tried several ways; most results equal 182, as I guess I am counting the number of occurrences of $0.

The closest I have got is this:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '{ num[$0=gensub( OLD, NEW, "G" )]++; print $0 > FILE_TMP; } END { for (i in num) print num[i] }' "$FILE")

Which does output to FILE_TMP correctly. However replacedCount is:

replacedCount='8
1
1
1
1
1
1
8
1
8
8
1
1
1
8
1
1
1
1
1
1
1
1
8
8
1
1
1
8
1
1
8
1
1
1
1
1
1
1
1
8
8
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
1
8
1
8
1
1
1
8
1
1
8
8
1'
Turtle
  • Given that `gensub` doesn't return that information I don't see how you could without counting matches yourself in some other way. – Etan Reisner Oct 28 '15 at 15:57
  • I don't understand why you want `gsub()` behaviour in `gensub()` if it was working fine with the former : ) – fedorqui Oct 28 '15 at 16:17
  • Tell us why you have to use `gensub()`. What else do you want to achieve? It smells like an XY problem. – Kent Oct 28 '15 at 16:19

1 Answer

The following uses a match on OLD as a gate for performing gensub() and incrementing the "num" counter, so it counts the lines that were changed rather than the individual replacements:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; $0 ~ OLD { $0=gensub(OLD,NEW,"G"); num++ }; { print > FILE_TMP }; END { print num }' "$FILE")

If a count for each match is wanted (multiple within a line), we need to drop the "G" flag in gensub() and put the increment and gensub() inside a while loop. Note that the loop only terminates if the text produced by NEW does not itself match OLD:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; { while ($0 ~ OLD) { $0=gensub(OLD,NEW,1); num++ } }; { print > FILE_TMP }; END { print num }' "$FILE")
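To make the difference between the two counts concrete, here is a small run of both gensub() variants. It is only a sketch: the three-line sample file and the values OLD=foo and NEW=bar are made up for illustration, and the variable names lineCount and matchCount are not from the question.

```shell
# Hypothetical sample data: 3 occurrences of "foo" spread over 2 of 3 lines.
FILE=$(mktemp)
FILE_TMP=$(mktemp)
printf '%s\n' 'foo and foo' 'nothing here' 'one foo' > "$FILE"
OLD='foo'
NEW='bar'

# Gate version: increments once per line that contains at least one match.
lineCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '
    BEGIN { num = 0 }
    $0 ~ OLD { $0 = gensub(OLD, NEW, "G"); num++ }
    { print > FILE_TMP }
    END { print num }' "$FILE")

# Loop version: replaces one match per iteration, so it increments once per match.
matchCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '
    BEGIN { num = 0 }
    { while ($0 ~ OLD) { $0 = gensub(OLD, NEW, 1); num++ } }
    { print > FILE_TMP }
    END { print num }' "$FILE")

echo "lines changed: $lineCount, replacements: $matchCount"
# prints: lines changed: 2, replacements: 3
```

Both runs leave the same replaced text in FILE_TMP; only the reported number differs.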

gensub() is primarily for replacing only the "Nth" match or for leaving the original string untouched. In this problem, it seems perfectly reasonable and natural to modify $0, and it is more succinct and clear to use gsub() and sub(), as in the following:

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; $0 ~ OLD { gsub(OLD,NEW); num++ }; { print > FILE_TMP }; END { print num }' "$FILE")

... or the "increment for each match" version...

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" 'BEGIN { num=0 }; { while ($0 ~ OLD) { sub(OLD,NEW); num++ } }; { print > FILE_TMP }; END { print num }' "$FILE")
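If gensub() really has to do the replacing and a per-match count is still wanted, one possible workaround is to count on a throwaway copy of the line and let gensub() do the actual substitution. This is a sketch rather than the method used above: the variable name copy and the sample data are assumptions, and the counting step still relies on gsub(), since gensub() itself returns only the new string. Unlike the while-loop variants, it does not loop forever when NEW itself matches OLD.

```shell
# Hypothetical sample data; NEW deliberately contains a match for OLD.
FILE=$(mktemp)
FILE_TMP=$(mktemp)
printf '%s\n' 'a-a' 'b' 'a' > "$FILE"
OLD='a'
NEW='aa'

replacedCount=$(gawk -v FILE_TMP="$FILE_TMP" -v OLD="$OLD" -v NEW="$NEW" '
    {
        copy = $0
        # "&" puts each match back unchanged; only the returned count is used
        num += gsub(OLD, "&", copy)
        # gensub() performs the real replacement on the current line
        $0 = gensub(OLD, NEW, "g")
        print > FILE_TMP
    }
    END { print num + 0 }' "$FILE")

echo "$replacedCount"
# prints: 3
```

Because values passed with -v go through escape-sequence processing and "&" is special in the replacement text, OLD and NEW containing backslashes or ampersands may need extra escaping.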
Michael Back