
I would like to use awk to modify a text file. Any word that starts with "te" or "Te" and does not contain a digit should be replaced with "yyyyy", to sort of censor the file.

So for example a file

Hello everyone,
today is a great day to get tested by mr. Tenet here!
Don't te11 anyone!

should be modified into

Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!

Then I'd like to add information about the modification: how many lines the file has and how many of them were modified. (Do I need a for loop to do this?)

This info should be added to the end of the file and look like this:

The file has 3 lines and 2 out of these were modified.

I am quite lost and would appreciate any help. Thank you

wjandrea

1 Answer

awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/te[a-zA-Z]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

or

awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/te[[:alpha:]]* /,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

If you do not need the output of the changed text, then remove print from the second line.

output:

Hello everyone,
today is a great day to get yyyyy by mr. yyyyy here!
Don't te11 anyone!
The file has 3 lines and 1 out of these were modified, with 2 changes
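
On the "for loop" question: no explicit loop is needed, because awk already runs the main block once per line, `gsub` returns the number of substitutions it made, and `NR` holds the line count in `END`. Note that `IGNORECASE` is a GNU awk extension. A minimal sketch of the same idea for other awks, assuming `[Tt]e` is an acceptable stand-in for the case-insensitive match (it will not match `TE` or `tE`); the function name `censor_simple` and the temporary sample file are only for illustration:

```shell
# Hypothetical sample file holding the example input from the question.
sample_simple=$(mktemp)
cat > "$sample_simple" <<'EOF'
Hello everyone,
today is a great day to get tested by mr. Tenet here!
Don't te11 anyone!
EOF

# censor_simple FILE: same approach as above, without IGNORECASE.
censor_simple() {
    awk '
    {
        n = gsub(/[Tt]e[A-Za-z]* /, "yyyyy ")   # gsub returns the number of replacements
        if (n > 0) modified++                   # count each changed line once
        changes += n                            # count every replacement
        print
    }
    END {
        printf "The file has %d lines and %d out of these were modified, with %d changes\n", NR, modified, changes
    }' "$1"
}

censor_simple "$sample_simple"
# last line printed:
# The file has 3 lines and 1 out of these were modified, with 2 changes
```

This keeps the limitations of the regular expression above: it still needs a trailing space after the word and does not check where the word starts.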

EDIT: Because of the comment about Teheran, I changed my input file to:

Hello everyone,
today is a great day, to get tested by mr. Tenet here!
time to light some external fire in Teheran!
Don't te11 anyone!

and the script to:

awk 'BEGIN{ IGNORECASE=1; m1=0; m2=0 }
     { x=gsub(/\<te[[:alpha:]^[0-9][:punct:]]*/,"yyyyy ",$0); m1+=(x!=0); m2+=x; print }
     END{ print "The file has " NR " lines and " m1 " out of these were modified, with " m2 " changes"}' inputfile

This seems to work OK:

Hello everyone,
today is a great day, to get yyyyy  by mr. yyyyy  here!
time to light some external fire in yyyyy
Don't te11 11 anyone!
The file has 4 lines and 3 out of these were modified, with 4 changes
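
If getting the word boundaries and the "no digits" rule into a single regular expression proves awkward, one possible alternative is to walk through each line word by word with `match()` and test every word on its own. This is only a sketch under a few assumptions: a "word" is taken to be a run of ASCII letters and digits, only `te`/`Te` (not `TE`) counts as a match, and the names `censor_words` and the temporary sample file are made up for the example. It avoids GNU-only features such as `IGNORECASE` and `\<`.

```shell
# Hypothetical sample file holding the second example input.
sample_words=$(mktemp)
cat > "$sample_words" <<'EOF'
Hello everyone,
today is a great day, to get tested by mr. Tenet here!
time to light some external fire in Teheran!
Don't te11 anyone!
EOF

# censor_words FILE: replace whole words that start with te/Te and contain only letters.
censor_words() {
    awk '
    {
        rest = $0; out = ""; n = 0
        # Peel off one word (run of letters/digits) at a time.
        while (match(rest, /[A-Za-z0-9]+/)) {
            word = substr(rest, RSTART, RLENGTH)
            out = out substr(rest, 1, RSTART - 1)      # keep spaces and punctuation as they are
            if (word ~ /^[Tt]e[A-Za-z]*$/) {           # starts with te/Te, letters only
                word = "yyyyy"
                n++
            }
            out = out word
            rest = substr(rest, RSTART + RLENGTH)
        }
        print out rest
        if (n > 0) modified++
        changes += n
    }
    END {
        printf "The file has %d lines and %d out of these were modified, with %d changes\n", NR, modified, changes
    }' "$1"
}

censor_words "$sample_words"
# last line printed:
# The file has 4 lines and 2 out of these were modified, with 3 changes
```

With this approach `external` and `te11` are left alone, and `Teheran!` becomes `yyyyy!` because the punctuation is never part of the word being tested.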
Luuk
  • The regular expression `/te[[:alpha:]]* /` will also affect words with `te` inside them; for example, it turns `eternal fire` into `eyyyyy fire`. It will also fail for a word followed by punctuation; for example, `Where is Tehran?` would be left unaltered. – Daweo Dec 11 '20 at 10:54
  • I edited my answer to make it work better. Thanks for the input! – Luuk Dec 11 '20 at 17:01
  • But I see I still have a lot to learn about regular expressions... because `te11` is now also changed to `yyyyy`. – Luuk Dec 11 '20 at 17:04
  • changing the regular expression from `/\ – Luuk Dec 11 '20 at 17:11
  • Alright, and what if the "te" didn't have to be at the start of the word, but could also be, for example, at the end? – Milchschnitte Dec 13 '20 at 17:37