sed: replace the first few occurrences (and ranges) of a pattern

Question

Is it possible to change the first 4 (or more) occurrences of a string in this scenario using sed (the opposite of sed -r 's/[^[:space:]]*/TEST/4g'), so that the output is:

TEST TEST TEST TEST five six seven

I got it working by reversing the word order of the line with awk twice, but this is long and complex, and I want to do it with sed alone:

echo one two three four five six seven | awk '{for(i=NF;i>=1;i--) printf "%s ", $i;print ""}'  | sed -r 's/[^ ]*/TEST/4g' |  awk '{for(i=NF;i>=1;i--) printf "%s ", $i;print ""}'

Also, is there perhaps a way to change ranges of occurrences, like 3-5, 6-12, ...?

Example input is:

one two three four five six seven

eight nine ten eleven twelve thirteen fourteen

fifteen sixteen seventeen eighteen nineteen twenty twenty-one

awk is better for this; you won't understand a cryptic sed command six months after writing it. — oguz ismail, Jun 17 '19 at 09:05
That answer won't work here, as the searched text is not static. There are other answers there that might fit here, though. — Wiktor Stribiżew, Jun 17 '19 at 09:14
@CorentinLimier I know this option, this will work only for the same word :) — mike, Jun 17 '19 at 09:38
It doesn't answer your question but you can simplify your code using `rev` : `echo one two three four five six seven | rev | sed 's/[^ ]*/TSET/4g' | rev` . I'm trying to find a better one as the sed command must be updated if the line contains a different number of words. — Corentin Limier, Jun 17 '19 at 10:17
Please add sample input and your desired output for that sample input to your question. — Cyrus, Jun 17 '19 at 11:21
It seems to work with GNU `sed`, but you cannot do the ranges: `sed 's/[^ ][^ ]*/\n&/g;:t;/\n/{x;/.\{4\}/!{s/$/./;x;s/\n[^ ][^ ]*/TEST/;bt};x};s/\n//g' <<< "one two three four five six seven"` — Wiktor Stribiżew, Jun 17 '19 at 13:06
A simple way to change the first four strings on a line is to add markers to the strings you want to replace, e.g. `sed 's/\S\+/\n&/g;s/\n//5g;s/\n\S\+/TEST/g' file`. Ranges on a line can be achieved using a similar method. — potong, Jun 17 '19 at 22:44
@potong - worth an answer. I wasn't seeing how you got to ranges that way, but you just have to add the lower limit to the first replacement. Neat. This also gets points for having the string `TEST` there just once. — stevesliva, Jun 18 '19 at 16:01
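A sketch of potong's marker approach extended to a range, following stevesliva's hint of putting the lower limit on the first replacement. It assumes GNU sed (`\S`, `\+`, `\n` and the combined `Ng` flag are GNU extensions) and whitespace-separated words; `replace_range` is a hypothetical helper name, not part of either comment.

```shell
# replace_range is a hypothetical helper name used for illustration.
# Assumes GNU sed: \S, \+, \n and the "Ng" flag are GNU extensions.
replace_range() {
  beg=$1                      # first word to replace (1-based)
  end=$2                      # last word to replace
  stop=$((end - beg + 2))     # first marker that lies past the range
  # 1. put a \n marker before every word from the beg-th word on
  # 2. delete the markers from the stop-th marker on (words after end)
  # 3. replace each word that still carries a marker with TEST
  sed 's/\S\+/\n&/'"$beg"'g;s/\n//'"$stop"'g;s/\n\S\+/TEST/g'
}

printf '%s\n' 'one two three four five six seven' \
              'eight nine ten eleven twelve thirteen fourteen' |
  replace_range 3 5
```

With the sample lines this should print `one two TEST TEST TEST six seven` and `eight nine TEST TEST TEST thirteen fourteen`; `replace_range 1 4` covers the first-four case from the question.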

simlev · Answer 1 · 2019-06-18T06:48:03.510

What about a single awk command:

awk '{for(i=1;i<=NF;i++) if(i<5){$i="TEST"}; print}'

Test run:

$ echo one two three four five six seven | awk '{for(i=1;i<=NF;i++) if(i<5){$i="TEST"}; print}'
TEST TEST TEST TEST five six seven

This solution is short, readable and maintainable. If it does not satisfy you, please add some details about your specific problem.

Perl equivalent solution:

perl -pe 's/\S+/$i++<4?"TEST":$&/ge'

Test run:

$ echo one two three four five six seven | perl -pe 's/\S+/$i++<4?"TEST":$&/ge'
TEST TEST TEST TEST five six seven

maybe there is option to change ranges of occurence like 3-5, 6-12

AWK:

awk '{for(i=3;i<6;i++)$i="TEST";print}'

Test run on the newly provided input file:

$ awk '{for(i=3;i<6;i++)$i="TEST";print}' input
one two TEST TEST TEST six seven
eight nine TEST TEST TEST thirteen fourteen
fifteen sixteen TEST TEST TEST twenty twenty-one

Perl:

perl -pe '$c=0;s/\S+/++$c~~[3..5]?"TEST":$&/ge'

Test run on the newly provided input file:

$ perl -pe '$c=0;s/\S+/++$c~~[3..5]?"TEST":$&/ge' input
Smartmatch is experimental at -e line 1. <== This is a warning that goes to STDERR
one two TEST TEST TEST six seven
eight nine TEST TEST TEST thirteen fourteen
fifteen sixteen TEST TEST TEST twenty twenty-one

This is OK, but I'm looking for something based on sed, if that is even possible, that is easy to implement and remember. — mike, Jun 17 '19 at 09:44
@mike Yes, you made it clear that you're looking for a simple sed-only solution. I was wondering whether it's just for the sake of learning sed (in which case "not possible" could be an answer) or whether there are some requirements imposed by the problem at hand (in which case providing a little more context could yield better answers). — simlev, Jun 17 '19 at 10:00
@mike with sed anything other than `s/old/new/` will not be `quite easy to implement and remember.` it will instead be a nightmarish collection of runes that will leave you whimpering in your sleep when you come across it in your code 6 months later and need to understand it. — Ed Morton, Jun 17 '19 at 14:35

score 1 · Answer 2 · answered Jun 17 '19 at 13:14

The answer has been provided here by mikeserv. NOTE: this technique only limits the number of replacements counted from the start of the line, so for a range you can only specify the upper bound; it replaces as many matches as it can, up to that bound, without raising any errors.

GNU sed:

echo 'one two three four five six seven' | \
  sed 'x;s/.*//;x;s/[^[:space:]]*/\n&/g;:t;/\n/{x;/.\{4\}/!{s/$/./;x;s/\n[^[:space:]]*/TEST/;bt};x};s/\n//g'

POSIX sed:

nl='
';
echo 'one two three four five six seven' | sed "x;s/.*//;x;s/[^[:space:]]*/\\$nl&/g;:t${nl}/\n/{x;/.\{4\}/!{${nl}s/$/./;x;s/\n[^[:space:]]*/TEST/;bt$nl};x$nl};s/\n//g"

See the online sed demo.

Original explanation (note that here, 1 is replaced with 2; you may use any other patterns):

I use two notable techniques here. First, every occurrence of 1 on a line is replaced with \n1. This way, as I do the recursive replacements next, I can be sure not to replace an occurrence twice if my replacement string contains my search string. For example, if I replace he with hey, it will still work.

I do this with:
s/1/\
&/g
Second, I count the replacements by appending a character to the hold space for each occurrence. Once it holds three characters, no more replacements occur. If you apply this to your data, change \{3\} to the total number of replacements you want and the /\n1/ addresses to whatever you mean to replace, and it should replace only as many occurrences as you wish.
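A commented sketch of the same marker-plus-counter technique, adapted to the question (the first 4 words become TEST). Two assumptions go beyond the explanation above: the hold-space counter is cleared at the start of every line (otherwise its dots carry over from earlier lines, so later lines get fewer or no replacements), and words are matched with `[^[:space:]][^[:space:]]*` so that no empty match receives a marker. GNU sed is assumed, and `first_four` is a hypothetical name.

```shell
# first_four is a hypothetical helper name; GNU sed is assumed.
first_four() {
  sed '
    # Reset the counter: the hold space otherwise keeps its dots across lines.
    x
    s/.*//
    x
    # Mark every word with a leading newline.
    s/[^[:space:]][^[:space:]]*/\n&/g
    :t
    /\n/{
      # Look at the counter (one dot per replacement made so far).
      x
      /.\{4\}/!{
        # Fewer than 4 so far: add a dot, swap back, replace the first marked word.
        s/$/./
        x
        s/\n[^[:space:]]*/TEST/
        bt
      }
      # Limit reached: swap the line back into the pattern space.
      x
    }
    # Remove the markers that were not consumed.
    s/\n//g
  '
}

printf '%s\n' 'one two three four five six seven' \
              'eight nine ten eleven twelve thirteen fourteen' | first_four
```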

Yes, I had thought about this, but then I reread the OP looking for something that is not *long and complex* and ditched the idea. Having to choose between the two equally unexplained requirements of *simple* and *sed*, I went with the first and dropped the second. This is, however, a great exercise for learning sed, if that's the goal. — simlev, Jun 17 '19 at 13:35
Wow, this is indeed working but very complex; I thought there was a way to make this case easy to implement and understand. — mike, Jun 18 '19 at 13:30

Ed Morton · Answer 3 · 2019-06-17T14:40:38.217

This is a completely inappropriate task for sed, as sed is for doing simple s/old/new/ on individual strings, that is all. With any awk in any shell on every UNIX box:

$ echo one two three four five six seven | awk '{for (i=1; i<=4; i++) $i="TEST"}1'
TEST TEST TEST TEST five six seven

$ echo one two three four five six seven | awk '{for (i=3; i<=5; i++) $i="TEST"}1'
one two TEST TEST TEST six seven

and if you need to parameterize it:

$ echo one two three four five six seven |
    awk -v beg=3 -v end=5 '{for (i=beg; i<=end; i++) $i="TEST"}1'
one two TEST TEST TEST six seven
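One caveat with assigning to `$i` in awk: if `i` is greater than `NF`, the assignment creates the missing fields, so a short line such as `one two` would come out as `one two TEST TEST TEST`. A sketch of a guarded loop that also stops at the last field (same `beg`/`end` variables as above):

```shell
# Same parameterized loop, but it also stops at the last field (i <= NF),
# so short lines are not padded with extra TEST fields.
echo one two three four five six seven |
    awk -v beg=3 -v end=5 '{for (i=beg; i<=end && i<=NF; i++) $i="TEST"}1'
echo one two |
    awk -v beg=3 -v end=5 '{for (i=beg; i<=end && i<=NF; i++) $i="TEST"}1'
```

The first command should print `one two TEST TEST TEST six seven` and the second `one two` unchanged. Any field assignment makes awk rebuild `$0` with `OFS`, so runs of spaces collapse to single spaces on modified lines.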

stevesliva · Answer 4 · 2019-06-17T15:47:11.190


$ echo "one two three four five six" | \
sed -E ':r s/(^|(TEST )+)[^ ]*/\1TEST/;/^(TEST ){4}/!br'
TEST TEST TEST TEST five six

Explanation:

:r label named r to branch back to
s/(^|(TEST )+)[^ ]*/\1TEST/; replacement that replaces just one occurrence of a non-TEST word, preceded by either the start of the line or one or more TESTs
/^(TEST ){4}/!br regex for what's wanted, followed by !br to branch back to :r if it doesn't match yet.

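Clearly this is fragile. It will loop forever on any line with fewer than five words: with exactly four words, the result has no space after the fourth TEST, so /^(TEST ){4}/ never matches. It might be GNU sed only.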
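A sketch of a guarded variant of this loop, assuming words separated by single spaces and no word that is already `TEST`. Lines with fewer than four words are passed through unchanged (a design choice, not from the original answer), and the stop test also accepts a fourth `TEST` at the end of the line, so four-word lines terminate. The script is split into `-e` pieces so the label does not share a line with a command; GNU sed is assumed and `first4_loop` is a hypothetical name.

```shell
# first4_loop is a hypothetical helper name; GNU sed -E assumed.
# 1. lines with fewer than four words: branch to the end, print unchanged
# 2. :r loop: replace the first word that follows the leading TESTs
# 3. stop once four leading TESTs are followed by a space or end of line
first4_loop() {
  sed -E \
    -e '/^([^ ]+ ){3}[^ ]+/!b' \
    -e ':r' \
    -e 's/(^|(TEST )+)[^ ]*/\1TEST/' \
    -e '/^(TEST ){3}TEST( |$)/!br'
}

echo "one two three four five six" | first4_loop
echo "one two three four" | first4_loop
echo "one two" | first4_loop
```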

edited Jun 17 '19 at 15:47

answered Jun 17 '19 at 15:07

stevesliva

What does the "|" character after "^" do? – mike Jun 18 '19 at 13:24

Within parentheses, a vertical bar is an 'or.' `(alice|bob)` matches either word. The `^|` might look like two logical operators, but it's `^` to match the start of the pattern space, followed by an or. – stevesliva Jun 18 '19 at 15:50
