Replace multiple occurrences between two strings

Question

I need to replace every character a between xx and zz with hello:

#input
a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz
#output
a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz

This works for one pair, but it doesn't work for multiple xx/zz pairs (it replaces every a between the first xx and the last zz):

sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep'

I assume the best approach is to use a more advanced regex flavor, such as Perl's.

I am looking for a solution in bash, sed, awk or perl. Is this task even possible with basic/extended regex? Solutions that stay readable when the delimiters are longer (for example xxxxxx/zzzzzz) are preferred.
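For anyone who wants to reproduce the problem, here is a minimal sketch (assuming GNU sed, since `-r` is used; the `greedy_attempt` wrapper name is only for illustration). It feeds the sample input through the loop above and prints the result next to the desired output. Because `.*` is greedy, the a in ` ca `, which lies outside any pair, gets replaced as well.

```shell
#!/bin/sh
# Input and desired output are copied from the question.
input='a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz'
wanted='a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz'

# greedy_attempt is a hypothetical wrapper around the sed loop from the question.
greedy_attempt() {
  sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep'
}

got=$(printf '%s\n' "$input" | greedy_attempt)
# The two lines differ: the greedy .* spans from the first xx to the last zz.
printf 'got:    %s\nwanted: %s\n' "$got" "$wanted"
```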

mkHun · Answer 1 · 2018-01-05T11:46:20.697

You can try this Perl method

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{xx(.+?)zz}{"xx".$1=~s/a/hello/gr."zz"}xge; 
say $_ ; '

Explanation

s{
   xx(.+?)zz #grouping the content
 }
 {
   "xx".$1=~s/a/hello/gr."zz" #again making the substitution for $1 and concatenating `xx` and `zz`  
 }xge;

Flags

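x -> extended syntax (whitespace and comments inside the pattern are ignored)

g -> global (replace every match, not only the first)

r -> non-destructive modifier (returns the changed copy instead of modifying $1)

e -> evaluate the replacement as Perl code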

With lookarounds:

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{(?<=xx)(.+?)(?=zz)}{$1=~s/a/hello/gr}xge; 
say $_ ; '

Hynek -Pichi- Vychodil · Accepted Answer · 2018-01-05T09:56:19.033

3

Yes, it's best to use Perl

perl -pe's/xx(.+?)zz/"xx".$1=~s|a|hello|gr."zz"/ge' file.txt

edited Jan 05 '18 at 09:56

answered Jan 05 '18 at 09:51


I had a hard time choosing among all the great solutions here. This one makes the most sense to me with my limited `perl` knowledge. Thanks! – PesaThe Jan 05 '18 at 14:58
Also, I would love to hear the reason for someone's downvote here. – PesaThe Jan 05 '18 at 22:14
@PesaThe I didn't downvote. Perhaps it is because there is no explanation in the answer, or because the answer is the same as [this](https://stackoverflow.com/questions/48105521/replace-multiple-occurrences-between-two-strings/48108606#48108606) answer. – mkHun Jan 08 '18 at 10:22

score 2 · Answer 3 · answered Jan 05 '18 at 01:25

This might work for you (GNU sed):

sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g' file

This replaces each zz in turn with a newline, replaces any a's between an xx and that newline with hello, and finally turns the newlines back into zz.

N.B. There may be any number of xx that are not paired with a zz; any a's between them (up to the next zz) will be substituted as well.
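A quick way to check both the normal case and the N.B. is sketched below. It assumes GNU sed (as the answer does); the `newline_trick` wrapper name and the two extra test strings are made up for illustration.

```shell
# newline_trick is a hypothetical wrapper around the GNU sed command above.
newline_trick() {
  sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g'
}

# Balanced pairs, input taken from the question:
printf '%s\n' 'a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz' | newline_trick
# prints: a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz

# Two xx before one zz: the a after the first xx is replaced as well.
printf '%s\n' 'xx a xx b zz' | newline_trick
# prints: xx hello xx b zz

# An xx after the last zz has no closing newline to match, so its a is kept.
printf '%s\n' 'xxazz xxa' | newline_trick
# prints: xxhellozz xxa
```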

zdim · Answer 4 · 2018-01-06T03:19:39.683

There may be an award for a regex-only solution, but here is a straightforward one.

Split the string on xx. Then iterate over the resulting terms and, in each one, replace a only in the part that precedes zz.

For easier reviewing I replace a with - here. The begin and end patterns are in $pb and $pe.

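perl -wE'$_ = q(a xxab abzz ca xxbczz aaa axxazza); say;
    $pb = qr(xx); $pe = qr(zz);
    ($r, @t) = split /($pb)/;
    for (@t) {
        if (/^$pb$/) { $r .= $_; next }   # the begin marker itself: copy as is
        # term contains an end marker: replace only in the part before it
        if (my ($m, $rest) = /^(.*?)($pe.*)/s) { $m =~ s/a/-/g; $r .= $m . $rest }
        else { $r .= $_ }                 # no end marker: keep the term unchanged
    }; say $r
'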

This one-liner form is convenient for testing, but it really belongs in a script. It prints:

a xxab abzz ca xxbczz aaa axxazza
a xx-b -bzz ca xxbczz aaa axx-zza

I've tested with a few more strings but by all means please test more.

This can also be done with a regex but that is much more advanced and harder to understand.

score 0 · Answer 5 · answered Jan 05 '18 at 00:35


Your problem is with the .* as . will match every character including white space. You should use \S instead as it will match all non-white space characters:

$ echo 'a xxababzz ca xxbczz aaa axxazza' | sed -r ':rep; s/(xx\S*?)a(\S*?zz)/\1hello\2/; trep'
a xxhellobhellobzz ca xxbczz aaa axxhellozza

answered Jan 05 '18 at 00:35

DjLegolas


I didn't provide good enough input, sorry. There can be any character between `xx/zz`, including whitespace. – PesaThe Jan 05 '18 at 00:42
Also, this won't work properly if there are more pairs in one "word": `'b xxczzaxxczz b'`. – PesaThe Jan 05 '18 at 00:54
Can you explain it? What all the options & `:rep;` & `trep` & the search/replace 1 & 2 are & how they interact? – Xen2050 Jan 05 '18 at 09:05
@Xen2050 `\1` references the first captured group `( regex )`. `:rep` is a label, `t rep` is a command that jumps to label `rep` **ONLY** if an `s` command has made a substitution since the last input line was read or the last `t` jump was taken. The space is not mandatory: `trep`. – PesaThe Jan 12 '18 at 00:56
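A tiny illustration of the label/`t` loop described in the comment above (the `aaa` input is made up; the syntax is the same as in the question's command):

```shell
# Without the loop, s/a/b/ changes only the first match on the line.
printf 'aaa\n' | sed 's/a/b/'                # prints baa

# With :rep and trep, sed jumps back to the label after every successful
# substitution, so s/a/b/ is retried until it no longer matches.
printf 'aaa\n' | sed ':rep; s/a/b/; trep'    # prints bbb
```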

Casimir et Hippolyte · Answer 6 · 2018-01-05T01:29:35.913

0

You have to describe everything that isn't zz (a character that isn't z, or a z followed by a character that isn't z) before and after the a, up to the zz, and use a label with a conditional branch to reprocess the line until no a remains between xx and zz:

sed -E ':a;s/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g;ta' file

A Perl way:

perl -pe's/(?:\G(?!^)|xx(?=.*zz))[^za]*(?:z(?!z)[^za]*)*\Ka/hello/g' file

that can be easily changed to:

perl -pe's/(?:\G(?!^)|xxxxxx(?=.*zzzzzz))[^za]*(?:z(?!zzzzz)[^za]*)*\Ka/hello/g' file

to deal with xxxxxx and zzzzzz

edited Jan 05 '18 at 01:29

answered Jan 05 '18 at 00:45


This works great. However, is there a solution that won't get extra ugly when the pairs have more characters, for example: `xxxxxx/zzzzzz`? – PesaThe Jan 05 '18 at 00:49
It's obviously possible to write a replacement pattern in the same way even if this one is long with sed. If you want something shorter, use Perl. – Casimir et Hippolyte Jan 05 '18 at 00:53
Feel free to downvote my answer; I'm fine with that, since I'm not competing for reputation. – Casimir et Hippolyte Jan 05 '18 at 01:03
It wasn't me, if you are implying that :) – PesaThe Jan 05 '18 at 01:03
@PesaThe: Don't worry about that, I know. – Casimir et Hippolyte Jan 05 '18 at 01:05
This does not appear to work if the input is 'xxzazz' – potong Jan 05 '18 at 01:29
