5

I need to replace every character a between xx and zz with hello:

#input
a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz
#output
a xxhellob hellobzz ca xxbczz aaa axxhellozza xxczzaxxczz

This works for a single pair, but it doesn't work when there are more xx/zz pairs (it replaces every a between the first xx and the last zz):

sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep'
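A smaller input makes the failure easy to see. This is a minimal sketch assuming GNU sed; the test string is made up for illustration. Because `.*` is greedy, the match runs from the first xx to the last zz, so the a between the two pairs is replaced as well:

```shell
# Same command as above, run on a short string with two pairs
greedy=$(echo 'xxazz a xxazz' | sed -r ':rep; s/(xx.*)a(.*zz)/\1hello\2/; trep')
echo "$greedy"   # xxhellozz hello xxhellozz (wanted: xxhellozz a xxhellozz)
```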

I assume the best approach is to use a more advanced regex flavor, such as Perl's.

I am looking for a solution in bash, sed, awk or perl. Is this task even possible with basic/extended regex? Solutions that will not become hard to digest when the pairs have more characters (for example xxxxxx/zzzzzz) are preferred.

PesaThe

6 Answers

3

You can try this Perl method

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{xx(.+?)zz}{"xx".$1=~s/a/hello/gr."zz"}xge; 
say $_ ; '

Explanation

s{
   xx(.+?)zz #grouping the content
 }
 {
   "xx".$1=~s/a/hello/gr."zz" #again making the substitution for $1 and concatenating `xx` and `zz`  
 }xge;

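Flags

x -> extended syntax: whitespace and # comments inside the pattern are ignored

g -> global: replace every match, not just the first one

r -> non-destructive: return the modified copy and leave the original untouched ($1 is read-only, so this is required here)

e -> eval: the replacement is evaluated as Perl code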

With lookarounds:

perl -E '$_="a xxab abzz ca xxbczz aaa axxazza xxczzaxxczz";
s{(?<=xx)(.+?)(?=zz)}{$1=~s/a/hello/gr}xge; 
say $_ ; '
mkHun
3

Yes, it's best to use Perl

perl -pe's/xx(.+?)zz/"xx".$1=~s|a|hello|gr."zz"/ge' file.txt
Hynek -Pichi- Vychodil
  • I had a hard time choosing among all the great solutions here. This one makes the most sense to me with my limited `perl` knowledge. Thanks! – PesaThe Jan 05 '18 at 14:58
  • Also, I would love to hear the reason of someone's downvote here. – PesaThe Jan 05 '18 at 22:14
  • @PesaThe I didn't downvote. My guess is that it's because there is no explanation in the answer, or because the answer is the same as [this](https://stackoverflow.com/questions/48105521/replace-multiple-occurrences-between-two-strings/48108606#48108606) answer. – mkHun Jan 08 '18 at 10:22
2

This might work for you (GNU sed):

sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g' file

This replaces zz with newline and then replaces any a's between xx and a newline with hello.

N.B. It is possible to have any number of xx that are not paired with zz and any a's between them will be substituted.
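The N.B. can be illustrated with a made-up string that has two xx before a single zz (a sketch assuming GNU sed, since the command relies on `-r` and on `\n` inside a bracket expression):

```shell
# Both a's after the first xx are replaced, although only one xx is closed by zz
unpaired=$(echo 'xxa xxa zz' | sed -r ':a;s/zz/\n/;:b;tb;s/(xx[^\na]*)a([^\n]*\n)/\1hello\2/;tb;/zz/ba;s/\n/zz/g')
echo "$unpaired"   # xxhello xxhello zz
```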

potong
1

There may be an award for a regex-only solution, but here is a straightforward one.

Split the string by xx. Iterate over terms and replace a in each term's part up to zz.

I replace a with - for easier review. The begin and stop patterns are in $pb and $pe.

perl -wE'$_ = q(a xxab abzz ca xxbczz aaa axxazza); say; 
    $pb = qr(xx); $pe = qr(zz); 
    ($r, @t) = split /($pb)/; 
    for (@t) { 
        if (/^$pb$/) { $r.=$_, next }; 
        if (($m, $e) = /(.*?)($pe.*)/) { $m =~ s/a/-/g; $r .= $m.$e }  # term has an end pattern
        else { $r .= $_ }  # no end pattern: keep the term as is
    }; say $r
'

This is in a form that is ready to test, but it should really be a script. It prints

a xxab abzz ca xxbczz aaa axxazza
a xx-b -bzz ca xxbczz aaa axx-zza

I've tested with a few more strings but by all means please test more.

This can also be done with a regex but that is much more advanced and harder to understand.

zdim
0

Your problem is with .*, because . matches every character, including whitespace. Use \S instead, which matches only non-whitespace characters:

$ echo 'a xxababzz ca xxbczz aaa axxazza' | sed -r ':rep; s/(xx\S*?)a(\S*?zz)/\1hello\2/; trep'
a xxhellobhellobzz ca xxbczz aaa axxhellozza
DjLegolas
  • I didn't provide good enough input, sorry. There can be any character between `xx/zz`, including whitespace. – PesaThe Jan 05 '18 at 00:42
  • Also, this won't work properly if there are more pairs in one "word": `'b xxczzaxxczz b'`. – PesaThe Jan 05 '18 at 00:54
  • Can you explain it? What all the options & `:rep;` & `trep` & the search/replace 1 & 2 are & how they interact? – Xen2050 Jan 05 '18 at 09:05
  • @Xen2050 `\1` references the first captured group `( regex )`. `:rep` is a label, `t rep` is a command that jumps to label `rep` **ONLY** if an `s` command has made a successful substitution in the pattern space. The space is not mandatory: `trep`. – PesaThe Jan 12 '18 at 00:56
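The label/`t` loop described in the last comment can be tried on a tiny made-up input. In this sketch the commands are passed as separate `-e` options, which avoids relying on `;` ending a label:

```shell
# One pass: only the first "aa" is collapsed
once=$(echo 'aaaa' | sed -e 's/aa/a/')
# Loop: t jumps back to :rep as long as the substitution keeps succeeding
looped=$(echo 'aaaa' | sed -e ':rep' -e 's/aa/a/' -e 't rep')
echo "$once"     # aaa
echo "$looped"   # a
```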
0

You have to describe everything that isn't zz (a character that isn't a z, or a z followed by a character other than z) both before and after the a, up to the closing zz. Then use a label and a conditional branch to reprocess the line until no a is left between xx and zz:

sed -E ':a;s/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g;ta' file
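Two made-up strings show how the pattern behaves around single and double z (a sketch assuming GNU sed): a lone z inside a pair does not stop the match, while an a that sits after a closing zz is left alone.

```shell
pattern=':a;s/(xx([^z]|z[^z])*z?)a(([^z]|z[^z])*zz)/\1hello\3/g;ta'
# A single z before the a is allowed inside the pair
inside=$(echo 'xxzazz' | sed -E "$pattern")
# The a between the two pairs is outside any xx...zz and stays untouched
outside=$(echo 'xxczzaxxczz' | sed -E "$pattern")
echo "$inside"    # xxzhellozz
echo "$outside"   # xxczzaxxczz
```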

A Perl way:

perl -pe's/(?:\G(?!^)|xx(?=.*zz))[^za]*(?:z(?!z)[^za]*)*\Ka/hello/g' file

that can be easily changed to:

perl -pe's/(?:\G(?!^)|xxxxxx(?=.*zzzzzz))[^za]*(?:z(?!zzzzz)[^za]*)*\Ka/hello/g' file

to deal with xxxxxx and zzzzzz

Casimir et Hippolyte