Sed substitution with negative lookahead and negative lookbehind regex

Question

I have to replace all the numbers in the ranges 35-79 and 235-249 with some text, let's say "hello world".

After a few minutes I came up with this expression:

sed -r "s/((3[5-9]|[4-7][0-9])|(23[5-9]|24[0-9]))/[\1 hello world]/g" file1.txt > file2.txt

The problem is that it also matches parts of larger numbers. For example, in the number 256, 56 is detected as a valid number, and that's not what I want. The numbers can be preceded and followed by whitespace or arbitrary alphanumeric characters, so a word boundary is not an option. I worked around that with negative lookbehind and negative lookahead, which gave me this expression:

sed -r "s/(((?<![0-9])3[5-9](?![0-9])|(?<![0-9])[4-7][0-9](?![0-9]))|(23[5-9]|24[0-9]))/[\1 hello world]/g" file1.txt > file2.txt

Unfortunately, sed doesn't recognize lookbehind and lookahead. I know Perl does, but I'm forced to do this using only sed. Any idea about how to solve this in sed?

https://stackoverflow.com/questions/26110266/does-lookbehind-work-in-sed — Wiktor Stribiżew, Oct 15 '20 at 12:40

score 1 · Accepted Answer · answered Oct 15 '20 at 12:38

With perl

$ echo '40-840-236;59a' | perl -pe 's/(?<!\d)(23[5-9]|24\d|3[5-9]|[4-7]\d)(?!\d)/[$1 hello world]/g'
[40 hello world]-840-[236 hello world];[59 hello world]a

With sed (syntax checked with GNU sed; it may differ for other implementations)

$ echo '40-840-236;59a' | sed -E ':a s/(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9\n]|$)/\1[\2\n hello world]\3/; ta; s/\n//g'
[40 hello world]-840-[236 hello world];[59 hello world]a

:a label a
(^|[^0-9]) match start of line or non-digit character
(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9]) valid numbers to be matched
([^0-9\n]|$) match non-digit, non-newline characters or end of line
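\1[\2\n hello world]\3 all the capture groups in the required output format, plus a newline inserted right after the number; because ([^0-9\n]|$) cannot match that newline, a number that has already been replaced is not matched again and the loop cannot run forever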
ta branch to label a as long as the substitution succeeds
s/\n//g remove the extra newlines
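
The loop is needed because a single pass with the g flag cannot reuse a delimiter: when two valid numbers are separated by only one character, that character is consumed as the trailing group of the first match and is no longer available as the leading group of the second. A minimal sketch of the difference, assuming GNU sed (the variable names re, with_g and with_loop are only for illustration):

```shell
# Pattern from the answer above (GNU sed syntax assumed: \n inside a bracket expression).
re='(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9\n]|$)'

# Single pass with the g flag: the space after 40 is consumed as group 3,
# so 50 has no leading non-digit left to match against.
with_g=$(echo '40 50' | sed -E "s/$re/\1[\2 hello world]\3/g")
echo "$with_g"      # [40 hello world] 50

# Loop from the answer: every iteration rescans the whole line,
# and the temporary newline keeps finished numbers from matching again.
with_loop=$(echo '40 50' | sed -E ":a s/$re/\1[\2\n hello world]\3/; ta; s/\n//g")
echo "$with_loop"   # [40 hello world] [50 hello world]
```

The cost of the loop is one full rescan of the line per replaced number, which is usually acceptable for short lines.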
