
I have to replace every number in the ranges 35-79 and 235-249 with some text, say "hello world".

After a few minutes I came up with this expression:

sed -r "s/((3[5-9]|[4-7][0-9])|(23[5-9]|24[0-9]))/[\1 hello world]/g" file1.txt > file2.txt
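A quick way to reproduce the false positive with this expression, and to see why a word boundary does not help either, is sketched below. It assumes GNU sed (`-r`/`-E` and `\b` are GNU features), and the sample inputs `256` and `a40b` are made up for illustration:

```shell
# Assumes GNU sed; the inputs are illustrative only.

# The first expression from the question: the "56" inside 256 is treated as a match.
naive=$(echo '256' | sed -r "s/((3[5-9]|[4-7][0-9])|(23[5-9]|24[0-9]))/[\1 hello world]/g")
echo "$naive"   # 2[56 hello world]

# \b only matches between a word and a non-word character (or at the line ends).
# Digits and letters are both word characters, so in "a40b" nothing is replaced.
wb=$(echo 'a40b' | sed -E 's/\b(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])\b/[\1 hello world]/g')
echo "$wb"      # a40b
```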

The problem is that it also matched parts of larger numbers. For example, in 256 it matched 56, which is not what I wanted. The numbers can be preceded and followed by whitespace or arbitrary alphanumeric characters, so a word boundary (\b) is not an option. I managed to solve that with a negative lookbehind and a negative lookahead, obtaining this result:

sed -r "s/((?<![0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])(?![0-9]))/[\1 hello world]/g" file1.txt > file2.txt

Unfortunately, sed does not support lookbehind or lookahead. I know Perl does, but I have to do this with sed only. Any idea how to solve this in sed?

Wiktor Stribiżew
GPF

1 Answer


With Perl:

$ echo '40-840-236;59a' | perl -pe 's/(?<!\d)(23[5-9]|24\d|3[5-9]|[4-7]\d)(?!\d)/[$1 hello world]/g'
[40 hello world]-840-[236 hello world];[59 hello world]a

With sed (syntax checked with GNU sed; it may differ for other implementations):

$ echo '40-840-236;59a' | sed -E ':a s/(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9\n]|$)/\1[\2\n hello world]\3/; ta; s/\n//g'
[40 hello world]-840-[236 hello world];[59 hello world]a
  • :a defines label a
  • (^|[^0-9]) match the start of the line or a non-digit character
  • (3[5-9]|[4-7][0-9]|23[5-9]|24[0-9]) the numbers to be replaced (35-79 and 235-249)
  • ([^0-9\n]|$) match a single non-digit, non-newline character, or the end of the line
  • \1[\2\n hello world]\3 put the capture groups back in the required output format, with an extra newline after the number; since \3 cannot be a newline, an already replaced number cannot match again, which keeps the loop from running forever
  • ta branch back to label a if the substitution succeeded, so the line is rescanned until no more numbers match
  • s/\n//g remove the extra newlines
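The loop matters because the surrounding non-digit characters are consumed by each match. A sketch assuming GNU sed, with made-up inputs, compares a single `/g` pass against the loop when two numbers share one separator, and also runs the loop on the same boundary values as the Perl check:

```shell
# Assumes GNU sed (-E, \n inside a bracket expression, ":a" followed by a command).

# Single pass with /g: the "-" is consumed as \3 of the first match, so "59"
# no longer has a non-digit (or line start) in front of it and is skipped.
single=$(echo '40-59' | sed -E 's/(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9]|$)/\1[\2 hello world]\3/g')
echo "$single"   # [40 hello world]-59

# The loop from the answer rescans the line after every replacement.
loop=$(echo '40-59' | sed -E ':a s/(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9\n]|$)/\1[\2\n hello world]\3/; ta; s/\n//g')
echo "$loop"     # [40 hello world]-[59 hello world]

# Same loop on values just inside and outside each range.
bounds=$(echo '34 35 79 80 234 235 249 250' | sed -E ':a s/(^|[^0-9])(3[5-9]|[4-7][0-9]|23[5-9]|24[0-9])([^0-9\n]|$)/\1[\2\n hello world]\3/; ta; s/\n//g')
echo "$bounds"
```

Using \n as a marker is safe here because sed reads input one line at a time and the pattern space does not include the trailing newline, so the only newlines present are the ones the substitution inserts.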
Sundeep