In a text file I want to comment out these lines:

<whatever>xyz
<whatever>xyz <whatever>

... that's a certain string followed by either end-of-line or whitespace.

But I want to leave these lines alone:

<whatever>xyz<something><whatever>

... that's the string followed by a character that is not whitespace.

Where the following are of course not literal strings:

  • <whatever>: zero or more characters, which may include white-space.
  • <something>: any single character except white-space.

I've tried this:

sed -e '/xyz[ $]/s/^/# /g' in.txt > out.txt

... but it doesn't match the lines where end-of-line comes immediately after the string. It seems the $ sign is taken literally when it is inside square brackets.


This is my current hack:

sed -e '/xyz /s/^/# /g' in.txt > out.txt
sed -e '/xyz$/s/^/# /g' -i out.txt

... but I'd much rather parse the file only once, for speed. I'd also like to match \t as well as the ordinary space character, but that is not compulsory.

Here is my test input file, "in.txt" (the first two lines should be commented out, the third left alone):

xyz
xyz #
xyz.

I'm running Linux Mint, i.e. GNU sed.

jcxz100
  • Your input and the hack don't work as you claim here. Post a snippet of your actual input. – iamauser Feb 07 '18 at 15:29
  • iamauser: I don't understand? I've just copied back the code I call my hack, and it does what I expected. Btw, I just added the input test file to the original post. – jcxz100 Feb 07 '18 at 20:10

2 Answers

Most special characters, including $, lose their special meaning inside bracket expressions, so [ $] matches a space or a literal dollar sign, not end-of-line.

Try this:

sed -Ee '/(xyz$)|(xyz )|(xyz\t)/s/^/# /g'

> gsed -Ee '/(xyz$)|(xyz )|(xyz\t)/s/^/# /g' in.txt
# xyz
# xyz #
xyz.
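
To see why the original attempt missed the bare `xyz` line, here is a minimal sketch, assuming GNU sed as in the question. The fourth input line (`xyz$`) is a hypothetical addition, included only to show that the bracket expression treats `$` as an ordinary character.

```shell
# Assumption: GNU sed. The last input line (xyz$) is made up for illustration.
# Inside [...] the "$" is literal, so /xyz[ $]/ needs a real character after xyz:
# either a space or a dollar sign. A bare "xyz" at end of line does not match.
bracket_out=$(printf 'xyz\nxyz #\nxyz.\nxyz$\n' | sed -e '/xyz[ $]/s/^/# /')
printf '%s\n' "$bracket_out"
```

Only `xyz #` and `xyz$` should come out commented; the plain `xyz` line is left untouched, which is the behaviour reported in the question.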
Mat Ford
  • So what? If xyz is followed by whitespace then the question asked that it be commented out. – Mat Ford Feb 07 '18 at 17:35
  • `(xyz$)|(xyz )|(xyz\t)` = `xyz($| |\t)` = `xyz($|[ \t])` = `xyz($|[[:blank:]])` – Ed Morton Feb 07 '18 at 17:45
  • @EdMorton Why is it not OK with `[[:space:]]`? – ctac_ Feb 07 '18 at 17:56
  • @ctac_ It is OK, but it's not the same in general, since `[[:space:]]` includes newline, etc. I was just pointing out other constructs equivalent to what's in the current answer but more concise. – Ed Morton Feb 07 '18 at 18:05
  • @EdMorton I tried for a few minutes with `[[:space:]]` to no avail. I realize now that `$` is not `\n`. Thanks. – ctac_ Feb 07 '18 at 18:12
  • Right. `$` is an RE metacharacter representing `end of string` when used in a regexp, while `\n` is an escape sequence representing the linefeed character in [some versions of] some tools in regexps and/or print statements. sed and grep are line-oriented, so `$` is sometimes misunderstood as meaning `end of line` just because the end of the input string occurs at the end of the line in those tools, but it's not that in general, and even if it were, that would still be different from `\n`. – Ed Morton Feb 07 '18 at 18:16
  • Thanks :) Also thank you @EdMorton for the shortened versions. – jcxz100 Feb 07 '18 at 23:01
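
The shortest of the equivalent forms from the comments can be tried in a single pass. This is a sketch only, assuming GNU sed with `-E`. The input is the question's three-line sample plus one hypothetical line containing a tab, added to show that `[[:blank:]]` covers both space and tab.

```shell
# Assumption: GNU sed with -E (extended regular expressions).
# The "xyz<TAB>abc" line is a made-up extra test case.
# xyz($|[[:blank:]]) means: xyz followed by end of line, or by a space or tab.
single_pass=$(printf 'xyz\nxyz #\nxyz\tabc\nxyz.\n' |
  sed -E '/xyz($|[[:blank:]])/s/^/# /')
printf '%s\n' "$single_pass"
```

The first three lines should gain a `# ` prefix and `xyz.` should pass through unchanged. The `g` flag is left off because `^` can match only once per line.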
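$ cat r.sh
awk '{
   # Comment out the line when xyz is followed by a blank or by end of line.
   # A single positive test also handles lines that contain both kinds of xyz.
   if ($0 ~ /xyz([ \t]|$)/) print "# " $0
   else                     print $0
 }' "$@"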

Usage

sh r.sh file
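
As a quick check of an awk filter of this kind, here is a minimal sketch that runs a single-regex test inline instead of through r.sh. The fourth input line (`xyz.a xyz`) is a hypothetical case, not from the question: it contains one `xyz` followed by a non-blank and another at end of line, so under the question's rules it should be commented out.

```shell
# Assumption: any POSIX awk. The "xyz.a xyz" line is made up for illustration.
# The test is positive: comment the line if some xyz is followed by a blank or EOL.
awk_out=$(printf 'xyz\nxyz #\nxyz.\nxyz.a xyz\n' |
  awk '{ if ($0 ~ /xyz([ \t]|$)/) print "# " $0; else print $0 }')
printf '%s\n' "$awk_out"
```

Every line except `xyz.` should be prefixed with `# `.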
slitvinov