Is there a way to do this kind of substitution in awk, sed, ...?

I have a text file with sections separated by two blank lines:

   section1_name_x
   dklfjsdklfjsldfjsl


   section2_name_x
   dlskfjsdklfjsldkjflkj


   section_name_X
   dfsdjfksdfsdf

I would like to replace every "section_name_x" with "#section_name_x". That is, how do I modify the first string that follows each pair of blank lines?

Thanks,

Steve,

shellter
Steve

3 Answers

awk '
    (NR==1 || blank==2) && $1 ~ /^section/ {sub(/section/, "#&")}
    { 
        print
        if (length) 
            blank = 0
        else
            blank ++
    }
' file
   #section1_name_x
   dklfjsdklfjsldfjsl


   #section2_name_x
   dlskfjsdklfjsldkjflkj


   #section_name_X
   dfsdjfksdfsdf
glenn jackman

hm....

Given your example data why not just

sed 's/^section[0-9]*_name.*/#&/' file > newFile && mv newFile file

Some seds support sed -i (GNU sed) or sed -i '' (BSD/macOS sed) to overwrite the existing file, avoiding the && mv ... shown above.

The regex says that section must be at the beginning of the line and may be followed by zero or more digits before _name. The & in the replacement stands for the matched text, so the line is kept and only a # is put in front of it.
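As a quick check, the substitution can be run against a small sample. This is only a sketch: the sample text and the prefix_sections function name are made up for illustration, and it assumes the section names start in column 1. If the names are indented in the real file, the ^ anchor would need to allow leading whitespace, for example ^[[:space:]]*section.

```shell
# Hypothetical helper: put "#" in front of every line that starts with
# "section", optional digits, then "_name". "&" re-inserts the matched text.
prefix_sections() {
    sed 's/^section[0-9]*_name.*/#&/'
}

# Made-up sample with two blank lines between sections.
sample='section1_name_x
dklfjsdklfjsldfjsl


section_name_X
dfsdjfksdfsdf'

result=$(printf '%s\n' "$sample" | prefix_sections)
printf '%s\n' "$result"
```

Note that this matches on the section name alone; it does not check that two blank lines actually precede the line.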

IHTH

shellter

In gawk you can use the RT builtin variable:

gawk '{$1="#"$1; print $0 RT}' RS='\n\n' file

* Update *

Thanks to @EdMorton I realized that my first version was incorrect. What happens:

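  • Assigning to $1 causes the record to be rebuilt, which is not wanted in this case: any run of whitespace between fields is replaced by a single space, and leading and trailing whitespace is removed from the record.
  • Using print appends ORS (a newline by default) after RT, which adds an extra newline to the output.
  • RS='\n\n' matches a single blank line, because the first \n is the one that ends the preceding non-blank line. Two blank lines need RS='\n\n\n'.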
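The record rebuild described in the first point can be seen with any POSIX awk, without gawk's RT. A minimal sketch, using a made-up input line and square brackets to make the whitespace visible:

```shell
# Made-up line with leading, trailing and repeated spaces.
line='   section1   name   x  '

# Assigning to $1 makes awk rebuild $0 from its fields joined by OFS
# (a single space by default), so the original spacing is lost.
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = "#" $1; print "[" $0 "]" }')

# Prepending to $0 without touching any field leaves the spacing intact.
kept=$(printf '%s\n' "$line" | awk '{ print "[" "#" $0 "]" }')

printf '%s\n%s\n' "$rebuilt" "$kept"
```

The first output line is [#section1 name x] and the second is [#   section1   name   x  ].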

The correct version:

gawk '{printf "%s", "#" $0 RT}' RS='\n\n\n' file
Håkon Hægland
  • 1
    Just be aware that that will replace all chains of spaces in the section name with a single space. Why not just `'{print "#" $0 RT}'`? Also, the script isn't looking for 2 blank lines, just 1 since the first `\n` is at the end of a non-blank line. You need to use `RS='\n\n\n'` and then either set `ORS=""` or use `printf` instead of `print` for the output. – Ed Morton Dec 28 '13 at 13:47
  • 1
    @EdMorton Thanks for the comment. I totally agree with your first comment. Assigning to `$1` will eat spaces around each field, so that was not what I intended to do. However, I tested your second comment, regarding `RS='\n\n'` or `RS='\n\n\n'`, and it is correct that there are three newline characters in the file, but it seems like one of them is eaten by `FS`, so `RS='\n\n'` does actually work. – Håkon Hægland Dec 28 '13 at 14:46
  • 1
    No, it just SEEMS to work because it matches the first 2 `\n`s and leaves the 3rd one as the start of the next record, and then your assignment to `$1` causes record recompilation, which deletes the leading whitespace. It's a combination of 2 bugs hiding each other. Try it with input that only has 1 blank line between records instead of 2 and you'll see that your script mistakenly thinks just one blank line is an acceptable RS. – Ed Morton Dec 28 '13 at 20:51
  • 1
    Yes, that'll do it. I'd have written it as `printf "#%s%s", $0, RT` but either way will work. – Ed Morton Dec 28 '13 at 21:34