How to replace the next string after matching (every) two blank lines?

Question

Is there a way to do this kind of substitution in awk, sed, ...?

I have a text file with sections separated by two blank lines:

   section1_name_x
   dklfjsdklfjsldfjsl


   section2_name_x
   dlskfjsdklfjsldkjflkj


   section_name_X
   dfsdjfksdfsdf

I would like to replace every "section_name_x" with "#section_name_x". That is, how do I replace the next string after matching (every) two blank lines?

Thanks,

Steve,

score 2 · Answer 1 · answered Dec 28 '13 at 14:22

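awk '
    # On the first line, or right after exactly two blank lines,
    # prefix a leading "section" with "#" ("&" is the matched text).
    (NR==1 || blank==2) && $1 ~ /^section/ {sub(/section/, "#&")}
    {
        print
        # Count consecutive empty lines; any non-empty line resets the count.
        if (length)
            blank = 0
        else
            blank++
    }
' file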

   #section1_name_x
   dklfjsdklfjsldfjsl


   #section2_name_x
   dlskfjsdklfjsldkjflkj


   #section_name_X
   dfsdjfksdfsdf

score 1 · Answer 2 · answered Dec 28 '13 at 13:03

hm....

Given your example data why not just

sed 's/^section[0-9]*_name.*/#&/' file > newFile && mv newFile file

Some seds support sed -i (GNU) or sed -i '' (BSD) to overwrite the existing file, avoiding the && mv ... shown above.

The regex says that section must be at the beginning of the line and may optionally be followed by a number. The & in the replacement stands for the matched text, so the line is kept and prefixed with #.
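A minimal sketch of that anchoring, run against a small made-up stand-in for the question's data. The sample lines and the variable name `result` are assumptions for illustration, and it assumes the section names start in column one:

```shell
# Hypothetical sample: two section headers, body lines, and two blank lines between.
result=$(printf '%s\n' 'section1_name_x' 'body text' '' '' 'section_name_X' 'more section text' |
    sed 's/^section[0-9]*_name.*/#&/')

# Only lines that START with "section" get the "#" prefix;
# "more section text" is left alone because of the ^ anchor.
printf '%s\n' "$result"
```

Note that this matches on the section name only; it does not check that two blank lines came before it.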

IHTH

Håkon Hægland · Answer 3 · 2013-12-28T21:12:47.207

In gawk you can use the RT builtin variable:

gawk '{$1="#"$1; print $0 RT}' RS='\n\n' file

* Update *

Thanks to @EdMorton I realized that my first version was incorrect. What happens:

Assigning to $1 causes the record to be rebuilt, which is not good in this case, since any sequence of whitespace between fields is replaced by a single space, and leading and trailing whitespace is removed from the record.
Using print adds an additional newline to the output.
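The record rebuild can be seen in isolation with any awk. This is only a sketch: the sample line and the variable names `line`, `rebuilt`, and `untouched` are made up for illustration.

```shell
# Hypothetical header line with leading spaces and a run of spaces inside.
line='   section1   name_x'

# Assigning to $1 makes awk rebuild $0 from its fields, joined by OFS (one space).
rebuilt=$(printf '%s\n' "$line" | awk '{ $1 = "#" $1; print }')

# Prepending to $0 instead leaves the original spacing alone.
untouched=$(printf '%s\n' "$line" | awk '{ print "#" $0 }')

# Brackets make the leading and inner whitespace visible.
printf '[%s]\n[%s]\n' "$rebuilt" "$untouched"
```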

The correct version:

gawk '{printf "%s", "#" $0 RT}' RS='\n\n\n' file

edited Dec 28 '13 at 21:12

answered Dec 28 '13 at 13:17


Just be aware that that will replace all chains of spaces in the section name with a single space. Why not just `'{print "#" $0 RT}'`? Also, the script isn't looking for 2 blank lines, just 1 since the first `\n` is at the end of a non-blank line. You need to use `RS='\n\n\n'` and then either set `ORS=""` or use `printf` instead of `print` for the output. – Ed Morton Dec 28 '13 at 13:47

@EdMorton Thanks for the comment. I totally agree with your first point. Assigning to `$1` will eat spaces around each field, so that was not what I intended to do. However, I tested your second point, regarding `RS='\n\n'` versus `RS='\n\n\n'`: it is correct that there are three newline characters in the file, but it seems like one of them is eaten by `FS`, so `RS='\n\n'` does actually work. – Håkon Hægland Dec 28 '13 at 14:46

No, it just SEEMS to work because it matches the first 2 `\n`s and then leaves the 3rd one as the start of the next record, and then your assigning to `$1` causes record recompilation to delete the leading whitespace. It's a combination of 2 bugs hiding each other. Try it with input that only has 1 blank line between records instead of 2 and you'll see that your script mistakenly thinks just one blank line is an acceptable RS. – Ed Morton Dec 28 '13 at 20:51

Yes, that'll do it. I'd have written it as `printf "#%s%s", $0, RT` but either way will work. – Ed Morton Dec 28 '13 at 21:34
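
The experiment suggested in the comments (input with only one blank line between sections) can be sketched roughly as follows. The sample input and variable names are made up, and the sketch assumes an awk that treats a multi-character RS as a regular expression, as gawk and mawk do; it counts records rather than using gawk's RT:

```shell
# Hypothetical input: two sections separated by only ONE blank line.
one_blank=$(printf '%s\n' 's1' 'body' '' 's2' 'body')

# Count the records awk sees for each separator.
# Assumption: this awk treats a multi-character RS as a regex.
two_nl=$(printf '%s\n' "$one_blank" | awk -v RS='\n\n' 'END { print NR }')
three_nl=$(printf '%s\n' "$one_blank" | awk -v RS='\n\n\n' 'END { print NR }')

echo "two newlines: $two_nl record(s); three newlines: $three_nl record(s)"
```

If the two-newline separator reports more than one record here, it is splitting on a single blank line, which is the behaviour the comment warns about.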
