How to grep and remove from a file all lines between a separator

Question

I have a file that looks like this:

===SEPARATOR===
line2
line3
===SEPARATOR===
line5
line6
===SEPARATOR===
line8
...
lineX
===SEPARATOR===

How can I loop through the file and dump everything between two ===SEPARATOR=== occurrences into another file for further processing? On the first iteration I want only line2 and line3 in the second file, which I will then parse. On the next iteration I want line5 and line6 in the second file, so that I can run the same parsing on different data.

What would the output file look like for your example input? — John Kugelman, Nov 03 '16 at 22:29
I still don't understand what you're trying to do. You have a second file, and in every iteration, you want to replace its contents with the lines between the next pair of separators? What exactly is an iteration? User triggered? Can you add your expected output at different stages? — Benjamin W., Nov 04 '16 at 05:14

score 1 · Answer 1 · answered Nov 03 '16 at 22:28

You can exclude all lines matching ===SEPARATOR=== with grep -v and redirect the rest to a file:

grep -vx '===SEPARATOR===' file > file_processed

-x makes sure that only lines completely matching ===SEPARATOR=== are excluded.
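
To illustrate why -x matters, here is a small sketch with a made-up input in which one ordinary line happens to contain the separator text. The sample lines and the variable names are only for illustration, and -F (treat the pattern as a fixed string) is an extra that is not part of the command above.

```shell
# Hypothetical sample input: the third line merely contains the separator text.
tmp=$(mktemp)
printf '%s\n' '===SEPARATOR===' 'line2' 'see ===SEPARATOR=== above' '===SEPARATOR===' > "$tmp"

# Without -x, the third line is dropped too, because it contains the pattern.
without_x=$(grep -v '===SEPARATOR===' "$tmp")

# With -x only whole-line matches are excluded; -F disables regex interpretation.
with_x=$(grep -vxF '===SEPARATOR===' "$tmp")

printf 'without -x:\n%s\n\nwith -x:\n%s\n' "$without_x" "$with_x"
rm -f "$tmp"
```

With this input, the first command keeps only line2, while the second also keeps the line that mentions the separator.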

Benjamin W.


Sorry, I should have mentioned that I want the second file to contain only the content between one pair of separators: line2 and line3. I will parse that, then put in the next batch, line5 and line6, parse that, and so on. Thank you. – Nick Constantine Nov 04 '16 at 05:02

score 1 · Answer 2 · answered Nov 03 '16 at 22:38

This uses sed to find lines between separators, and then grep -v to delete the separators.

$ sed -n '/===SEPARATOR===/,/===SEPARATOR===/ p' file | grep -v '===SEPARATOR==='
line2
line3
line8
...
lineX

There's got to be a more elegant answer that doesn't repeat the separator three times, but I'm drawing a blank.
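
One way to avoid typing the separator three times is to keep it in a shell variable. This is only a sketch of the same sed plus grep pipeline, not a different technique. The names sep and between_separators are made up for illustration, and the sketch assumes the separator contains no regex metacharacters or slashes, since it is pasted into the sed addresses.

```shell
sep='===SEPARATOR==='   # assumption: no regex metacharacters or "/" in here

between_separators() {
  # sed prints each range from one separator line to the next one;
  # grep then removes the separator lines themselves (-x: whole line, -F: fixed string).
  sed -n "/^${sep}\$/,/^${sep}\$/p" "$1" | grep -vxF -- "$sep"
}
```

Called as between_separators file, it should behave like the pipeline above: the ranges are paired, so line5 and line6 are still skipped.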

John Kugelman


Sorry, I should have been clearer in my question. I just updated it. – Nick Constantine Nov 04 '16 at 05:02

score 1 · Answer 3 · answered Nov 03 '16 at 23:05

I am assuming that you do not need line5 and line6. You can do it with awk like this:

awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}' file

Credit goes to https://www.gnu.org/software/gawk/manual/html_node/Boolean-Ops.html#Boolean-Ops

Output:

[root@hostname ~]# awk '$0 == "===SEPARATOR===" {interested = ! interested; next} interested {print}' /tmp/1
line2
line3
line8
...
lineX

score 1 · Answer 4 · answered Nov 03 '16 at 23:48

awk to the rescue!

With multi-character RS support (e.g. gawk):

$ awk -v RS='\n?===SEPARATOR===\n' '!(NR%2)' file

line2
line3
line8
...
lineX

Or without it:

$ awk '/===SEPARATOR===/{p=!p;next} p' file

line2
line3
line8
...
lineX

which is practically the same as @Jay Rajput's answer.
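
The toggle pairs up separators, which is why line5 and line6 never show up in the output. If every separator line should instead start a new block, as the asker's comments suggest, a counter can replace the toggle. The following is only a sketch under that assumption; nth_block, n, sep and count are names made up for illustration.

```shell
# Print the n-th block, treating every separator line as the start of a new block.
nth_block() {
  awk -v n="$1" -v sep='===SEPARATOR===' '
    # Separator line: advance the block counter and stop once block n is complete.
    $0 == sep { if (++count > n) exit; next }
    # Any other line is printed only while the counter points at block n.
    count == n
  ' "$2"
}
```

For the example input, nth_block 1 file should print line2 and line3, and nth_block 2 file should print line5 and line6.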

score 1 · Accepted Answer · answered Nov 04 '16 at 00:34

It sounds like you want to save each block of lines to a separate file.

The following solutions create output files f1, f2, and so on, containing the (non-empty) blocks of lines between the ===SEPARATOR=== lines.

With GNU Awk or Mawk:

awk -v fnamePrefix='f' -v RS='(^|\n)===SEPARATOR===(\n|$)' \
  'NF { fname = fnamePrefix (++n); print > fname; close(fname) }' file

Pure bash - which will be slow:

#!/usr/bin/env bash

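fnamePrefix='f'; i=0
while IFS= read -r line; do
  # A separator line starts a new output file: bump the counter and create/truncate it.
  [[ $line == '===SEPARATOR===' ]] && { (( ++i )); > "${fnamePrefix}${i}"; continue; }
  # Any other line is appended to the current block's file.
  printf '%s\n' "$line" >> "${fnamePrefix}${i}"
done < file
# Note: a trailing separator leaves one empty final file behind.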
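
If the blocks do not need to survive as separate files, the same loop can reuse a single scratch file and hand it to a parsing step after each block, which seems closer to what the question describes. This is a rough sketch, not the asker's final solution: process_block is a hypothetical placeholder for the real parsing, and process_all_blocks and scratch are likewise made-up names. It sticks to POSIX sh syntax, so it should run under bash as well.

```shell
# Hypothetical stand-in for the real parsing step: print the block on one line.
process_block() {
  printf 'block: %s\n' "$(paste -s -d ',' "$1")"
}

# Feed each block to process_block in turn, reusing a single scratch file.
# Every separator line closes the current block, empty blocks are skipped,
# lines before the first separator count as a block, and lines after the
# last separator are never processed.
process_all_blocks() {
  scratch=$(mktemp) || return 1
  # Read the input on file descriptor 3 so process_block may use stdin freely.
  while IFS= read -r line <&3; do
    if [ "$line" = '===SEPARATOR===' ]; then
      if [ -s "$scratch" ]; then
        process_block "$scratch"
        : > "$scratch"          # truncate the scratch file for the next block
      fi
      continue
    fi
    printf '%s\n' "$line" >> "$scratch"
  done 3< "$1"
  rm -f "$scratch"
}
```

Running process_all_blocks file on the example input should call process_block three times, once per block, with the scratch file holding only that block's lines each time.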

Just tried the bash version and it worked perfectly. Thank you. It didn't need to be a separate file, but that works just as well. I wanted to keep it in the same file: parse one block, then go back to the next block between two separators in the initial file. Anyway, I could figure that one out by myself. Thanks again. — Nick Constantine, Nov 04 '16 at 15:21
