Extract text between each two occurrences of a string into a new file

Question

I have an input file that looks like this:

DATA-GROUP A

text 1

text 2

text 3

DATA-GROUP B

text 4

text 5

text 6
etc.

How can I extract each occurrence of the string "DATA-GROUP" and the lines beneath it until (but not including) the next occurrence to a new file? I would like to do this for all the occurrences so that I have multiple new files. For example, the first file would be:

DATA-GROUP A

text 1

text 2

text 3

The next would have DATA-GROUP B and so on. I tried the following:

numsets=($(grep -c "DATA-GROUP " input.txt))
for ((i=1;i<numsets+1;i++)); do
    awk '/DATA-GROUP /&&++k=='"$i"',/DATA-GROUP /' input.txt > output"$i".txt
    wait
done

but it didn't work.

xhienne · Answer 1 · 2020-07-20T09:24:29.247


You can do everything with a single awk command:

awk '
    /^DATA-GROUP/ {
        close(file)
        file = "output_" $2 ".txt"
    }

    { print > file }
' input.txt

Each group is written to its own file "output_N.txt", where N is the data-group identifier, i.e. the second field of the DATA-GROUP line (output_A.txt, output_B.txt, and so on).
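As a quick check, the same command can be run on a cut-down copy of the question's input. This is only a sketch: the file name sample.txt and its contents are made up for illustration, and the blank lines of the original input are left out for brevity.

```shell
# Hypothetical sample input, modelled on the question.
printf '%s\n' 'DATA-GROUP A' 'text 1' 'text 2' 'DATA-GROUP B' 'text 3' > sample.txt

awk '
    /^DATA-GROUP/ {
        close(file)                    # done with the previous group
        file = "output_" $2 ".txt"     # $2 is the group identifier (A, B, ...)
    }

    { print > file }
' sample.txt

cat output_A.txt   # DATA-GROUP A, text 1, text 2 (one per line)
cat output_B.txt   # DATA-GROUP B, text 3 (one per line)
```

Inside awk, "print > file" truncates the file only the first time it is opened, so all the lines of a group end up in the same file.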

Here is how it works in detail:

The first block (/^DATA-GROUP/ { ... }) runs only on lines that start with DATA-GROUP. It closes the file used for the previous data group, then sets the name of the file that will receive the lines of the new group.
The second block has no pattern, so it runs for every line, the DATA-GROUP line included, and prints the line to the file named by the most recent DATA-GROUP line.
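Note that the program assumes the very first line of the input is a DATA-GROUP line. If anything comes before it (a blank line, for instance), the file name is still empty when the first print runs, and awk may abort with a "can't open" style error. A guarded variant might look like the sketch below; the file names sample2.txt and group_N.txt are made up for illustration, and silently skipping the leading lines is an assumption about what is wanted.

```shell
# Hypothetical input with a stray line before the first header.
printf '%s\n' 'preamble' 'DATA-GROUP A' 'text 1' 'DATA-GROUP B' 'text 2' > sample2.txt

awk '
    /^DATA-GROUP/ {
        if (file != "") close(file)    # nothing to close before the first group
        file = "group_" $2 ".txt"
    }

    # Print only once a header has set the file name; earlier lines are skipped.
    file != "" { print > file }
' sample2.txt

cat group_A.txt   # DATA-GROUP A, text 1 (one per line)
cat group_B.txt   # DATA-GROUP B, text 2 (one per line)
```

If the header lines may be indented, the pattern would also need to allow leading whitespace, e.g. /^[ \t]*DATA-GROUP/.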

edited Jul 20 '20 at 09:24

answered Jul 02 '20 at 23:27

xhienne


Thanks for your help. Can you please explain the logic of this line? I am getting an error that states: awk: can't open '': No such file or directory. I am trying to figure out what the issue is. – user13758913 Jul 06 '20 at 18:48
@user13758913 I have added some details, albeit a bit too late I guess. The program was tested and worked at the time I answered, so just copy-paste it. The data file that is used is input.txt, the same as in your question. – xhienne Jul 20 '20 at 09:26
