
I have an input file that looks like this:

DATA-GROUP A

text 1

text 2

text 3

DATA-GROUP B

text 4

text 5

text 6
etc.

How can I extract each occurrence of the string "DATA-GROUP" and the lines beneath it until (but not including) the next occurrence to a new file? I would like to do this for all the occurrences so that I have multiple new files. For example, the first file would be:

DATA-GROUP A

text 1

text 2

text 3

The next would have DATA-GROUP B and so on. I tried the following:

numsets=($(grep -c "DATA-GROUP " input.txt))
for ((i=1;i<numsets+1;i++)); do
    awk '/DATA-GROUP /&&++k=='"$i"',/DATA-GROUP /' input.txt > output"$i".txt
    wait
done

but it didn't work.

1 Answer


You can do everything with a single awk command:

awk '
    /^DATA-GROUP/ {
        close(file)
        file = "output_" $2 ".txt"
    }

    { print > file }
' input.txt

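Each group is written to its own file, "output_N.txt", where N is the data-group identifier, i.e. the second field ($2) of the DATA-GROUP line (output_A.txt, output_B.txt, and so on).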

Here is how it works in detail:

  • The first section of this program (/^DATA-GROUP/ ...) runs only when a line starts with DATA-GROUP. It sets the name of the file that will receive all the lines of that data group, and it closes the file that was used for the previous group.

  • The second section runs unconditionally for every line and simply prints the line to the file that was set by the most recent DATA-GROUP line.
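To make the two sections concrete, here is a small end-to-end sketch. The scratch directory demo_split and the sample lines are assumptions for illustration, modelled on the input shown in the question. Only the awk program itself comes from the answer.

```shell
# Hypothetical scratch directory so the demo does not touch existing files
mkdir -p demo_split

# Sample input modelled on the question
printf '%s\n' 'DATA-GROUP A' '' 'text 1' '' 'text 2' '' 'text 3' '' \
    'DATA-GROUP B' '' 'text 4' '' 'text 5' '' 'text 6' > demo_split/input.txt

# Run the awk program inside the scratch directory
# (the subshell keeps the working directory of the caller unchanged)
(
    cd demo_split || exit 1
    awk '
        /^DATA-GROUP/ {
            close(file)                  # finish the previous group, if any
            file = "output_" $2 ".txt"   # $2 is the group identifier (A, B, ...)
        }
        { print > file }                 # write the line to the file of the current group
    ' input.txt
)

# One output file per group, next to input.txt
ls demo_split
```

With this sample, demo_split should end up holding input.txt, output_A.txt and output_B.txt, each output file starting with its own DATA-GROUP line.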

xhienne
  • Thanks for your help. Can you please explain the logic of this line? I am getting an error that states: awk: can't open '': No such file or directory. I am trying to figure out what the issue is. – user13758913 Jul 06 '20 at 18:48
  • @user13758913 I have added some details, albeit a bit too late I guess. The program was tested and worked at the time I answered, so just copy-paste it. The data file that is used is input.txt, the same as in your question. – xhienne Jul 20 '20 at 09:26
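The error quoted in the comment would be consistent with the print statement running while the variable file is still empty. That can happen when the input has lines before the first DATA-GROUP line, or when the header lines are indented so that /^DATA-GROUP/ never matches. This diagnosis is an assumption, since the commenter's actual input was not shown. A more defensive variant, sketched below with a hypothetical demo_guard directory and made-up sample lines, skips lines until a header has been seen and tolerates leading blanks.

```shell
# Hypothetical scratch directory for the demo
mkdir -p demo_guard

# Made-up input: a preamble line before any header, and an indented header
printf '%s\n' 'preamble line' '  DATA-GROUP A' 'text 1' \
    'DATA-GROUP B' 'text 2' > demo_guard/input.txt

(
    cd demo_guard || exit 1
    awk '
        # Allow optional leading spaces or tabs before the header keyword
        /^[ \t]*DATA-GROUP/ {
            if (file != "") close(file)   # close only if a file was opened before
            file = "output_" $2 ".txt"    # leading blanks do not shift $2
        }
        # Print only once a header has been seen, so file is never empty here
        file != "" { print > file }
    ' input.txt
)

ls demo_guard
```

In this sketch the preamble line is silently dropped. If such lines should be kept, they could be sent to a separate file instead, but that is a design choice the original answer does not cover.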