1

Suppose I have the following file (Song.txt):

Song one
bla bla bla bla 
bla bla bla bla bla
Song two
yaya ya yaa 
blaaa bla bla blaaaaa
Song three
bla bla bla

I want to separate this file into three files to be like the following:

First filename should be Song_1.txt

Song one
bla bla bla bla 
bla bla bla bla bla

Second filename should be Song_2.txt

Song two
yaya ya yaa 
blaaa bla bla blaaaaa

Third filename should be Song_3.txt

Song three
bla bla bla

How can I do this using awk, grep, perl, python, or any other Unix-based tools and languages?

user1421408

3 Answers

4
csplit Song.txt --elide-empty-files --prefix=Song_ --suffix-format='%1d.txt' '/Song one/' '/Song two/' '/Song three/'

or

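csplit Song.txt -z -f Song_ -b '%1d.txt' '/Song one/' '/Song two/' '/Song three/'

Note that csplit numbers its output files from 0, even with -z, so these commands should create Song_0.txt, Song_1.txt and Song_2.txt rather than Song_1.txt to Song_3.txt.
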
Dennis Williamson
3

csplit can be used to split a text file using a regex.
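
Since the question also mentions python, here is a minimal sketch of the same regex-based idea there. It is not what csplit does internally; the function names `split_sections` and `write_sections` and the default pattern `^Song ` are made up for illustration, and it assumes every song starts on a line beginning with "Song ".

```python
import re
from pathlib import Path


def split_sections(text, pattern=r"^Song "):
    """Return a list of sections, each starting at a line that matches `pattern`."""
    header = re.compile(pattern)
    sections = []
    for line in text.splitlines(keepends=True):
        # Start a new section at every header line, and also for any
        # text that appears before the first header.
        if header.match(line) or not sections:
            sections.append([])
        sections[-1].append(line)
    return ["".join(lines) for lines in sections]


def write_sections(sections, out_dir=".", prefix="Song_"):
    """Write sections to <prefix>1.txt, <prefix>2.txt, ... and return the paths."""
    paths = []
    for n, body in enumerate(sections, start=1):
        path = Path(out_dir) / f"{prefix}{n}.txt"
        path.write_text(body)
        paths.append(path)
    return paths
```

Calling `write_sections(split_sections(Path("Song.txt").read_text()))` should then produce Song_1.txt, Song_2.txt and Song_3.txt for the sample file. Because the pattern is anchored to the start of a line, a lyric that merely contains the word "Song" does not start a new file.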

Ignacio Vazquez-Abrams
2

This should help:

gawk -v RS="Song" 'NF{ print RS$0 > "Song_"++n".txt" }' Song.txt

Test:

[jaypal:~/Temp] cat Song.txt 
Song one
bla bla bla bla 
bla bla bla bla bla
Song two
yaya ya yaa 
blaaa bla bla blaaaaa
Song three
bla bla bla

[jaypal:~/Temp] gawk -v RS="Song" 'NF{ print RS$0 > "Song_"++n".txt" }' Song.txt

[jaypal:~/Temp] ls -l S*
-rw-r--r--  1 jaypalsingh  staff  113 28 May 17:55 Song.txt
-rw-r--r--  1 jaypalsingh  staff   47 28 May 18:06 Song_1.txt
-rw-r--r--  1 jaypalsingh  staff   45 28 May 18:06 Song_2.txt
-rw-r--r--  1 jaypalsingh  staff   24 28 May 18:06 Song_3.txt

[jaypal:~/Temp] cat Song_1.txt 
Song one
bla bla bla bla 
bla bla bla bla bla

[jaypal:~/Temp] cat Song_2.txt 
Song two
yaya ya yaa 
blaaa bla bla blaaaaa

[jaypal:~/Temp] 
jaypal singh
  • This is even better. I will try to understand how it is written. Many thanks – user1421408 May 29 '12 at 04:56
  • 1
    @user1421408 You're welcome. What we have done here is set the record separator (`RS`) to `Song`, which splits the input into one record per song. `$0` holds the rest of each record. Since you need the word "Song" in your individual files, we print `RS $0`, which puts the separator back in front of the record. `++n` increments the counter so that every record is written to a new numbered file. `NF` skips the empty record before the first `Song`, so we don't write a file containing just `Song`. Hope this helps! – jaypal singh May 29 '12 at 05:14
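
For anyone who finds the one-liner hard to read, the same steps can be spelled out in python. This is only a rough sketch of the logic described above, not a replacement for the gawk command; the function name `split_like_awk` is hypothetical, and unlike awk's `print` it does not append an extra newline to each file.

```python
from pathlib import Path


def split_like_awk(text, rs="Song", out_dir="."):
    """Mimic: gawk -v RS="Song" 'NF{ print RS$0 > "Song_"++n".txt" }'."""
    n = 0
    written = []
    for record in text.split(rs):   # RS="Song": cut the input at every "Song"
        if not record.split():      # NF == 0: skip records with no fields
            continue
        n += 1                      # ++n: move on to the next file number
        path = Path(out_dir) / f"Song_{n}.txt"
        path.write_text(rs + record)  # RS $0: put the separator back in front
        written.append(path)
    return written
```

Like the gawk version, this cuts at every occurrence of "Song", so a lyric containing that word would also start a new file.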