I have tried my best to understand a very similar StackOverflow question, but I cannot for the life of me make either the proposed gawk or split solutions work in my case.
I have a large text file consisting of 288 proposals, each 300 to 500 words long and split into a varying number of paragraphs (so there is no consistent line count). Each proposal is, however, headed by an identifier of the form --###-- or --####--. There is no closing marker, though I suppose I could insert one with a regex search and replace on the original file before splitting it into multiple files. What I want is a collection of 288 individual text files, each named by the number between the two pairs of dashes. If it makes things any easier, I can easily split the file into the proposals headed by three-digit numbers and those headed by four-digit numbers.
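Because every proposal starts with a header, the next header can act as the closing marker for the previous proposal, so no preprocessing should be needed. Below is a minimal Python sketch, assuming each header sits alone on its own line and holds three or four digits; the names `HEADER` and `split_proposals` are illustrative only. `re.split` with a capturing group returns the header numbers interleaved with the text between them, and a single pattern covers both the three- and four-digit cases.

```python
import re

# Assumption: a header is a line containing only "--" + 3 or 4 digits + "--"
# (trailing spaces or tabs allowed), e.g. "--121--" or "--1234--".
HEADER = re.compile(r"^--(\d{3,4})--[ \t]*$", re.MULTILINE)


def split_proposals(text):
    """Return a dict mapping each proposal number (as a string) to its body.

    With a capturing group, re.split returns
    [text_before_first_header, id1, body1, id2, body2, ...],
    so each header effectively closes the proposal before it.
    """
    parts = HEADER.split(text)
    ids = parts[1::2]     # captured numbers; parts[0] (preamble) is skipped
    bodies = parts[2::2]  # text between one header and the next
    # Drop the newline left over from the header line and normalize the end.
    return {pid: body.strip("\n") + "\n" for pid, body in zip(ids, bodies)}
```

Lines such as "see --121-- above" inside a paragraph are not treated as headers, because the pattern is anchored to the start and end of a line.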
In a nutshell, I want to do this:
#!/usr/bin/env bash (or python)
Split all_proposals.txt into 121.txt, 122.txt, etc.
Where all_proposals.txt consists of:
--121--
One Line Title of Proposal
Followed by several paragraphs each on a line of variable length.
Another paragraph for effect.
--122--
More lines indeterminate in number.
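For a few hundred proposals, a line-by-line Python sketch can write the output files directly. It assumes the input is UTF-8, that headers appear alone on their lines, and that any lines before the first header can be discarded; `split_file` and its `out_dir` parameter are hypothetical names. The header line itself is not copied, so each output file starts with the proposal's title.

```python
import re
from pathlib import Path

# Assumption: headers look like "--121--" or "--1234--" on a line by themselves.
HEADER = re.compile(r"^--(\d{3,4})--\s*$")


def split_file(src, out_dir="."):
    """Write each proposal in src to <out_dir>/<number>.txt.

    Returns the proposal numbers in the order they were written.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    out = None
    try:
        with open(src, encoding="utf-8") as infile:
            for line in infile:
                match = HEADER.match(line)
                if match:
                    # A new header implicitly ends the previous proposal.
                    if out is not None:
                        out.close()
                    number = match.group(1)
                    out = open(out_dir / f"{number}.txt", "w", encoding="utf-8")
                    written.append(number)
                elif out is not None:
                    out.write(line)
                # Lines before the first header are silently ignored.
    finally:
        if out is not None:
            out.close()
    return written
```

Calling `split_file("all_proposals.txt")` writes 121.txt, 122.txt, and so on into the current directory. Checking that the returned list has 288 distinct entries is a quick sanity check, since a repeated number would overwrite the earlier file.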