I am trying to split a txt transcription into single files, one for each folio.
The folios are marked in the file as [c. 1r], [c. 1v] ... [c. 7v] and so on.
Using this example I was able to create a PowerShell script that does the magic with a regex that matches each page delimiter, but I seem totally unable to use the regex to give proper names to the pages. With this code:
$InputFile = "input.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
while (($Line = $Reader.ReadLine()) -ne $null) {
    if ($Line -match "\[c\. .*?\]") {
        $OutputFile = "MySplittedFileNumber$a$Matches.txt"
        $a++
    }
    Add-Content $OutputFile $Line
}
all the files are named MySplittedFileNumber1System.Collections.Hashtable.txt instead of being named after the match. With "$Matches[0]" I'm told that the variable does not exist or has been filtered by -Exclude.
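A minimal sketch of what seems to be happening with the names, assuming the usual rule that a double-quoted string expands only the bare variable and needs a subexpression `$(...)` for an index or property access. The sample line and the `$Name...` variables are made up for illustration, and the counter `$a` is left out to keep it short:

```powershell
# Hypothetical sample line, standing in for one line of input.txt
$Line = "[c. 1r] Text paragraph"

if ($Line -match "\[c\. .*?\]") {
    # Only $Matches itself is expanded, so the hashtable turns into its type name
    $NamePlain = "MySplittedFileNumber$Matches.txt"

    # Same expansion; the [0] stays behind as literal text
    $NameIndexed = "MySplittedFileNumber$Matches[0].txt"

    # $(...) is evaluated first, so the index is applied to the hashtable
    $NameSubexpr = "MySplittedFileNumber$($Matches[0]).txt"
}

$NamePlain
$NameIndexed
$NameSubexpr
```

If this reading is right, the leftover `[0]` in the second name is probably what Add-Content then treats as a wildcard pattern, which would explain the -Exclude message; I have not confirmed that part.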
All my attempts at setting the $regex before executing seem to go nowhere. Can someone show me how to get the resulting filenames formatted as MySplittedFileNumber[c. 1r].txt?
Using just a partial match such as \[(c\. .*?)\] would be even better, but once I know how to retrieve the match, I bet I can find the solution.
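A small sketch of how the partial match could be read back, assuming that `$Matches[0]` holds the whole match and `$Matches[1]` the first parenthesised group. The sample line, the variable names and the second pattern (parentheses moved to capture only the folio) are my own illustration:

```powershell
# Hypothetical sample line, standing in for one line of input.txt
$Line = "[c. 1v] Text paragraph"

if ($Line -match "\[(c\. .*?)\]") {
    $Whole = $Matches[0]   # entire match, square brackets included
    $Inner = $Matches[1]   # first capture group, square brackets excluded
}

# Variant (assumption): parentheses moved so that only the folio is captured
if ($Line -match "\[c\. (.*?)\]") {
    $Folio = $Matches[1]
}

"$Whole | $Inner | $Folio"
```

Each successful -match overwrites $Matches, so the values are copied into their own variables before the second test runs.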
I could somehow build the variable part (1r, 1v) in $a, but I'd rather use the one inside the txt file, since some folios may have been misnumbered in the manuscript and I need to retain this.
Content of original input.txt:
> [c. 1r] Text paragraph text paragraph ... Text paragraph
> [c. 1v] Text paragraph text paragraph ... Text paragraph
> [c. 2r] Text paragraph text paragraph ... Text paragraph
Desired result:
Content of MySplittedFileNumber[c. 1r].txt:
> [c. 1r] Text paragraph text paragraph ... Text paragraph
Content of MySplittedFileNumber[c. 1v].txt:
> [c. 1v] Text paragraph text paragraph ... Text paragraph
Content of MySplittedFileNumber[c. 2r].txt:
> [c. 2r] Text paragraph text paragraph ... Text paragraph
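To make the desired result concrete, here is a self-contained sketch of the whole split. It rests on two assumptions: that the match has to be read through a subexpression `$($Matches[0])` inside the string, and that the output path has to be passed with -LiteralPath because the square brackets would otherwise be read as a wildcard pattern. The scratch directory and the sample lines are hypothetical, and Get-Content stands in for the StreamReader:

```powershell
# Scratch directory (hypothetical location) so the sketch does not touch real files
$WorkDir = Join-Path ([System.IO.Path]::GetTempPath()) ("folio-split-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $WorkDir | Out-Null

# Sample input mirroring the layout shown above, one folio marker per line start
$InputFile = Join-Path $WorkDir "input.txt"
Set-Content -LiteralPath $InputFile -Value @(
    "[c. 1r] Text paragraph",
    "text paragraph",
    "[c. 1v] Text paragraph",
    "[c. 2r] Text paragraph"
)

$OutputFile = $null
foreach ($Line in Get-Content -LiteralPath $InputFile) {
    if ($Line -match "\[c\. .*?\]") {
        # Subexpression: the matched text goes into the name, not the hashtable's type name
        $OutputFile = Join-Path $WorkDir "MySplittedFileNumber$($Matches[0]).txt"
    }
    if ($OutputFile) {
        # -LiteralPath: the brackets in the file name are taken literally
        Add-Content -LiteralPath $OutputFile -Value $Line
    }
}

# List the files that were produced
Get-ChildItem -LiteralPath $WorkDir -Name
```

Lines that appear before the first marker are skipped here, since there is no folio to assign them to; the counter `$a` is dropped because the desired names do not contain it.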