I have a single text file that contains 60K+ lines. Those lines are actually around 50 programs written in Natural, and I need to break them apart into individual programs. I have a script that splits them correctly but has one flaw: the naming of the output files.

Every program starts with "Module Name=", followed by the actual name of the program. I need to split the programs and save them using the actual program names.

Using the example below, I would like to create two files called Program1.txt and Program2.txt each containing the lines belonging to them. I have a script, also below, that separates the files correctly, but I am unable to discern the correct way to capture the Program name and use that as the name of the output file.

Example:

Module Name=Program1
....
....
....
END

Module Name=Program2
....
....
....
END

Code:

$InputFile = "C:\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)
$a = 1
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OutputFile = "MySplittedFileNumber$a.txt"
        $a++
    }    
    Add-Content $OutputFile $Line
}
user3166462
  • I commend to your attention Microsoft Docs on [`-Split`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_split?view=powershell-7.1) and [`-Join`](https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/about/about_join?view=powershell-7.1). – Jeff Zeitlin May 26 '21 at 18:45
  • @JeffZeitlin I am attempting to alter the code using the -Split command. I will post success or failure. -Ron – user3166462 May 26 '21 at 19:57

2 Answers

Combine a switch statement, which reads a file line by line efficiently with -File and matches each line against one or more regexes with -Regex, with a System.IO.StreamWriter instance that writes the output files efficiently:

$outStream = $null

switch -Regex -File C:\Natural.txt {
  '\bModule Name=(\w+)' {   # a module start line
    if ($outStream) { $outStream.Close() }
    $programName = $Matches[1] # Extract the program name.
    # Create a new output file.
    # Important: use a *full* path.
    $outStream = [System.IO.StreamWriter] "C:\$programName.txt"
    # Write the line at hand.
    $outStream.WriteLine($_)
  }
  default {                 # all other lines
    # Write the line at hand to the current output file.
    $outStream.WriteLine($_)    
  }
}
if ($outStream) { $outStream.Close() }

Note:

  • The code assumes that the very first line in the input file is a Module Name=... line.

  • The regex matching is case-insensitive by default, as PowerShell generally is; add -CaseSensitive, if needed.

  • The automatic $Matches variable is used to extract the program name from the matching result.
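
If the first-line assumption noted above might not hold, a guarded variant could look like the following. This is only a sketch: the function name Split-ModuleFile and its -Path / -OutputDir parameters are hypothetical, it assumes PowerShell 5 or later (for ::new()) and an output directory that already exists, and it simply skips any lines that precede the first Module Name= line.

```powershell
# Hypothetical helper; the name and parameters are illustrative, not from the answer above.
function Split-ModuleFile {
  param(
    [Parameter(Mandatory)] [string] $Path,       # combined input file
    [Parameter(Mandatory)] [string] $OutputDir   # existing directory for the per-program files
  )

  # .NET APIs do not follow PowerShell's current location, so resolve to full paths first.
  $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
  $fullDir  = (Resolve-Path -LiteralPath $OutputDir).ProviderPath

  $outStream = $null
  try {
    switch -Regex -File $fullPath {
      '\bModule Name=(\w+)' {
        # A module start line: close the previous file, open the next one.
        if ($outStream) { $outStream.Close() }
        $outStream = [System.IO.StreamWriter]::new((Join-Path $fullDir "$($Matches[1]).txt"))
        $outStream.WriteLine($_)
      }
      default {
        # Assumption: lines before the first module header belong to no program and are skipped.
        if ($outStream) { $outStream.WriteLine($_) }
      }
    }
  }
  finally {
    # Close the last writer even if an error interrupted the loop.
    if ($outStream) { $outStream.Close() }
  }
}
```

It could then be called as, for example, Split-ModuleFile -Path C:\Natural.txt -OutputDir C:\Temp. Like the original, it overwrites an existing output file of the same name.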

mklement0

Thank you Jeff!

Here is my solution, using the string's Split() method:

$InputFile = "C:\Temp\EMNCP\Natural.txt"
$Reader = New-Object System.IO.StreamReader($InputFile)

$OPName = @()
While (($Line = $Reader.ReadLine()) -ne $null) {
    If ($Line -match "Module Name=") {
        $OPName = $Line.Split("=")
        $FileName = $OPName[1].Trim()
        Write-Host "Found ... $FileName" -foregroundcolor green
        $OutputFile = "$FileName.txt"

    }    
    Add-Content $OutputFile $Line
}
user3166462
  • Nice; a few tips: `$OPName = @()` initializes `$OPName` as an _array_, even though you want it to be a _string_, but you actually don't need to initialize it at all. (The only way to lock in a type would be to _type-constrain_ the assignment: `[string] $OPName = ''`) – mklement0 May 26 '21 at 20:29
  • It's better to close / dispose of the stream reader explicitly (`$Reader.Close()`). – mklement0 May 26 '21 at 20:29
  • While using `Add-Content` in a loop works, it is quite slow, because the output file must be opened and closed for every call; hence the use of a `[System.IO.StreamWriter]` in my solution. – mklement0 May 26 '21 at 20:31
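
For reference, the tips in the comments above could be applied to the Split-based solution roughly as follows. This is a sketch only: the function name Split-NaturalFile and its parameters are hypothetical, PowerShell 5 or later is assumed, and it uses the -split operator with a limit of 2 so that only the first = separates the label from the name. Unlike Add-Content, the StreamWriter used here overwrites an existing file rather than appending to it.

```powershell
# Hypothetical helper; the name and parameters are illustrative only.
function Split-NaturalFile {
  param(
    [Parameter(Mandatory)] [string] $InputFile,  # combined input file
    [Parameter(Mandatory)] [string] $OutputDir   # existing directory for the output files
  )

  $outDir = (Resolve-Path -LiteralPath $OutputDir).ProviderPath
  $Reader = [System.IO.StreamReader]::new((Resolve-Path -LiteralPath $InputFile).ProviderPath)
  $Writer = $null
  try {
    while ($null -ne ($Line = $Reader.ReadLine())) {
      if ($Line -match 'Module Name=') {
        # Split on the first '=' only; no need to pre-initialize any variable.
        $FileName = ($Line -split '=', 2)[1].Trim()
        if ($Writer) { $Writer.Close() }
        # One writer per program instead of one Add-Content call per line.
        $Writer = [System.IO.StreamWriter]::new((Join-Path $outDir "$FileName.txt"))
      }
      # Lines before the first module header have no target file and are skipped.
      if ($Writer) { $Writer.WriteLine($Line) }
    }
  }
  finally {
    # Close both streams explicitly, even if an error occurred.
    if ($Writer) { $Writer.Close() }
    $Reader.Close()
  }
}
```

A call might look like Split-NaturalFile -InputFile C:\Temp\EMNCP\Natural.txt -OutputDir C:\Temp\EMNCP.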