1

I have a 200,000 file to be split into 8 chunks using powershell

The file has rows with the first value being the record 'KEY'

I would like to ensure that rows corresponding to the key field value (which is the first value of the row) do not break across files when the split happens.

Here is the simple split I use

$i=0
Get-Content -Encoding Default "C:\Test.csv" -ReadCount 10130 | ForEach-Object {
    $i++
    $_ | Out-File -Encoding Default "C:\Test_$i.csv"
}

Sample Data
0190709,HP16,B,B,3,3,
0190709,HP17,B,B,3,3,
0190709,HP18,B,B,3,3,
0196597,HP11,,CNN,,,
0196597,HP119,,CNN,,,
0196597,HP13,,CNN,,,
01919769,HP11,,ANN,,,
01919769,HP119,,OPN,,,
01919769,HP13,,CNN,,,
01919769,HP14,X,X,X,X,
01919769,HP15,A,A,X,X,
01919769,HP16,S,S,X,X,
01919769,HP17,S,S,5,5,
01919769,HP18,S,S,5,5,
0797819,HP14,X,AX,X,X,
0797819,HP15,X,XA,X,X,
0797819,HP16,X,X,XA,XA,
0797819,HP17,A,A,X,X,
0797819,HP18,A,A,AX,X,

Expected Output

Lets say we want 2 chunks of equal size. I would like 2 files like below with the key not split between files. Its ok if the file gets bigger (more lines) in an attempt to prevent page break of the key.

File 1

0190709,HP16,B,B,3,3,
0190709,HP17,B,B,3,3,
0190709,HP18,B,B,3,3,
0196597,HP11,,CaweNN,,,
0196597,HP119,,CNN,,,
0196597,HP13,,CNwN,,,
01919769,HP11,,AawNN,,,
01919769,HP119,,OePN,,,
01919769,HP13,,CNN,,,
01919769,HP14,XY,X,X,X,
01919769,HP15,A,A,XC,XA,
01919769,HP16,S,S,X,X,
01919769,HP17,S,S,5A,5,
01919769,HP18,S,S,5,5,

File 2

0797819,HP14,X,AX,X,X,
0797819,HP15,X,XA,X,X,
0797819,HP16,X,X,XA,XA,
0797819,HP17,A,A,X,X,
0797819,HP18,A,A,AX,X,

1 Answers1

0

Although you have not supplied an example (first couple of lines) of your CSV file, the below function assumes the input csv file is valid.

function Split-Csv {
    [CmdletBinding()]  
    Param (
        [Parameter(Mandatory = $true, Position = 0)]
        [string]$Path,          # the full path and filename of the source CSV file

        [Parameter(Mandatory = $true, Position = 1)]
        [string]$Destination,   # the path of the output folder

        [ValidateRange(1,[int]::MaxValue)]
        [int]$Chunks = 8,       # the number of parts to split into

        [switch]$FirstLineHasHeaders
    ) 
    # create the destination folder if it does not already exist
    if (!(Test-Path -Path $Destination -PathType Container)) {
        Write-Verbose "Creating folder '$Destination'"
        New-Item -Path $Destination -ItemType Directory | Out-Null
    }
    $outputFile = [System.IO.Path]::GetFileNameWithoutExtension($Path)
    $content    = Get-Content -Path $Path
    $totalLines = $content.Count

    if ($FirstLineHasHeaders) {
        $headers  = $content[0]
        $partsize = [Math]::Ceiling(($totalLines - 1) / $Chunks)
        for ($i = 0; $i -lt $Chunks; $i++) {
            $first   = ($i * $partsize + 1)
            $last    = [Math]::Min($first + $partsize -1, $totalLines - 1)
            $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $outputFile, ($i + 1))
            Write-Verbose "Creating file '$newFile'"
            Set-Content -Path $newFile -Value $headers -Force
            Add-Content -Path $newFile -Value $content[$first..$last]
        }
    }
    else {
        $partsize   = [Math]::Ceiling($totalLines / $Chunks)
        for ($i = 1; $i -le $Chunks; $i++) {
            $first   = (($i - 1) * $partsize)
            $last    = [Math]::Min(($i * $partsize) - 1, $totalLines - 1)
            $newFile = Join-Path -Path $Destination -ChildPath ('{0}-{1:000}.csv' -f $outputFile, $i)
            Write-Verbose "Creating file '$newFile'"
            Set-Content -Path $newFile -Value $content[$first..$last] -Force
        }
    }
}

If your input csv file has headers, you need to ensure every 'chunk' file also has these headers. Use the function WITH switch $FirstLineHasHeaders

Split-Csv -Path 'C:\Test.csv' -Destination 'D:\test' -Chunks 8 -FirstLineHasHeaders -Verbose

If your input csv file does NOT have headers, use it like:

Split-Csv -Path 'C:\Test.csv' -Destination 'D:\test' -Chunks 8 -Verbose
Theo
  • 57,719
  • 8
  • 24
  • 41