
I'm getting a memory exception while running this code. Is there a way to filter one file at a time, write the output, and append it after each file is processed? The code below seems to load everything into memory.

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
Get-ChildItem $inputFolder -File -Filter '*.csv' |
    ForEach-Object { Import-Csv $_.FullName } |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType

– Enigma
  • CSVs are just text. The ImportCSV cmdlet is useful for manipulating CSV data in PowerShell, but if all you want to do is to append one CSV onto another (assuming they're the same width) you can just read them as if they were text. `get-content *.csv | set-content combined.csv` Seems like it should work. Might have to mess with line endings? – Joe Cullinan Nov 01 '19 at 14:34
  • @Joe: I need to filter the combined.csv with machine_type -eq "workstation" only – Enigma Nov 01 '19 at 14:57
  • Do your CSVs always have the same columns, in the same order? – Esperento57 Nov 07 '19 at 03:34

5 Answers


Note: The reason for not using Get-ChildItem ... | Import-Csv ... - i.e., for not piping Get-ChildItem directly to Import-Csv and instead calling Import-Csv from the script block ({ ... }) of an auxiliary ForEach-Object call - is a bug in Windows PowerShell that has since been fixed in PowerShell Core; see the bottom section for a more concise workaround.

However, even output from ForEach-Object script blocks should stream to the remaining pipeline commands, so you shouldn't run out of memory - after all, a salient feature of the PowerShell pipeline is object-by-object processing, which keeps memory use constant, irrespective of the size of the (streaming) input collection.
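
To illustrate that streaming behavior with a contrived sketch (any large directory will do; this has nothing to do with CSVs): the first command below streams, so Select-Object -First 5 stops the pipeline after 5 items; the second collects Get-ChildItem's entire output into an array before Select-Object sees anything.

Get-ChildItem C:\Windows -Recurse -ErrorAction Ignore | Select-Object -First 5    # streams, stops early
(Get-ChildItem C:\Windows -Recurse -ErrorAction Ignore) | Select-Object -First 5  # collects everything first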

You've since confirmed that avoiding the aux. ForEach-Object call does not solve the problem, so we still don't know what causes your out-of-memory exception.

Update:

  • This GitHub issue contains clues as to the reason for excessive memory use, especially with many properties that contain small amounts of data.

  • This GitHub feature request proposes using strongly typed output objects to help address the issue.

The following workaround, which uses the switch statement to process the files as text files, may help:

$header = ''
Get-ChildItem $inputFolder -Filter *.csv | ForEach-Object {
  $i = 0
  switch -Wildcard -File $_.FullName {
    '*workstations*' {
      # NOTE: If no other columns contain the word `workstations`, you can 
      # simplify and speed up the command by omitting the `ConvertFrom-Csv` call 
      # (you can make the wildcard matching more robust with something 
      # like '*,workstations,*')
      if ((ConvertFrom-Csv "$header`n$_").machine_type -ne 'workstations') { continue }
      $_ # row whose 'machine_type' column value equals 'workstations'
    }
    default {
      if ($i++ -eq 0) {
        if ($header) { continue } # header already written
        else { $header = $_; $_ } # header row of 1st file
      }
    }
  }
} | Set-Content $outputFile
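
What makes this memory-friendly is that switch -Wildcard -File reads each file line by line rather than loading it whole, and each matching line streams straight through to Set-Content; the only per-row objects created are the short-lived ones from the ConvertFrom-Csv verification call.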

Here's a workaround for the bug of not being able to pipe Get-ChildItem output directly to Import-Csv, by passing it as an argument instead:

Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType

Note that in PowerShell Core you could more naturally write:

Get-ChildItem $inputFolder -File -Filter *.csv | Import-Csv |
    Where-Object { $_.machine_type -eq 'workstations' } |
    Export-Csv $outputFile -NoType
– mklement0
  • Much appreciate your help with this, but I'm still getting a memory exception: `PS C:\change\2019\October> $inputFolder = "C:\Change\2019\October" PS C:\change\2019\October> $outputFile = "C:\Change\2019\output.csv" PS C:\change\2019\October> Import-Csv -LiteralPath (Get-ChildItem $inputFolder -File -Filter *.csv) | >> Where-Object { $_.machine_type -eq 'workstation' } | >> Export-Csv $outputFile -NoType Exception of type 'System.OutOfMemoryException' was thrown. At line:1 char:1, OutOfMemoryException + FullyQualifiedErrorId : System.OutOfMemoryException` – Enigma Nov 02 '19 at 05:36
  • $PSVersionTable reports: PSVersion 5.1.17763.771, PSEdition Desktop, PSCompatibleVersions {1.0, 2.0, 3.0, 4.0...}, BuildVersion 10.0.17763.771, CLRVersion 4.0.30319.42000, WSManStackVersion 3.0, PSRemotingProtocolVersion 2.3, SerializationVersion 1.1.0.1 – Enigma Nov 02 '19 at 07:50
  • @Enigma: Please see my update that shows another workaround. If that doesn't help, we could try forcing periodic garbage collection to force previously allocated objects to be released. – mklement0 Nov 02 '19 at 16:02
  • @Enigma: If no other columns contain the word `workstations`, you can simplify and speed up the command by omitting the `ConvertFrom-Csv` call (you can make the wildcard matching more robust with something like `'*,workstations,*'`) – mklement0 Nov 02 '19 at 21:52
  • The workaround works, but the output csv doesn't seem to be truncating after each line; the data populates horizontally after the table header column. – Enigma Nov 06 '19 at 19:55
  • @Enigma: By _truncating_, do you mean newlines (line breaks)? The workaround sends the lines one by one to `Set-Content`, which automatically puts a newline between them. Is there something unusual about your input `*.csv` files? – mklement0 Nov 06 '19 at 20:23

Maybe you can import and filter your files one by one and append the result to your output file, like this:

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"

Remove-Item $outputFile -Force -ErrorAction SilentlyContinue

Get-ChildItem $inputFolder -Filter '*.csv' -File | ForEach-Object {
    Import-Csv $_.FullName |
        Where-Object machine_type -eq 'workstations' |
        Export-Csv $outputFile -Append -NoTypeInformation
}
– Esperento57
  • Thanks for this. Even though it still throws a memory error, it manages to create the csv file and append the data. I have added -ErrorAction SilentlyContinue to the export-csv line as well so it can continue processing the data. – Enigma Nov 06 '19 at 19:58
  • @Enigma: An out-of-memory error is a statement-terminating error, so it's likely that data will be missing from your output CSV file. – mklement0 Nov 06 '19 at 20:28
  • mklement0 is right; if it still throws the error, you can't use this solution – Esperento57 Nov 07 '19 at 03:05
  • @Enigma: After the `Export-Csv` command, you can place a command that forces garbage collection to see if that gets rid of the error: `[gc]::Collect(); [gc]::WaitForPendingFinalizers()`; also, I've found issues on GitHub relating to the issue - see my updated answer (speaking of: I still don't understand why the `switch`-based workaround doesn't work for you). – mklement0 Nov 09 '19 at 17:01
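
As an untested sketch, here's how that forced-garbage-collection suggestion could be combined with the per-file loop from this answer (same paths and filter as above):

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"

Remove-Item $outputFile -Force -ErrorAction SilentlyContinue

Get-ChildItem $inputFolder -Filter '*.csv' -File | ForEach-Object {
    Import-Csv $_.FullName |
        Where-Object machine_type -eq 'workstations' |
        Export-Csv $outputFile -Append -NoTypeInformation
    # Untested: force a garbage collection after each file, to see whether
    # releasing the objects Import-Csv allocated avoids the error.
    [gc]::Collect()
    [gc]::WaitForPendingFinalizers()
}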

Solution 2:

$inputFolder = "C:\Change\2019\October"
$outputFile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8  # modify encoding if necessary
$Delimiter=','

#find the header for your files => take the first row of the first file with data
$Header = Get-ChildItem -Path $inputFolder -Filter *.csv | Where-Object Length -gt 0 | Select-Object -First 1 | Get-Content -TotalCount 1

#if no header was found, then there is no file with size > 0 => quit
if(! $Header) {return}

#create array for header
$HeaderArray=$Header -split $Delimiter -replace '"', ''

#open output file
$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

#write the header we found
$w.WriteLine($Header)


#loop over the csv files
Get-ChildItem $inputFolder -File -Filter "*.csv" | ForEach-Object {

    #open the file for reading
    $r = New-Object System.IO.StreamReader($_.fullname, $encoding)
    $skiprow = $true

    while (($line = $r.ReadLine()) -ne $null) 
    {
        #skip the header row
        if ($skiprow) 
        {
            $skiprow = $false
            continue
        }

        #build an object for the current row, using the header found above
        $Object=$line | ConvertFrom-Csv -Header $HeaderArray -Delimiter $Delimiter

        #write the row to the output file if it matches the requested filter
        if ($Object.machine_type -eq 'workstations') { $w.WriteLine($line) }

    }

    $r.Close()
    $r.Dispose()

}

$w.close()
$w.Dispose()
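
Because only the current line (plus the single object ConvertFrom-Csv builds from it) is alive at any time, this approach should keep memory use flat no matter how large the input files are.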
– Esperento57

You have to read and write to the .csv files one row at a time, using StreamReader and StreamWriter:

$filepath = "C:\Change\2019\October"
$outputfile = "C:\Change\2019\output.csv"
$encoding = [System.Text.Encoding]::UTF8

$files = Get-ChildItem -Path $filePath -Filter *.csv | 
         Where-Object { $_.machine_type -eq 'workstations' }

$w = New-Object System.IO.StreamWriter($outputfile, $true, $encoding)

$skiprow = $false
foreach ($file in $files)
{
    $r = New-Object System.IO.StreamReader($file.fullname, $encoding)
    while (($line = $r.ReadLine()) -ne $null) 
    {
        if (!$skiprow)
        {
            $w.WriteLine($line)
        }
        $skiprow = $false
    }
    $r.Close()
    $r.Dispose()
    $skiprow = $true
}

$w.close()
$w.Dispose()
– Max Voisard
  • Are we missing Import-Csv in the line `$files = Get-ChildItem -Path $filePath -Filter *.csv | Where-Object { $_.machine_type -eq 'workstations' }`? – Enigma Nov 02 '19 at 06:10

Get-Content *.csv | Add-Content combined.csv

Make sure combined.csv doesn't exist when you run this, or it's going to go full Ouroboros.
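
One way to guard against that (a sketch, assuming it is run from the folder that holds the input files):

Remove-Item combined.csv -ErrorAction SilentlyContinue   # drop stale output so it isn't re-read as input
Get-Content *.csv | Add-Content combined.csv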

– Joe Cullinan
  • This will duplicate header rows in the output file, and it also doesn't address the filter requirement. (Aside from that, `Set-Content` should be used). – mklement0 Nov 01 '19 at 17:24
  • Won't `Set-Content` overwrite the already-set contents of `combined.csv`, leaving it a duplicate of the last CSV file that was picked up? – Joe Cullinan Nov 01 '19 at 19:03
  • No, whatever Set-Content receives via the pipeline all goes into the target file. – mklement0 Nov 01 '19 at 19:54
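
A quick way to see that behavior (hypothetical demo.txt):

'one','two','three' | Set-Content demo.txt   # demo.txt receives all three lines, not just 'three'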