
From all my reading, the OleDb DataReader does not store records in memory, but this code is maxing out the RAM. It's meant to pull data from an Oracle DB (about 10M records) and write them to a GZIP file. I have tried everything (including commenting out the GZIP write) and it still ramps up the RAM until it falls over. Is there a way to just execute the reader without it holding the rows in memory? What am I doing wrong?

# Open-Con is a custom helper that opens the OleDb connection to Oracle
$tableName='ACCOUNTS'
$fileNum=1
$gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object System.Data.OleDb.OleDbCommand($sql,$con)
$cmd.CommandTimeout = 0

# Write the compressed output directly to disk, one pipe-delimited row per line
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8

$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())
{
    $j++
    # Join all fields of the current row and append a newline
    $str=$reader[0..$($reader.FieldCount-1)] -join '|'
    $out=$encoding.GetBytes("$str`n")
    $gzipStream.Write($out,0,$out.Length)

    if($j % 10000 -eq 0){Write-Host $j}
    # Roll over to a new gzip file every 1M rows
    if($j % 1000000 -eq 0){
        Write-Host 'creating new gz file'
        $gzipStream.Close()
        $gzipStream.Dispose()
        $fileNum+=1
        $gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
        $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
    }
}

Edit: per the comments, [system.gc]::Collect() had no effect. Stripping the loop down to its simplest form and reading only a single field also had no effect. This code ramps up to 16GB of memory (viewed in Task Manager) and then quits with an out-of-memory error.

$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object System.Data.OleDb.OleDbCommand($sql,$con)
$cmd.CommandTimeout = 0

$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())
{
    # Read a single field and discard it; memory still climbs
    $str=$reader[0]
}
neuralgroove
  • does calling on `[system.gc]::Collect()` make a difference? – Abraham Zinala Feb 21 '22 at 22:43
    Did you try [SequentialAccess](https://learn.microsoft.com/en-us/dotnet/api/system.data.commandbehavior) instead? `$reader = $cmd.ExecuteReader(CommandBehavior.SequentialAccess)` – mhu Feb 27 '22 at 12:27
  • My suggestion is to lower the MOD conditional on writing the new gz file from 1M to 1K to see if drastically lowering the row-read count before writing is at the heart of the issue. If indeed that gets files written, then you could create a PSH routine to append the gzip files after all are written. -- Sorry for the kludge ;) – JasonInVegas Mar 02 '22 at 00:04
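
For reference, a minimal sketch of what the SequentialAccess suggestion from the comment above might look like in PowerShell (untested here; it reuses the $cmd from the question and reads each column once, in order, which is what SequentialAccess requires):

$reader = $cmd.ExecuteReader([System.Data.CommandBehavior]::SequentialAccess)
[int]$j = 0
While ($reader.Read())
{
    $j++
    # With SequentialAccess, columns must be read strictly in order and only once
    for ($k = 0; $k -lt $reader.FieldCount; $k++) {
        $value = $reader.GetValue($k)
    }
}
$reader.Close()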

1 Answer


Possibly it's using up virtual address space rather than actual RAM. That's a common problem with the .NET garbage collector and the ADO.NET and string objects created here, especially if any of the records have fields containing a lot of text.

Building on that, it looks like you're doing most of the right things to avoid this issue (using a DataReader, writing directly to a stream, etc). What you could do to improve on it is write to the stream one field at a time, rather than using -join to push all the fields into one string before writing, and re-use the same $out byte buffer (though I'm not sure exactly what that last part looks like in PowerShell or with Encoding.GetBytes()).

This may help, but it can still create issues with the way it concatenates the field delimiter and line terminator into the field string. If you find this runs longer but still eventually produces the error, you probably need to do the tedious work of issuing separate writes to the gzip stream for each of those values.

$tableName='ACCOUNTS'
$fileNum=1
$gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
$con=Open-Con ORA -tns $tns -userName $userName -fetchSize $fetchSize
$cmd = New-Object System.Data.OleDb.OleDbCommand($sql,$con)
$cmd.CommandTimeout = 0
$output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
[System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
$encoding = [System.Text.Encoding]::UTF8
$reader=$cmd.ExecuteReader()
[int]$j=0
While ($reader.Read())
{
    $j++
    $fieldDelimiter = ""
    $terminator = ""
    # Write one field at a time instead of joining the whole row into one string
    for ($k=0; $k -lt $reader.FieldCount; $k++) {
        # Append the newline only after the last field of the row
        if ($k -eq $reader.FieldCount - 1) { $terminator = "`n" }

        $out = $encoding.GetBytes("$fieldDelimiter$($reader[$k])$terminator")
        $gzipStream.Write($out,0,$out.Length)

        $fieldDelimiter = "|"
    }

    if($j % 10000 -eq 0){Write-Host $j}
    # Roll over to a new gzip file every 1M rows
    if($j % 1000000 -eq 0){
        Write-Host 'creating new gz file'
        $gzipStream.Close()
        $gzipStream.Dispose()
        $fileNum+=1
        $gzFilename="c:\temp\gzip\$tableName.$fileNum.txt.gz"
        $output = New-Object System.IO.FileStream $gzFilename, ([IO.FileMode]::Create), ([IO.FileAccess]::Write), ([IO.FileShare]::None)
        [System.IO.Compression.GzipStream]$gzipStream = New-Object System.IO.Compression.GzipStream $output, ([IO.Compression.CompressionMode]::Compress)
    }
}
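
If the per-field writes above still aren't enough, here is a rough, untested sketch of the buffer re-use idea mentioned earlier: the delimiter and newline bytes are encoded once up front and written separately, and each field value is encoded into a single pre-allocated byte array using the Encoding.GetBytes(string, charIndex, charCount, byte[], byteIndex) overload, so the loop allocates almost nothing per row. The 64 KB buffer size is an assumption; it only works if no single field encodes to more than that.

$pipeBytes    = $encoding.GetBytes('|')
$newlineBytes = $encoding.GetBytes("`n")
$buffer       = New-Object byte[] 65536    # assumption: no single field encodes to more than 64 KB

While ($reader.Read())
{
    for ($k = 0; $k -lt $reader.FieldCount; $k++) {
        if ($k -gt 0) { $gzipStream.Write($pipeBytes, 0, $pipeBytes.Length) }
        $s = [string]$reader[$k]
        # Encode into the shared buffer instead of allocating a new byte[] per field
        $count = $encoding.GetBytes($s, 0, $s.Length, $buffer, 0)
        $gzipStream.Write($buffer, 0, $count)
    }
    $gzipStream.Write($newlineBytes, 0, $newlineBytes.Length)
}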
Joel Coehoorn
  • Thanks for the answer, I edited the post a bit. It won't let go of the memory until the $reader is closed. Even without concatenating strings or writing to a stream. – neuralgroove Feb 21 '22 at 23:48
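
Following up on that comment, a minimal sketch of explicit cleanup once the loop finishes (it assumes the same $reader, $gzipStream, $output and $con variables used in the question):

$reader.Close()
$reader.Dispose()
$gzipStream.Dispose()    # also closes the underlying FileStream unless it was created with leaveOpen
$output.Dispose()
$con.Close()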