Background
I've been looking through several posts here on Stack Overflow and can only find answers to "how to add one single row of data to a CSV file" (notably this one). While they are good, they only cover the specific case of adding a single entry from memory. Suppose I have 100,000 rows to add to a CSV file; writing each row to the file individually would be orders of magnitude slower. I imagine it would be much faster to keep everything in memory, build a variable that contains all the data I want to add, and only then write it to file.

Current situation
I have log files that I receive from customers containing about half a million rows. Some of these rows begin with a datetime and the amount of memory the server is using. To get a better view of what the memory usage looks like, I want to plot it over time using this information. (Note: yes, the best solution would be to ask the developers to add this as a feature, since we need it fairly often, but until we have that, I need to work with what I've got.)

I am able to read the log files, extract the contents, and build two variables, $timeStamp and $memoryUsage, from the relevant entries. The problem occurs when I try to add these to a custom PSObject: it would seem that $csvObject += $newRow only adds a pointer to the $newRow variable rather than the actual row itself. Here's the code that I've got so far:

$header1 = "Time Stamp"
$header2 = "Memory Usage"

$csvHeaders = @"
$header1;$header2
"@

# The following two lines are a workaround to make sure that the $csvObject becomes a PSObject that matches the output I'm trying to achieve.
$csvHeaders | Out-File -FilePath $csvFullPath
$csvObject = Import-Csv -Path $csvFullPath -Delimiter ";"

foreach ($TraceFile in $traceFilesToLookAt) {
    $curTraceFile = Get-Content $TraceFile.FullName
    Write-Host "Starting on file: $($TraceFile.Name)`n"

    foreach ($line in $curTraceFile) {
        try {
            # Lines that begin with a date look like 'yyyy-MM-dd ...',
            # so offsets 4 and 7 hold the dashes of the date
            if (($line.Substring(4,1) -eq '-') -and ($line.Substring(7,1) -eq '-')) {
                $TimeStamp = $line.Split("|",4)[0]
                $memoryUsage = $($line.Split("|",4)[2]).Replace(",","")

                $newRow = New-Object PSObject -Property @{
                    $header1 = $TimeStamp;
                    $header2 = $memoryUsage
                }
                # Select-Object fixes the column order, since hashtable key order isn't guaranteed
                $reorderedRow = $newRow | Select-Object -Property $header1,$header2

                $reorderedRow | Export-Csv -Path $csvFullPath -Append -Delimiter ";"
            }
        } catch {
            # Lines too short for the Substring() calls throw; just skip them
            Out-Null
        }
    }
}

This works, in the sense that it appends a row to the CSV file each time it finds a match. The problem is that it's not very efficient.

End goal
I would ideally like to solve it with something like:

$newRow = New-Object PSObject -Property @{
    $header1 = $TimeStamp;
    $header2 = $memoryUsage
}
$rowsToAddToCSV += $newRow 

And then in the final step do a:

$rowsToAddToCSV | Export-Csv -Path $csvFullPath -Append -Delimiter ";"

I have not been able to create any form of workaround for this. Among other things, PowerShell tells me that op_Addition is not part of the object, or that the object I'm trying to export (the collection of rows) doesn't match the CSV file, and so on.
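
For reference, here is a sketch of the pattern I'm after; I assume the collection would have to start out as an empty array for += to work at all, since otherwise the variable holds a single PSObject after the first append:

$rowsToAddToCSV = @()   # presumably += needs an existing array to append to

# ... build $newRow inside the parsing loop as before, then:
$rowsToAddToCSV += $newRow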

Tanaka Saito

1 Answer

Anything that appends thousands of items to an array in a loop is bound to perform poorly, because each time an item is appended, the array will be re-created with its size increased by one, all existing items are copied, and then the new item is put in the new free slot.
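
If you really do need to collect everything in memory before writing, a generic list avoids that re-creation, because it grows its internal buffer in larger steps instead of one slot at a time. A minimal sketch of that approach:

# A List grows in-place, so Add() doesn't copy the whole collection each time
$rows = New-Object 'System.Collections.Generic.List[psobject]'

# ... inside the parsing loop ...
$rows.Add($newRow)

# ... and a single write at the end ...
$rows | Export-Csv -Path $csvFullPath -Append -Delimiter ";"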

Any particular reason why you can't simply do something like this?

$traceFilesToLookAt | ForEach-Object {
    Get-Content $_.FullName | ForEach-Object {
        # Lines that begin with a yyyy-MM-dd date have dashes at offsets 4 and 7
        if ($_.Substring(4, 1) -eq '-' -and $_.Substring(7, 1) -eq '-') {
            $line = $_.Split('|', 4)
            # Objects emitted here go straight down the pipeline to Export-Csv
            New-Object PSObject -Property @{
                'Time Stamp'   = $line[0]
                'Memory Usage' = $line[2].Replace(',', '')
            }
        }
    }
} | Export-Csv -Path $csvFullPath -Append -Delimiter ";"

A regular expression match might be an even more elegant approach to extracting timestamp and memory usage from the input files, but I'm going to leave that as an exercise for you.
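
For illustration only, a sketch of what that might look like (the pattern assumes the timestamp is the entire first |-delimited field, starting with a yyyy-MM-dd date, and that the memory usage is the third field; adjust it to your actual log format):

$pattern = '^(?<time>\d{4}-\d{2}-\d{2}[^|]*)\|[^|]*\|(?<mem>[\d,]+)'

$traceFilesToLookAt | ForEach-Object {
    Get-Content $_.FullName | ForEach-Object {
        if ($_ -match $pattern) {
            New-Object PSObject -Property @{
                'Time Stamp'   = $Matches['time']
                'Memory Usage' = $Matches['mem'] -replace ','
            }
        }
    }
} | Export-Csv -Path $csvFullPath -Append -Delimiter ";"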

Ansgar Wiechers
  • The only reason I can think of is that I'd completely forgotten about using the pipeline, and I have no explanation for that :) Thanks a lot! I will verify this in my environment, but I have one question: I thought you had to specify a name or variable when creating a New-Object PSObject, but I see that this is not the case here. Is this because it is being piped, or can you always use a New-Object PSObject in memory without saving/storing it to a variable? – Tanaka Saito Jan 16 '17 at 08:22
  • Marked as answered as it does the job. For fun I tried your suggestion using Measure-Command in two versions, one with $ErrorActionPreference = 'SilentlyContinue' without try/catch, and one with 'Continue' using try/catch. The query time and the output rows are the same; however, *the size of the file is twice as large using SilentlyContinue without try/catch*. I'm guessing that without a try/catch it - for whatever reason - becomes a text file rather than a .csv file (try/catch gets #TYPE System.Management.Automation.PSCustomObject in first row), but that might be another discussion :) – Tanaka Saito Jan 16 '17 at 10:46
  • 1
    You can always use `New-Object` without capturing its output. However, if you're not in a pipeline you must capture the object in a variable, otherwise it would just be displayed on the console and you won't be able to use it later on. The `#TYPE ...` header comes from `Export-Csv`. Add the parameter `-NoType` to suppress it. – Ansgar Wiechers Jan 16 '17 at 13:19
  • I've seen many examples having the -NoTypeInformation in it but never reflected on what it did. Thanks a bunch again, I will look into using some form of RegEx later on for the string manipulation :) – Tanaka Saito Jan 16 '17 at 14:12