I'm writing a Powershell script (PS version 4) to parse and process IIS log files, and I've come across an issue I don't quite understand: write-output seems to add significant processing time to the script. The core of it is this (there is more, but this demonstrates the issue):
$file = $args[0]
$linecount = 0
$start = [DateTime]::Now
$reader = [IO.File]::OpenText($file)
while ($reader.Peek() -ge 0) {
$line = $reader.ReadLine()
$linecount++
if (0 -eq ($linecount % 10000)) {
$batch = [DateTime]::Now
[Console]::Error.WriteLine(" Processed $linecount lines ($file) in $(($batch - $start).TotalMilliseconds)ms")
$start = $batch
}
$parts = $line.split(' ')
$out = "$file,$($parts[0]) $($parts[1]),$($parts[2]),$($parts[3]),$($parts[4]),$($parts[5]),$($parts[6]),$($parts[7])"
## Send the output out - comment in/out the desired output method
## Direct output - roughly 10,000 lines / 880ms
$out
## Via write-output - roughly 10,000 lines / 1500ms
write-output $out
}
$reader.Close()
Invoked as .\script.ps1 {path_to_340,000_line_IIS_log} > bob.txt
; progress/performance timings are given on stderr.
The script above has two output lines - the write-output
one is reporting 10,000 lines every 1500ms, whereas the line that does not have write-output
takes as good as half that, averaging about 880ms per 10,000 lines.
I thought that an object defaulted to write-output
if it had no other thing (i.e., I thought that "bob"
was equivalent to write-output "bob"
), but the times I'm getting argue against this.
What am I missing here?