
I'm writing a PowerShell script (PS version 4) to parse and process IIS log files, and I've come across an issue I don't quite understand: write-output seems to add significant processing time to the script. The core of it is this (there is more, but this demonstrates the issue):

$file = $args[0]
$linecount = 0
$start = [DateTime]::Now
$reader = [IO.File]::OpenText($file)
while ($reader.Peek() -ge 0) {
    $line = $reader.ReadLine()
    $linecount++

    if (0 -eq ($linecount % 10000)) {
        $batch = [DateTime]::Now
        [Console]::Error.WriteLine("    Processed $linecount lines ($file) in $(($batch - $start).TotalMilliseconds)ms")
        $start = $batch
    }

    $parts = $line.split(' ')
    $out = "$file,$($parts[0]) $($parts[1]),$($parts[2]),$($parts[3]),$($parts[4]),$($parts[5]),$($parts[6]),$($parts[7])"


    ## Send the output out - comment in/out the desired output method
    ## Direct output - roughly 10,000 lines / 880ms
    $out

    ## Via write-output - roughly 10,000 lines / 1500ms
    write-output $out
}
$reader.Close()

Invoked as .\script.ps1 {path_to_340,000_line_IIS_log} > bob.txt; progress/performance timings are given on stderr.

The script above has two output lines - the write-output one reports 10,000 lines every 1500ms, whereas the bare output line takes little more than half that, averaging about 880ms per 10,000 lines.

I thought that output defaulted to write-output when nothing else consumed it (i.e., I thought that "bob" on its own was equivalent to write-output "bob"), but the timings I'm getting argue against this.
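For what it's worth, a quick sanity check shows the two forms do emit the same object; it's only the timing that differs:

$a = "bob"               ## bare expression output, captured
$b = Write-Output "bob"  ## explicit cmdlet output, captured
$a -eq $b                ## True - same string either way
$b.GetType().FullName    ## System.String - no wrapper object added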

What am I missing here?

Chris J

1 Answer


Just a guess, but:

Looking at the syntax in the help for write-output:

Write-Output [-InputObject] <PSObject[]> [-NoEnumerate]  [<CommonParameters>]

You're giving it a list of objects as an argument, so it has to spend a little time assembling them into an array internally before it does the write, whereas simply outputting them streams them to the pipeline immediately. You could pipe them to Write-Output instead, but that adds another pipeline stage, which might be even worse.
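A quick way to see the per-call overhead in isolation (a minimal sketch, using a made-up 100,000-iteration loop rather than the script above; Measure-Command discards the pipeline output and just returns the elapsed time):

## Time 100,000 bare-expression emits against 100,000 Write-Output calls
$direct = Measure-Command { foreach ($i in 1..100000) { "line $i" } }
$cmdlet = Measure-Command { foreach ($i in 1..100000) { Write-Output "line $i" } }
"Direct:       $($direct.TotalMilliseconds)ms"
"Write-Output: $($cmdlet.TotalMilliseconds)ms"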

Edit
In addition, you'll find that it's only adding 0.062ms per operation ((1500 - 880) / 10000). You have to scale that up to very large data sets before it becomes noticeable; over the full 340,000-line log it works out to around 21 seconds.

mjolinor
  • It'll be an array of one item that it's building up - it should be getting a single string as input. As a test I've just replaced it with `$string = "{...all the stuff...}"; write-output $string` (see edits in the code above) and the same stats apply. Similarly, `$string | write-output` is about as slow as `write-output $string`. You may be right that it's simply down to doubling up the pipeline, but that's quite a large impact for something that's just acting as a pass-through. – Chris J Jan 24 '14 at 11:54
  • It's adding 0.062ms per operation ((1500 - 880) / 10000). You have to scale that to very large data sets before it becomes noticeable. – mjolinor Jan 24 '14 at 12:26
  • For this kind of task you'll get better performance using Get-Content with -ReadCount and a good regex with -replace. With a ReadCount of 1000, you'll get arrays of 1000 lines at a time, and using -replace as an array operator you can cut the number of loop iterations by a factor of 1000 (see the sketch after these comments). – mjolinor Jan 24 '14 at 13:24
  • That's probably the thing I'm missing. So the overhead it adds is actually negligible, but I'm processing large datasets - tbh, I should have done the maths myself; don't know why I didn't. Btw, the way I'm reading the file is based on http://stackoverflow.com/a/9439318/130352 ... I don't see where I'd use `-replace` in the script above, though (ultimately I'm turning an unpredictably-formatted IIS log into a predictable CSV, so there's very little search & replace; in essence I'm processing the #Fields: header to determine the field order so I can spit out the CSV in the right order) – Chris J Jan 24 '14 at 14:43
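A minimal sketch of the -ReadCount / -replace approach mjolinor describes, assuming a fixed space-delimited format with at least eight fields per line (the #Fields: header handling from the real script is omitted). -replace applied to an array transforms every element in one call, so the loop body runs once per 1000-line chunk instead of once per line:

## Read the log in 1000-line chunks; $_ is a string array, and -replace
## on an array rewrites all of its elements in a single operation.
Get-Content $file -ReadCount 1000 | ForEach-Object {
    $_ -replace '^(\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+) (\S+).*$',
        "$file,`$1 `$2,`$3,`$4,`$5,`$6,`$7,`$8"
}

Invoked the same way as the original script, with stdout redirected to the CSV file.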