
I need to define an alias for this:
Select-String -NotMatch -Pattern "^[\t ]+\d"

so that I can use the alias instead of writing that long string each time.
After googling for 5 minutes and doing some experiments I came up with this:

filter foo {
    $_ | Select-String -NotMatch -Pattern "^[\t ]+\d"
}

So now my script looks like this:

command1 | foo
command2 | foo
command3 | foo
command4 | foo

This is apparently working as expected, but I'm concerned about the efficiency implications of doing this.
Is the foo filter acting as a transparent alias of the longer command line, or is it creating an entire new pipe or buffer or something?

Mercalli
I wouldn't be too concerned about performance. A filter is just a shortcut for writing a function whose entire body is a `process {}` block, which processes pipeline input one object at a time. There might be a functional issue, though, depending on your use case: because you are creating a sub-pipeline within the function that processes each input object individually, the function will fail if the input is formatted (e.g. output of `Format-Table`). If you need to support formatted input, you would have to write a proxy (wrapper) function. See https://stackoverflow.com/a/73074477/7571258 – zett42 Jun 03 '23 at 12:57
This should be a steppable pipeline. Your current implementation invokes `Select-String` once per input object. – Santiago Squarzon Jun 03 '23 at 14:24
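To illustrate the comments above, here is a minimal sketch showing that the filter from the question behaves like a function whose body is a `process {}` block, so the body, including its inner `Select-String` pipeline, runs once per input object. The function name `fooFunction` and the sample input lines are hypothetical, chosen only for the demonstration.

```powershell
# The filter from the question: its body runs once for every pipeline input object ($_)
filter foo {
    $_ | Select-String -NotMatch -Pattern '^[\t ]+\d'
}

# Hypothetical equivalent written as a function with an explicit process block
function fooFunction {
    process {
        $_ | Select-String -NotMatch -Pattern '^[\t ]+\d'
    }
}

# Hypothetical sample input: only 'keep me' does not start with whitespace followed by a digit
$lines = 'keep me', "`t42 indented with a tab", '  7 indented with spaces'

# Both forms emit MatchInfo objects; .Line holds the original input string
$viaFilter   = @($lines | foo).Line
$viaFunction = @($lines | fooFunction).Line
```

Both variables should end up holding only `keep me`.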

2 Answers


Is the foo filter acting as a transparent alias of the longer command line, or is it creating an entire new pipe or buffer or something?

The latter: your current implementation invokes Select-String once per pipeline input object instead of invoking it once and streaming all input through that single instance. If you care about performance, you should change your implementation to use a steppable pipeline:

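function steppablefoo {
    # The [Parameter()] attribute makes this an advanced function, so $PSCmdlet is available
    param([Parameter(ValueFromPipeline)] $InputObject)

    begin {
        # Create the Select-String pipeline once and start it, connected to this function's output
        $pipe = { Select-String -NotMatch -Pattern '^[\t ]+\d' }.GetSteppablePipeline()
        $pipe.Begin($PSCmdlet)
    }
    process {
        # Feed each input object into the already running Select-String instance
        $pipe.Process($InputObject)
    }
    end {
        # Signal the end of input so Select-String can complete
        $pipe.End()
    }
}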

You can test it for yourself with this performance comparison:

$tests = @{
    Filter = {
        0..300kb | foo
    }
    SteppablePipeline = {
        0..300kb | steppablefoo
    }
}

$tests.GetEnumerator() | ForEach-Object {
    [pscustomobject]@{
        Test              = $_.Key
        TotalMilliseconds = (Measure-Command { & $_.Value }).TotalMilliseconds
    }
}
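Besides timing, you can observe the per-object invocation directly. The sketch below counts how often the wrapped command's `begin` block runs under each approach. Because `Select-String`'s internals can't be instrumented, it uses a hypothetical stand-in command, `Test-BeginCount`; the names `viaFilter` and `viaSteppable` are likewise hypothetical.

```powershell
# Hypothetical stand-in for Select-String that counts how often its begin block runs
$script:beginCount = 0
function Test-BeginCount {
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin   { $script:beginCount++ }
    process { $InputObject }
}

# Filter approach: a new Test-BeginCount pipeline is started for every input object
filter viaFilter { $_ | Test-BeginCount }

# Steppable-pipeline approach: a single Test-BeginCount instance handles all input
function viaSteppable {
    param([Parameter(ValueFromPipeline)] $InputObject)
    begin {
        $pipe = { Test-BeginCount }.GetSteppablePipeline()
        $pipe.Begin($PSCmdlet)
    }
    process { $pipe.Process($InputObject) }
    end     { $pipe.End() }
}

$script:beginCount = 0
1..5 | viaFilter | Out-Null
$filterBegins = $script:beginCount

$script:beginCount = 0
1..5 | viaSteppable | Out-Null
$steppableBegins = $script:beginCount

"Filter: $filterBegins begin call(s); steppable pipeline: $steppableBegins begin call(s)"
```

With five input objects, the filter starts the wrapped command five times, whereas the steppable pipeline starts it once.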
Santiago Squarzon

To complement Santiago's helpful answer:

A simpler, but limited, alternative is to use a simple (as opposed to advanced) function rather than a filter. The limitations are:

  • Due to the use of the automatic $input variable and the lack of per-input-object processing, all input is collected in memory up front, whereas the steppable-pipeline solution in Santiago's answer preserves the streaming behavior of Select-String.

    • This implies that display output won't start until a given input command has emitted all its objects.
  • You can pass additional parameters through to Select-String ad hoc (on invocation) via the splatted form of the automatic $args variable (@args), but you won't get tab completion for them.

    • For tab-completion support, you could duplicate Select-String's parameter declarations, but that is cumbersome.

    • An automated way to duplicate them is to use [System.Management.Automation.ProxyCommand]::Create((Get-Command 'Select-String')) to scaffold your function; however, that is exactly how you define a proxy function, which Santiago's answer shows in simplified form. You may therefore just as well create such a proxy function instead of using the simplified form below. See this answer for more information about proxy functions.
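As a sketch of the scaffolding step mentioned in the last point, the following only generates and wraps the proxy source text; you would then edit it (for example, to hard-code -NotMatch and -Pattern) and save it in your script or profile. It constructs the `CommandMetadata` explicitly instead of relying on PowerShell's automatic conversion of the `Get-Command` result, and the function name `Select-StringProxy` is hypothetical.

```powershell
# Describe Select-String's parameters and binding behavior
$metadata = [System.Management.Automation.CommandMetadata]::new((Get-Command Select-String))

# Generate the body of a proxy function; the generated code forwards input via a steppable pipeline
$proxyBody = [System.Management.Automation.ProxyCommand]::Create($metadata)

# Wrap the generated body in a function definition (hypothetical name) for further editing
$definition = "function Select-StringProxy {`n$proxyBody`n}"

# Inspect the scaffold
$definition
```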

function foo { $input | Select-String -NotMatch -Pattern '^[\t ]+\d' @args  }
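A minimal usage sketch of the simple function above, assuming the same hypothetical sample input as elsewhere: plain use filters out indented lines that start with a digit, and a switch such as -Quiet is forwarded to Select-String through @args.

```powershell
# The simple function: $input holds all pipeline input, @args passes extra parameters through
function foo { $input | Select-String -NotMatch -Pattern '^[\t ]+\d' @args }

# Hypothetical sample input; only 'keep me' survives the filter
$lines = 'keep me', "`t42 indented with a tab", '  7 indented with spaces'

# Plain use: emits MatchInfo objects for the non-matching lines
$kept = @($lines | foo)

# Pass-through use: -Quiet is forwarded to Select-String via @args, yielding a Boolean
$anyKept = $lines | foo -Quiet
```

Here `$kept` should contain a single MatchInfo whose `Line` is `keep me`, and `$anyKept` should be `$true`.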
mklement0