tl;dr:
Because of how binding pipeline input to parameters works in PowerShell (see below), defining a parameter that accepts pipeline input as well as direct parameter-value passing of arrays:
- indeed requires looping inside the `process` block
- invariably wraps each individual input object received through the pipeline in a single-element array, which is inefficient.
Defining your pipeline-binding parameter as a scalar avoids this awkwardness, but passing multiple inputs is then limited to the pipeline: you won't be able to pass arrays as a parameter argument.[1]
This asymmetry is perhaps surprising.
When you define a parameter that accepts pipeline input, you get implicit array logic for free:
With pipeline input, PowerShell calls your `process` block once for each input object, with the current input object bound to the parameter variable.
By contrast, passing input as a parameter value only ever enters the `process` block once, with the input as a whole bound to your parameter variable.
The above applies whether or not your parameter is array-valued: each pipeline input object is individually bound / coerced to the parameter's type exactly as declared.
To put this in concrete terms with your example function, which declares parameter `[Parameter(Mandatory=$true, ValueFromPipeline=$True)] [string[]] $myNames`:
Let's assume an input array (collection) of `'foo', 'bar'` (note that the `@()` around array literals is normally not necessary).
Parameter-value input, `Test-BeginProcessEnd -myNames 'foo', 'bar'`:
- The `process` block is called once,
- with input array `'foo', 'bar'` bound to `$myNames` as a whole.
Pipeline input, `'foo', 'bar' | Test-BeginProcessEnd`:
- The `process` block is called twice,
- with `'foo'` and `'bar'` each coerced to `[string[]]`, i.e., a single-element array.
To see it in action:
```powershell
function Test-BeginProcessEnd
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [string[]] $myNames
  )
  begin {}
  process {
    Write-Verbose -Verbose "in process block: `$myNames element count: $($myNames.Count)"
    foreach ($name in $myNames) { $name }
  }
  end {}
}
```
```powershell
# Input via parameter
> Test-BeginProcessEnd 'foo', 'bar'
VERBOSE: in process block: $myNames element count: 2
foo
bar

# Input via pipeline
> 'foo', 'bar' | Test-BeginProcessEnd
VERBOSE: in process block: $myNames element count: 1
foo
VERBOSE: in process block: $myNames element count: 1
bar
```
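For contrast, here's a minimal sketch of the scalar alternative mentioned at the top (the function name `Test-ScalarBinding` is mine, for illustration): with a `[string]`-typed parameter, each pipeline input object binds directly, so there is no single-element-array wrapping and no inner loop; the trade-off is that multiple inputs can then only be supplied via the pipeline.

```powershell
function Test-ScalarBinding
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [string] $myName  # scalar: each pipeline object binds directly
  )
  process {
    $myName  # no foreach loop needed
  }
}

'foo', 'bar' | Test-ScalarBinding  # process runs once per input string
# Note: an array argument to -myName would NOT be treated as multiple inputs (see footnote [1]).
```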
Optional reading: Various tips re functions and pipeline input
`begin`, `process`, and `end` blocks may be used in a function whether or not it is an advanced function (cmdlet-like; see below).
- If you only need the 1st or a certain number of objects from the pipeline, there is currently no way to exit the pipeline prematurely; instead, you must set a Boolean flag that tells you when to ignore subsequent `process` block invocations (see the sketch after this list).
- You can, however, use an intervening, separate call such as `| Select-Object -First 1`, which efficiently exits the pipeline after the desired number of objects have been received.
- The current inability to do the same from user code is the subject of this suggestion on GitHub.
- Alternatively, you can forgo a `process` block and use `$Input | Select-Object -First 1` inside your function, but, as stated, that will collect all input in memory first; another, also imperfect, alternative can be found in this answer of mine.
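Here's a minimal sketch of the Boolean-flag workaround (function and variable names are mine): the remaining `process` invocations still occur, but perform no work.

```powershell
function Get-FirstObject
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    $InputObject
  )
  begin   { $done = $false }
  process {
    if ($done) { return }  # pipeline keeps running; further input is simply ignored
    $InputObject           # pass the first object through
    $done = $true
  }
}

1..10 | Get-FirstObject  # outputs 1; the remaining 9 objects are still received, but ignored
```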
If you do not use these blocks, you can still optionally access pipeline input via the automatic `$Input` variable; note, however, that your function then runs only after ALL pipeline input has been collected in memory (not object by object, as with a `process` block).
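For instance, the following simple (non-advanced) function, a sketch of mine, sums its pipeline input via `$Input`; it doesn't start doing any work until the upstream command has produced all of its output.

```powershell
function Get-Sum
{
  # No begin/process/end blocks: this body runs once, after ALL
  # pipeline input has been collected, available via $Input.
  $sum = 0
  foreach ($n in $Input) { $sum += $n }
  $sum
}

1..5 | Get-Sum  # -> 15
```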
Generally, though, it pays to use a `process` block:
- Objects can be processed one by one, as they're being produced by the source command, which has 2 benefits:
  - It makes processing more memory-efficient, because the source command's output doesn't have to be collected in full first.
  - Your function starts to produce output right away, without needing to wait for the source command to finish first.
- Hopefully soon (see above), you'll be able to exit the pipeline once all objects of interest have been processed.
- Cleaner syntax and structure: the `process` block is an implicit loop over all pipeline input, and you can selectively perform initialization and cleanup tasks in the `begin` and `end` blocks, respectively (sketched below).
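To illustrate that structure, here's a hypothetical sketch (all names are mine) that initializes counters in `begin`, processes each object in `process`, and emits a summary in `end`:

```powershell
function Measure-LineLength
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [string] $Line
  )
  begin   { $total = 0; $count = 0 }           # one-time initialization
  process { $total += $Line.Length; $count++ } # implicit loop: runs once per input object
  end     {                                    # one-time cleanup / summary
    [pscustomobject] @{
      Lines         = $count
      AverageLength = $(if ($count) { $total / $count } else { 0 })
    }
  }
}

'foo', 'quux' | Measure-LineLength  # -> Lines: 2, AverageLength: 3.5
```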
It is easy to turn a function into an advanced function, however, which offers benefits with respect to supporting common parameters such as `-ErrorAction` and `-OutVariable`, as well as detection of unrecognized parameters:
- Use a `param(...)` block to declare the parameters and decorate that block with the `[CmdletBinding()]` attribute, as shown above. (Decorating an individual parameter with a `[Parameter()]` attribute implicitly makes a function an advanced one too, but for clarity it's better to use `[CmdletBinding()]` explicitly.)
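A quick sketch (the function name is mine) of what `[CmdletBinding()]` buys you:

```powershell
function Test-Advanced
{
  [CmdletBinding()]  # makes this an advanced function
  param(
    [string] $Path
  )
  Write-Verbose "Processing $Path"  # only shown when -Verbose is passed
}

Test-Advanced -Path foo -Verbose      # common parameter -Verbose is now supported
Test-Advanced -Path foo -NoSuchParam  # now FAILS: unrecognized parameters are detected
```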
[1] Strictly speaking, you can, but only if you type your parameter `[object]` (or don't specify a type at all, which amounts to the same). However, the input array/collection is then bound as a whole to the parameter variable, and the `process` block is still only entered once, where you'd need to perform your own enumeration, as sketched below.
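A hypothetical sketch of such a definition (the function name is mine), with the enumeration performed manually:

```powershell
function Test-ObjectTyped
{
  [CmdletBinding()]
  param(
    [Parameter(Mandatory, ValueFromPipeline)]
    [object] $InputObject  # untyped / [object]: an array argument binds as a whole
  )
  process {
    # With parameter-value input, $InputObject may itself be a collection,
    # so enumerate it manually; foreach also handles a scalar gracefully.
    foreach ($obj in $InputObject) { $obj }
  }
}

Test-ObjectTyped -InputObject ('foo', 'bar')  # process runs ONCE, with the whole array
'foo', 'bar' | Test-ObjectTyped               # process runs TWICE, one object each
```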
Some standard cmdlets, such as `Export-Csv`, are defined this way, yet they do not enumerate a collection passed via the `-InputObject` parameter, making direct use of that parameter effectively useless; see this GitHub issue.