30

I have the following PowerShell script:

$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
$list | % {
    GetData $_ > $_.txt
    ZipTheFile $_.txt $_.txt.zip
    ...
}

How can I run the script block ({ GetData $_ > $_.txt ... }) in parallel with a limit on the number of concurrent jobs, e.g. so that at most 8 files are generated at any one time?

ca9163d9

8 Answers

36

Same idea as the one user "Start-Automating" posted, but with the bug fixed: in his example, the job for the current item is never started whenever the loop hits the else clause.

$servers = @('a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n')

foreach ($server in $servers) {
    $running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
    if ($running.Count -ge 4) {
        $running | Wait-Job -Any | Out-Null
    }

    Write-Host "Starting job for $server"
    Start-Job {
        # do something with $using:server. Just sleeping for this example.
        Start-Sleep 5
        return "result from $using:server"
    } | Out-Null
}

# Wait for all jobs to complete and results ready to be received
Wait-Job * | Out-Null

# Process the results
foreach($job in Get-Job)
{
    $result = Receive-Job $job
    Write-Host $result
}

Remove-Job -State Completed
Allanrbo
  • Thanks for this, it's working. First I tried the buggy "official" MS solution from their blog, which crashed my PowerShell even with 3 jobs. So don't use the solution from this site: https://blogs.msdn.microsoft.com/powershell/2011/04/04/scaling-and-queuing-powershell-background-jobs/ – Stritof Feb 10 '16 at 15:29
  • 1
    this is the answer... :) the other one has a logic bug – Andrew Harris Aug 14 '17 at 20:17
  • 1
    `Get-Job | Receive-job -AutoRemoveJob -Wait` to automatically await and remove all jobs (in your example you don't clear up faulted jobs). – riezebosch Feb 05 '18 at 13:22
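The cleanup suggested in the last comment can be sketched as follows. This is a minimal illustration rather than the answer's own code: the three dummy jobs and the `$results` variable are made up for the example. `Receive-Job -Wait -AutoRemoveJob` blocks until the given jobs have finished, returns their output, and removes the jobs afterwards, so (per the comment) faulted jobs should not be left behind.

```powershell
# Start a few throwaway jobs (hypothetical workload, for illustration only).
$jobs = foreach ($n in 1..3) {
    Start-Job -ScriptBlock { param($x) "result from $x" } -ArgumentList $n
}

# Wait for these jobs, collect their output, and remove them in one step.
$results = @($jobs | Receive-Job -Wait -AutoRemoveJob)

$results | ForEach-Object { Write-Host $_ }
```

Piping a specific set of jobs instead of `Get-Job` avoids touching unrelated jobs that may exist in the same session.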
24

The Start-Job cmdlet allows you to run code in the background. To do what you're asking, something like the code below should work.

foreach ($server in $servers) {
    $running = @(Get-Job | Where-Object { $_.State -eq 'Running' })
    if ($running.Count -le 8) {
        Start-Job {
             Add-PSSnapin SQL
             $list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
             ...
        }
    } else {
         $running | Wait-Job
    }
    Get-Job | Receive-Job
}

Hope this helps.

GMasucci
Start-Automating
  • 1
    In order to throttle the queue at 8 and to keep pushing another job on the stack as another finishes, I think you'll need `$running | Wait-Job -Any`. – Andy Arismendi Jan 09 '12 at 00:33
  • 3
    `Wait-Job -Any`: "Displays the command prompt (and returns the job object) when any job completes. By default, Wait-Job waits until all of the specified jobs are complete before displaying the prompt." – Andy Arismendi Jan 09 '12 at 00:41
  • Not sure what `($server in $server)` does here. But I got the idea. – ca9163d9 Jan 09 '12 at 05:54
  • BTW, it seems `Where-Object { $_.JobStateInfo.State -eq 'Running' }` can be `Where-Object { $_.State -eq 'Running' }`? – ca9163d9 Jan 09 '12 at 07:38
  • In the example case they wanted to do something on multiple servers, so that is foreach ($server in $servers). And yes, you can write the Where-Object using the script property State as well. – Start-Automating Jan 09 '12 at 20:21
  • I'm using this code to invoke a command against a list of 58 servers. When I have -le8, it only appears to run against 35 servers. If I up that to -le20, then I get 53 (I assume connection problems, etc against the others). Any ideas why? – mbourgon Feb 11 '13 at 13:57
  • 9
    Fatal flaw: if there are 8 jobs running already, you get into the else clause and never do a start-job for the $server the foreach had come to – Allanrbo Jun 17 '14 at 19:24
  • I didn't initially realize what @Allanrbo meant. The way this is written, it will not run jobs for all items in the loop. The conditional wait statement needs to be moved before the Start-Job. – jmathew Jun 20 '16 at 15:13
  • `$running = @(Get-Job | Where-Object { $_.State -eq 'Running' })` can be shortened to `$running = Get-Job -State Running` – hkarask Jan 26 '17 at 09:22
  • @Allanrbo I agree with you. It can be solved by checking if we are in the limit of jobs and if yes we will wait: `$running = @(Get-Job | Where-Object { $_.State -eq 'Running' }) if ($running.Count -ge 8) { $running | Wait-Job } Start-Job {...}` – E235 Sep 18 '17 at 13:02
  • If anybody else is wondering what the @ is for in the code, as I was: it forces the result to always be an array. – Velda Dec 12 '19 at 00:06
  • This solution has a bug. When $running.Count = 9, it goes into the 'else' block and waits; after waiting, the foreach moves on to the next element, so no job is executed for the current element. – Loc Le Apr 01 '21 at 03:11
10

It should be really easy with the Split-Pipeline cmdlet of the SplitPipeline module. The code will look as simple as this:

Import-Module SplitPipeline
$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
$list | Split-Pipeline -Count 8 {process{
    GetData $_ > $_.txt
    ZipTheFile $_.txt $_.txt.zip
    ...
}}
Roman Kuzmin
  • I like this module a lot. But the only external variable available inside the pipeline block is $_; how do you pass other variables into the pipeline block? `$a = "foo"; $list | Split-Pipeline {# $a is undefined in here}` – Myrddin Emrys Jul 02 '13 at 15:29
  • 1
    Used variables and functions from the current runspace have to be explicitly imported to parallel pipelines using the parameters `-Variable` and `-Function`. Eventually, hopefully soon, I will mention this in the cmdlet help or provide an example. – Roman Kuzmin Jul 02 '13 at 16:37
  • Thank you Roman. I figured this out after asking the question, and posted the suggestion in GitHub about mentioning this in the documentation. This is a very useful tool, and it sped up the task I was running tremendously. Thank you. – Myrddin Emrys Jul 02 '13 at 21:00
7

Old thread but I think this could help:

$List = 'C:\List.txt'   # quote the path; unquoted, it would be parsed as a command
$Jobs = 8               # maximum number of concurrent jobs

Foreach ($PC in Get-Content $List)
{
    # Hold back until fewer than $Jobs jobs are running
    Do
    {
        $Job = (Get-Job -State Running | measure).count
    } Until ($Job -lt $Jobs)

    Start-Job -Name $PC -ScriptBlock { "Your command here $Using:PC" }
    Get-Job -State Completed | Remove-Job
}

Wait-Job -State Running
Get-Job -State Completed | Remove-Job
Get-Job

The "Do" loop pauses the "foreach" until the number of running jobs drops back within the limit set by "$Jobs". After the loop, the script waits for the remaining jobs to complete, removes the completed ones, and lists whatever is left, such as failed jobs.

GTPowa
  • 2
    I found this approach to be the best listed. With one adjustment; Do{ $Job = (Get-Job -State Running | measure).count } Until (($Job -le 4) -or (Wait-Job -State Running -Any)) Adding the -or (Wait-Job -State Running -Any)) was more efficient than the busy loop method. – Michael Brown Jun 05 '20 at 18:35
5

Background jobs are the answer. You can also throttle the jobs in the run queue using [System.Collections.Queue]. There is a blog post from the PowerShell team on this topic: https://devblogs.microsoft.com/powershell/scaling-and-queuing-powershell-background-jobs/

Using a queue is probably the best way to throttle background jobs.
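As a rough illustration of the queuing idea, here is a minimal sketch. It is not the blog post's implementation, which is not reproduced here; it is a simpler loop that blocks on `Wait-Job -Any`. The work items, the doubling script block, and the names `$maxJobs`, `$active` and `$results` are all assumptions made for the example.

```powershell
# Hypothetical work items; in practice these would be server names, rows, etc.
$queue = New-Object System.Collections.Queue
1..6 | ForEach-Object { $queue.Enqueue($_) }

$maxJobs = 2      # assumed concurrency limit for this example
$active  = @()    # jobs currently in flight
$results = @()

while ($queue.Count -gt 0 -or $active.Count -gt 0) {
    # Start jobs until the limit is reached or the queue is empty.
    while ($queue.Count -gt 0 -and $active.Count -lt $maxJobs) {
        $item = $queue.Dequeue()
        $active += Start-Job -ScriptBlock { param($n) $n * 2 } -ArgumentList $item
    }

    # Block until one of the active jobs finishes, then collect and remove it.
    foreach ($done in @($active | Wait-Job -Any)) {
        $results += Receive-Job -Job $done
        Remove-Job -Job $done
        $active = @($active | Where-Object { $_.Id -ne $done.Id })
    }
}

$results | Sort-Object
```

Because the loop only tracks jobs it started itself, it does not interfere with other background jobs in the session.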

ravikanth
4

I use and have improved a multithreading function. You can use it like this:

$Script = {
    param($Computername)
    get-process -Computername $Computername
}

@('Srv1','Srv2') | Run-Parallel -ScriptBlock $Script

Include this code in your script:

function Run-Parallel {
    <#
        .Synopsis
            This is a quick and open-ended script multi-threader searcher
            http://www.get-blog.com/?p=189#comment-28834
            Improved by Alban LOPEZ 2016

        .Description
            This script will allow any general, external script to be multithreaded by providing a single
            argument to that script and opening it in a separate thread.  It works as a filter in the
            pipeline, or as a standalone script.  It will read the argument either from the pipeline
            or from a filename provided.  It will send the results of the child script down the pipeline,
            so it is best to use a script that returns some sort of object.

        .PARAMETER ScriptBlock
            This is where you provide the PowerShell ScriptBlock that you want to multithread.

        .PARAMETER ItemObj
            The ItemObj represents the arguments that are provided to the child script.  This is an open ended
            argument and can take a single object from the pipeline, an array, a collection, or a file name.  The
            multithreading script does its best to find out which you have provided and handle it as such.
            If you would like to provide a file, then the file is read with one object on each line and will
            be provided as is to the script you are running as a string.  If this is not desired, then use an array.

        .PARAMETER InputParam
            This allows you to specify the parameter for which your input objects are to be evaluated.  As an example,
            if you were to provide a computer name to the Get-Process cmdlet as just an argument, it would attempt to
            find all processes where the name was the provided computername and fail.  You need to specify that the
            parameter that you are providing is the "ComputerName".

        .PARAMETER AddParam
            This allows you to specify additional parameters to the running command.  For instance, if you are trying
            to find the status of the "BITS" service on all servers in your list, you will need to specify the "Name"
            parameter.  This command takes a hash pair formatted as follows:

            @{"key" = "Value"}
            @{"key1" = "Value"; "key2" = 321; "key3" = 1..9}

        .PARAMETER AddSwitch
            This allows you to add additional switches to the command you are running.  For instance, you may want
            to include "RequiredServices" to the "Get-Service" cmdlet.  This parameter will take a single string, or
            an array of strings as follows:

            "RequiredServices"
            @("RequiredServices", "DependentServices")

        .PARAMETER MaxThreads
            This is the maximum number of threads to run at any given time.  If resources are too congested try lowering
            this number.  The default value is 20.

        .PARAMETER SleepTimer_ms
            This is the time between cycles of the child process detection cycle.  The default value is 100ms.  If CPU
            utilization is high then you can consider increasing this delay.  If the child script takes a long time to
            run, then you might increase this value to around 1000 (or 1 second in the detection cycle).

        .PARAMETER TimeOutGlobal
            This is the timeout in seconds for the whole run.  Once it elapses, all threads still running are stopped and only the results of the others are returned.

        .PARAMETER TimeOutThread
            This is the timeout in seconds for each thread.  A thread still running after this time is aborted.

        .PARAMETER PSModules
            List of PSModule names to import for use in the ScriptBlock

        .PARAMETER PSSapins
            List of PSSnapin names to import for use in the ScriptBlock

        .EXAMPLE
            1..20 | Run-Parallel -ScriptBlock {param($i) Start-Sleep $i; "> $i sec <"} -TimeOutGlobal 15 -TimeOutThread 5
        .EXAMPLE
            This will execute the ScriptBlock once for each of the server names in AllServers.txt
            while providing the results to GridView.  The results will be the output of the child script.

            gc AllServers.txt | Run-Parallel $ScriptBlock_GetTSUsers -MaxThreads $findOut_AD.ActiveDirectory.Servers.count -PSModules 'PSTerminalServices' | out-gridview
    #>
    Param(
        [Parameter(ValueFromPipeline=$true,ValueFromPipelineByPropertyName=$true)]
            $ItemObj,
        [ScriptBlock]$ScriptBlock = $null,
        $InputParam = $Null,
        [HashTable] $AddParam = @{},
        [Array] $AddSwitch = @(),
        $MaxThreads = 20,
        $SleepTimer_ms = 100,
        $TimeOutGlobal = 300,
        $TimeOutThread = 100,
        [string[]]$PSSapins = $null,
        [string[]]$PSModules = $null,
        $Modedebug = $true
    )
    Begin{
        $ISS = [system.management.automation.runspaces.initialsessionstate]::CreateDefault()
        ForEach ($Snapin in $PSSapins){
            [void]$ISS.ImportPSSnapIn($Snapin, [ref]$null)
        }
        ForEach ($Module in $PSModules){
            [void]$ISS.ImportPSModule($Module)
        }
        $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxThreads, $ISS, $Host)
        $RunspacePool.CleanupInterval=1000
        $RunspacePool.Open()

        $Jobs = @()
    }
    Process{
        #ForEach ($Object in $ItemObj){
            if ($ItemObj){
                Write-Host $ItemObj -ForegroundColor Yellow
                $PowershellThread = [powershell]::Create().AddScript($ScriptBlock)

                If ($InputParam -ne $Null){
                    $PowershellThread.AddParameter($InputParam, $ItemObj.ToString()) | out-null
                }Else{
                    $PowershellThread.AddArgument($ItemObj.ToString()) | out-null
                }
                ForEach($Key in $AddParam.Keys){
                    $PowershellThread.AddParameter($Key, $AddParam.$key) | out-null
                }
                ForEach($Switch in $AddSwitch){
                    $PowershellThread.AddParameter($Switch) | out-null
                }
                $PowershellThread.RunspacePool = $RunspacePool
                $Handle = $PowershellThread.BeginInvoke()
                $Job =  [pscustomobject][ordered]@{
                    Handle = $Handle
                    Thread = $PowershellThread
                    object = $ItemObj.ToString()
                    Started = Get-Date
                }
                $Jobs += $Job
            }
        #}
    }
    End{
        $GlobalStartTime = Get-Date
        $continue = $true
        While (@($Jobs | Where-Object {$_.Handle -ne $Null}).count -gt 0 -and $continue)  {
            ForEach ($Job in $($Jobs | Where-Object {$_.Handle.IsCompleted -eq $True})){
                $out = $Job.Thread.EndInvoke($Job.Handle)
                $out # emit the result to the standard output stream
                #Write-Host $out -ForegroundColor green
                $Job.Thread.Dispose() | Out-Null
                $Job.Thread = $Null
                $Job.Handle = $Null
            }
            foreach ($InProgress in $($Jobs | Where-Object {$_.Handle})) {
                if ($TimeOutGlobal -and (($(Get-Date) - $GlobalStartTime).totalseconds -gt $TimeOutGlobal)){
                    $Continue = $false
                    #Write-Host $InProgress -ForegroundColor magenta
                }
                if (!$Continue -or ($TimeOutThread -and (($(Get-Date) - $InProgress.Started).totalseconds -gt $TimeOutThread))) {
                    $InProgress.thread.Stop() | Out-Null
                    $InProgress.thread.Dispose() | Out-Null
                    $InProgress.Thread = $Null
                    $InProgress.Handle = $Null
                    #Write-Host $InProgress -ForegroundColor red
                }
            }
            Start-Sleep -Milliseconds $SleepTimer_ms
        }
        $RunspacePool.Close() | Out-Null
        $RunspacePool.Dispose() | Out-Null
    }
}
Alban
0

Old thread, but my contribution concerns the part where you count the running jobs. Some of the answers above do not work when 0 or 1 jobs are running. A little trick I use is to force the results into an array and then count it:

[array]$JobCount = Get-job -state Running

$JobCount.Count

Steve Dorr
0

This is the 2023 answer:

$list = invoke-sqlcmd 'exec getOneMillionRows' -Server...
$list | ForEach-Object -ThrottleLimit 8 -Parallel {
    GetData $_ > $_.txt
    ZipTheFile $_.txt $_.txt.zip
    ...
}

The ForEach-Object cmdlet gained the ability to run script blocks in parallel in PowerShell 7.0; each one runs in its own runspace (thread) rather than in a separate process. See https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.core/foreach-object?view=powershell-7.3
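One practical detail, shown in the small sketch below: because each parallel script block runs in its own runspace, variables from the calling scope have to be brought in with the `$using:` prefix. The `$suffix` variable and the generated names are made up for the example. Functions defined in the calling session (such as GetData and ZipTheFile in the question) are likewise not visible inside the block by default, so they would presumably need to be defined inside it or loaded from a module.

```powershell
# Requires PowerShell 7.0 or later.
$suffix = '.txt'   # hypothetical variable defined in the calling scope

$names = 1..5 | ForEach-Object -ThrottleLimit 2 -Parallel {
    # $using: copies the caller's variable into the parallel runspace.
    $s = $using:suffix
    "item$_$s"
}

# Output order is not guaranteed, so sort before displaying.
$names | Sort-Object
```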

N. I.