
We are working on a PowerShell script that, among other things, performs a job import of multiple computers via a REST API. The normal job import works flawlessly and receives an XML file with all necessary information as a parameter.

Now we want to parallelize this job import, so that several imports can run at the same time and reduce the total import time for a large number of computers.

For this purpose, we use a runspace pool and pass a worker script block - containing the code for the job import - together with all necessary parameters to each PowerShell instance. Unfortunately, this doesn't seem to work: when we measure the import time, we see no speedup from the parallelization. The measured time is always about the same as when we perform the job import sequentially, i.e. without parallelization.

Here is the relevant code snippet:

function changeApplicationSequenceFromComputer {
    param (
        [Parameter(Mandatory=$True)]
        [string]$tenant = $(throw "Parameter tenant is missing"),
        [Parameter(Mandatory=$True)]
        [string]$newSequenceName = $(throw "Parameter newSequenceName is missing")
    )

    # Other things before parallelization

    # Pass all local functions and imported modules into the runspace pool so the worker can call them
    $InitialSessionState = [initialsessionstate]::CreateDefault()
    Get-ChildItem function:/ | Where-Object Source -like "" | ForEach-Object {
        $functionDefinition = Get-Content "Function:\$($_.Name)"
        $sessionStateFunction = New-Object System.Management.Automation.Runspaces.SessionStateFunctionEntry -ArgumentList $_.Name, $functionDefinition
        $InitialSessionState.Commands.Add($sessionStateFunction)
    }
    # Using a synchronized Hashtable to pass necessary global variables for logging purpose
    $Configuration = [hashtable]::Synchronized(@{})
    $Configuration.ScriptPath = $global:ScriptPath
    $Configuration.LogPath = $global:LogPath
    $Configuration.LogFileName = $global:LogFileName
    
    $InitialSessionState.ImportPSModule(@("$global:ScriptPath\lib\MigrationFuncLib.psm1"))

    # Worker for parallelized job-import in for-each loop below
    $Worker = {
        param($currentComputerObjectTenant, $currentComputerObjectDisplayName, $newSequenceName, $Credentials, $Configuration)
        $global:ScriptPath = $Configuration.ScriptPath
        $global:LogPath = $Configuration.LogPath
        $global:LogFileName = $Configuration.LogFileName
        try { 
            # Function handleComputerSoftwareSequencesXml creates the xml that has to be uploaded for each computer
            # We already tried to create the xml outside of the worker and pass it as an argument, so that the worker just imports it. Same result.
            $importXml = handleComputerSoftwareSequencesXml -tenant $currentComputerObjectTenant -computerName $currentComputerObjectDisplayName -newSequence $newSequenceName -Credentials $Credentials
            $Result =  job-import $importXml -Server localhost -Credentials $Credentials 
            # sleep 1 just for testing purpose
            Log "Result from Worker: $Result"
        } catch {
            $Result = $_.Exception.Message
        }
    } 

    # Preparatory work for parallelization
    $cred = $Credentials
    # Cast needed: $env: values are strings, and string * number repeats the string instead of multiplying.
    # We tried the plain processor count as well as multiplied versions.
    $MaxRunspacesProcessors = [int]$env:NUMBER_OF_PROCESSORS * $multiplier
    
    Log "Number of Processors: $MaxRunspacesProcessors"

    $RunspacePool = [runspacefactory]::CreateRunspacePool(1, $MaxRunspacesProcessors, $InitialSessionState, $Host) 
    $RunspacePool.Open()
    
    $Jobs = New-Object System.Collections.ArrayList

    foreach ($computer in $computerWithOldApplicationSequence) {

        # Different things to do before parallelization, i.e. define some variables 

        # Parallelized job-import
        
        Log "Creating or reusing runspace for computer '$currentComputerObjectDisplayName'"
        $PowerShell = [powershell]::Create() 
        $PowerShell.RunspacePool = $RunspacePool
        Log "Before worker"
        $PowerShell.AddScript($Worker).AddArgument($currentComputerObjectTenant).AddArgument($currentComputerObjectDisplayName).AddArgument($newSequenceName).AddArgument($cred).AddArgument($Configuration) | Out-Null
        Log "After worker"

        $JobObj = New-Object -TypeName PSObject -Property @{
            Runspace = $PowerShell.BeginInvoke()
            PowerShell = $PowerShell  
        }

        $Jobs.Add($JobObj) | Out-Null

        # For logging in Worker 
        $JobIndex = $Jobs.IndexOf($JobObj)
        Log "$($Jobs[$JobIndex].PowerShell.EndInvoke($Jobs[$JobIndex].Runspace))"

    }

    <#
    while ($Jobs.Runspace.IsCompleted -contains $false) {
        Log "Still running..."
        Start-Sleep 1
    }
    #>

    # Closing/disposing the pool
} # End of the function

The rest of the script looks like this (simplified):

# Parameter passed when calling the script
param (
    [Parameter(Mandatory=$True)]
    [string]$newSequenceName = $(throw "Parameter newSequenceName is missing"),
    [Parameter(Mandatory=$True)]
    [float]$multiplier= $(throw "Parameter multiplier is missing")
)

# 'main' block

$timeToRun = (Measure-Command{
    
    changeApplicationSequenceFromComputer -tenant "testTenant" -newSequenceName $newSequenceName    

}).TotalSeconds


Log "Total time to run with multiplier $($multiplier) is $timeToRun"

Any ideas why the job import is apparently still executed sequentially, despite the runspace pool and the parallelization code?

  • "since even after measuring the import time, we couldn't see any speedup due to the parallelization of the job import." - what makes you think this is not just 50% overhead from all the runspace pool setup + 50% faster parallel execution? :-) – Mathias R. Jessen May 26 '21 at 09:37
  • 1/2: Hello and thank you for your help. We thought the same way. In the worker we included a sleep 1 to test exactly this case. For example, we took 100 computers and parallelized them on four cores without sleep 1, the result was about 60 seconds for the whole job import. – Salvatore May 27 '21 at 10:14
  • 2/2: Then we used sleep 1 and ran the same process again. The result was then about 160 seconds (60 seconds we had without sleep for job import + 100 seconds for sleep 1 on 100 computers), but we would have expected the time to be shorter here (about 85 seconds: 60 for job import + 100/4 due to parallelization on 100 computers with sleep 1 split across four cores). – Salvatore May 27 '21 at 11:46

1 Answer


We have found the error. The foreach contained the following code block:

        # For logging in Worker 
        $JobIndex = $Jobs.IndexOf($JobObj)
        Log "$($Jobs[$JobIndex].PowerShell.EndInvoke($Jobs[$JobIndex].Runspace))"

This block had to be moved outside the foreach, so that the code looks like this:

function changeApplicationSequenceFromComputer {
    param (
        [Parameter(Mandatory=$True)]
        [string]$tenant = $(throw "Parameter tenant is missing"),
        [Parameter(Mandatory=$True)]
        [string]$newSequenceName = $(throw "Parameter newSequenceName is missing")
    )

        # ... Everything as before

        $Jobs.Add($JobObj) | Out-Null

    } # End of foreach

    # Collect all results only after every job has been started
    $Results = @()
    foreach ($Job in $Jobs) {
        $Results += $Job.PowerShell.EndInvoke($Job.Runspace)
    }

So EndInvoke() has to be called outside the foreach that starts the jobs: EndInvoke blocks until the corresponding runspace has finished, so calling it right after BeginInvoke makes each loop iteration wait for its own job, which serializes the whole import.
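To make the difference concrete, here is a minimal, self-contained sketch of the corrected pattern (the pool size, job count, and the Start-Sleep stand-in for job-import are illustrative, not taken from the original script): all jobs are started with BeginInvoke first, and EndInvoke is only called afterwards.

```powershell
# Sketch: start all jobs first, collect results afterwards.
$iss  = [initialsessionstate]::CreateDefault()
$pool = [runspacefactory]::CreateRunspacePool(1, 4, $iss, $Host)
$pool.Open()

$Worker = {
    param($i)
    Start-Sleep -Seconds 1   # stand-in for the real job-import call
    "job $i done"
}

# Start phase: BeginInvoke only, no blocking calls in the loop
$Jobs = foreach ($i in 1..8) {
    $ps = [powershell]::Create()
    $ps.RunspacePool = $pool
    [void]$ps.AddScript($Worker).AddArgument($i)
    [pscustomobject]@{ PowerShell = $ps; Handle = $ps.BeginInvoke() }
}

# Collect phase: only now block on EndInvoke.
# With 4 runspaces and 8 one-second jobs this takes roughly 2s, not 8s.
$start   = Get-Date
$Results = foreach ($j in $Jobs) { $j.PowerShell.EndInvoke($j.Handle) }
$seconds = ((Get-Date) - $start).TotalSeconds

$Jobs | ForEach-Object { $_.PowerShell.Dispose() }
$pool.Close(); $pool.Dispose()

"$($Results.Count) results in about $([math]::Round($seconds))s"
```

If the EndInvoke were moved into the start loop, each iteration would wait a full second before the next job is even created, and the same run would take about 8 seconds.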
