
I am trying to process a reasonably large data set and have been trying ForEach-Object -Parallel in PowerShell 7.x.

Every time I ran it, I found that I was running out of memory. It seems to use about 1 GB per 1000 objects, and I have now distilled the problem down to the following code. (It does nothing apart from creating and then destroying 4000 runs of an empty script block.)

$a = new-object object[] 4000
$a | ForEach-Object -ThrottleLimit 40 -Parallel {
  #Code to do something in here.
}

If you change the size of $a to 1000, about 1 GB of memory is consumed; set it to 2000 and about 2 GB is consumed, and so on. The throttle limit doesn't change the amount of memory consumed, only the number of threads running at once.

This causes issues when the number of times you want to run your script block gets large. In my case I need to process 20,000 computers with the code that would be in the script block. When running that code, the script was consuming 16+ GB and causing paging.

Can anyone please confirm whether they are seeing the same issue, or whether there is a known workaround?

Davey
  • Yes, this behavior is still present in 7.0.3. The issue seems to be with garbage collector scheduling; running `[gc]::Collect()` after the pipeline returns correctly frees all the underlying memory. As a workaround, you might want to chunk your input into batches of a size with an acceptable level of memory pressure, say 1000 computers at a time, and then manually invoke `[gc]::Collect()` before kicking off the next batch. – Mathias R. Jessen Aug 26 '20 at 11:54
  • Thanks, I could not remember the command to kick off the garbage collector. I guess it is time to post a GitHub issue for the PowerShell team. – Davey Aug 26 '20 at 11:57
  • Is there really a benefit to 40 threads on a 4 or 8 core computer? – js2010 Aug 26 '20 at 13:53
  • The process tends to be network limited and the CPU threads tend not to be doing much. It was just an example. If you run the above code with one thread you will get the same result. Changing the number of threads doesn't change the outcome: the garbage collector still does not keep up. – Davey Aug 27 '20 at 12:19

2 Answers

Thank you to Mathias R. Jessen, who confirmed that this is still an issue and suggested a workaround. I now break the work into batches so that only a limited number of machines is processed at a time, which limits memory usage.


$ObjectsToProcess = (1..4102)
$Batch = 0
$BatchSize = 100
do {
    # Create a list holding a limited number of objects, for memory management
    $ObjectsProcessing = [System.Collections.Generic.List[string]]::new()
    for ($i = $Batch; (($i -lt $Batch + $BatchSize) -and ($i -lt $ObjectsToProcess.count)); $i++) {
        $ObjectsProcessing.add($ObjectsToProcess[$i])
    }

    $ObjectsProcessing | ForEach-Object -ThrottleLimit 40 -Parallel {
        $_
        # Main script in this block now.
    }
    [gc]::Collect() # garbage collection to recover memory
    
    $Batch = $Batch + $BatchSize
} while ($Batch -lt $ObjectsToProcess.count)

This processes up to $BatchSize machines per batch, with the number of machines processed simultaneously controlled by -ThrottleLimit. Memory consumption can be controlled through $BatchSize.

Memory use while processing 4102 objects now never goes above 200 MB, versus 4 GB before.
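To check figures like these on your own machine, a rough sketch along the following lines may help. It reads the working set of the current PowerShell process before a parallel run, after it, and after a manual collection. The helper name Get-WorkingSetMB and the object count of 200 are made up for illustration, and the numbers will vary by machine and PowerShell version; the working set may not shrink immediately after a collection.

```powershell
# Hypothetical helper (not part of the original answer): working set of
# the current PowerShell process, in megabytes.
function Get-WorkingSetMB {
    $proc = [System.Diagnostics.Process]::GetCurrentProcess()
    $proc.Refresh()
    [math]::Round($proc.WorkingSet64 / 1MB, 1)
}

$before = Get-WorkingSetMB

# Same shape as the repro in the question: a script block that does nothing.
1..200 | ForEach-Object -ThrottleLimit 4 -Parallel { $null = $_ }

$afterRun = Get-WorkingSetMB

# Force a collection, as suggested in the comments, then measure again.
[gc]::Collect()
$afterCollect = Get-WorkingSetMB

"Before: $before MB, after run: $afterRun MB, after collect: $afterCollect MB"
```

Raising the object count or wrapping the run in the batch loop above should show how the measurements change with batch size.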

Davey

I had the same issue. Running garbage collection between the ForEach-Object -Parallel call and the end of the do loop helped, but memory use still grew with each iteration of the loop. My solution was to move all of the code I had inside the ForEach-Object -Parallel script block into a function defined above it, and then call that function from inside the script block.

Put all of the ForEach-Object -Parallel code in a function defined above the ForEach-Object -Parallel section.

Functions defined in the calling session are not visible inside the parallel script block, so capture the function definition as a string. Place this line between the function and the ForEach-Object -Parallel code.

$YourFunctionNameStg = $function:YourFunctionName.ToString()

Place this line just inside the ForEach-Object -Parallel script block to recreate the function there.

$function:YourFunctionNameInside = $using:YourFunctionNameStg

Then call your function inside the ForEach-Object -Parallel script block.

YourFunctionNameInside
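Put together, the steps above might look like the following minimal sketch. The function name Test-Computer, its body, and the three computer names are placeholders for illustration, not the code I actually ran. Because the name contains a hyphen, the function: variable has to be written with braces.

```powershell
# Placeholder function standing in for the real per-computer work.
function Test-Computer {
    param([string]$Name)
    "Processed $Name"
}

# Capture the function body as a string so it can be passed with $using:.
$TestComputerStg = ${function:Test-Computer}.ToString()

$results = 'PC01', 'PC02', 'PC03' | ForEach-Object -ThrottleLimit 2 -Parallel {
    # Recreate the function inside this runspace from the captured string.
    ${function:Test-Computer} = $using:TestComputerStg

    # Call it like any other function.
    Test-Computer -Name $_
}

# Output order is not guaranteed when running in parallel, so sort for display.
$results | Sort-Object
```

This can be combined with the batching loop from the other answer by placing the ForEach-Object -Parallel call inside the do loop.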

Mike T