
I have a requirement to check the availability of 1,000 different URLs, read from a given text file, on a single Windows Server 2016 virtual machine with PowerShell v5.1 installed.

The check is required to run at an interval of every 5 minutes.

My first assumption was to use the PowerShell cmdlets Get-Content and Invoke-WebRequest in a foreach loop:

$urlList = Get-Content -Path "c:\URLsList.txt"

foreach ($url in $urlList) {
    $result = Invoke-WebRequest $url
    $result.StatusCode
}

But given the number of URLs (1,000), I'm not sure whether Invoke-WebRequest is scalable enough for this task.

I didn't see any mention of best practices or limitations in the official documentation: https://learn.microsoft.com/en-us/powershell/module/microsoft.powershell.utility/invoke-webrequest?view=powershell-5.1

While searching, however, I learned about PowerShell jobs.

But since all 1,000 URLs have to be checked every 5 minutes, I'm not sure whether jobs are a suitable fit.

edwio
  • So... did you write any code to test whether it's feasible or not? – Mathias R. Jessen May 15 '23 at 10:56
  • @MathiasR.Jessen, I'm in the design phase of the algorithm which will be used in the end solution. – edwio May 15 '23 at 11:06
  • @MathiasR.Jessen, added a sample snippet for the general idea. But the question still remains: is PowerShell scalable for handling 1,000 web requests every 5 minutes, or is it a limitation of the virtual server running the actual script? – edwio May 15 '23 at 11:21
  • use ForEach -Parallel https://learn.microsoft.com/en-us/powershell/module/psworkflow/about/about_foreach-parallel?view=powershell-5.1 – rinat gadeev May 15 '23 at 11:26
  • @rinatgadeev, thanks, but I wrote that I'm using PowerShell v5.1. – edwio May 15 '23 at 11:31
  • @edwio Alright, good start! Does the code finish running in 5 minutes or not? – Mathias R. Jessen May 15 '23 at 11:43
  • @MathiasR.Jessen, not good at all. Using Measure-Command, the script's total run time is 6 minutes. I'm not sure if the root cause is related to the virtual server's current resources (8 CPU cores, 16 GB RAM), or maybe the fact that some URLs are not accessible and that Invoke-WebRequest has no timeout parameter that could be set for each HTTP GET request it performs. – edwio May 15 '23 at 11:52
  • `Invoke-WebRequest` in PowerShell 5.1 absolutely has a timeout parameter: `Invoke-WebRequest ... -TimeoutSec 2`. That being said, you'll probably want to batch the URLs 10 at a time (.NET FX has an internal threadpool for HTTP client requests, limiting you to 10 concurrent requests) as background jobs – Mathias R. Jessen May 15 '23 at 12:04
  • @MathiasR.Jessen, the timeout is at the level of the request? And regarding the batching, should PowerShell jobs be used? If so, how would it split the list of URLs so that, in total, all URLs run every 5 minutes? – edwio May 15 '23 at 12:11
  • You will get improvements if you run the task in parallel. I would limit the number of parallel tasks to the number of cores in your microprocessor. I would take the list of machines, split it into equal parts, and then use foreach with the parallel option to run the code: https://learn.microsoft.com/en-us/powershell/module/psworkflow/about/about_foreach-parallel?force_isolation=true&view=powershell-5.1 – jdweng May 15 '23 at 15:19
  • @jdweng, limiting the parallelism to the number of CPU cores for _network_-bound tasks is ill-advised. What you're linking to refers to the obsolescent Windows PowerShell _workflow_ technology, which is different from regular PowerShell code and nowadays best avoided; it is fundamentally no longer available in PowerShell (Core) 7+, which now offers a `-Parallel` parameter as part of the regular [`ForEach-Object`](https://learn.microsoft.com/en-us/powershell/module/Microsoft.PowerShell.Core/ForEach-Object) cmdlet. – mklement0 May 15 '23 at 17:11

1 Answer


An inherent limitation of Invoke-WebRequest (and Invoke-RestMethod) is being able to act on just one URL at a time.

Targeting multiple URLs in parallel requires command-external parallelism, such as via (slow and resource-intensive) PowerShell jobs, or via (lightweight and therefore preferable) thread jobs - available through the ThreadJob module, which can be installed on demand in Windows PowerShell and comes with PowerShell (Core) 7+ - or, most efficiently, via the equally thread-based ForEach-Object -Parallel feature in PowerShell 7+.
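Applied to your scenario, a thread-job-based check of all URLs from the text file could be sketched as follows (a sketch only: the file path, throttle limit, and timeout value are assumptions to adapt; in Windows PowerShell 5.1 the ThreadJob module must first be installed with `Install-Module ThreadJob`):

```powershell
# Read the URLs to check (path is an assumption from the question).
$urlList = Get-Content -Path 'C:\URLsList.txt'

# Launch one thread job per URL, capped at 50 concurrent threads.
$jobs = foreach ($url in $urlList) {
  Start-ThreadJob -ThrottleLimit 50 -ArgumentList $url {
    param($u)
    try {
      # -TimeoutSec keeps unreachable URLs from stalling the whole run.
      $response = Invoke-WebRequest -Uri $u -TimeoutSec 10 -UseBasicParsing
      [pscustomobject] @{ Url = $u; StatusCode = $response.StatusCode }
    }
    catch {
      # Invoke-WebRequest throws on errors (e.g. 404, DNS failure, timeout);
      # record the error instead of aborting.
      [pscustomobject] @{ Url = $u; StatusCode = $_.Exception.Message }
    }
  }
}

# Wait for all jobs and collect the results.
$jobs | Receive-Job -Wait -AutoRemoveJob
```

Whether this fits inside your 5-minute window depends on the timeout you choose and on how many of the URLs are slow or unreachable, so measure with your actual list.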

However, command-external parallelism, even in the thread-based form, invariably entails nontrivial overhead.


Therefore, consider using curl.exe, which ships with recent Windows versions and has built-in support for targeting multiple URLs as well as doing so in parallel.
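For instance, all URLs from the text file could be handed to a single curl.exe call in one go (a sketch only: the file path, per-transfer timeout, and parallel limit are assumptions; note that the `%{url}` write-out variable requires curl 7.75+, which recent Windows 10/Server builds ship):

```powershell
# Build the argument list: each URL paired with -o NUL to discard its body.
$urlArgs = foreach ($url in Get-Content -Path 'C:\URLsList.txt') {
  $url, '-o', 'NUL'
}

# -s: silent; --parallel: concurrent transfers; -m 10: 10-second timeout
# per transfer; -L: follow redirects; -w: print "URL = status" per transfer.
curl.exe -s --parallel --parallel-max 50 -m 10 -L -w '%{url} = %{http_code}\n' @urlArgs
```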


Below is a performance comparison based on sample code that performs GET requests against 10 URLs and reports the responses' HTTP status codes, using a variety of sequential and parallel approaches.

  • Absolute timings will vary, even between runs, but the ratio should provide a sense of what performs best.

  • The ranking may be different on Unix-like platforms. Curiously, on an M1 Mac I see the operations being slower overall, and curl even being slower than the comparable Invoke-WebRequest approaches.

  • The source code is below; you can easily tweak it to provide more URLs and experiment with the degree of parallelism.

Sample results from Windows PowerShell (values in seconds, fastest first):

Method                                       Duration
------                                       --------
curl, parallel                              0.2868489
Invoke-WebRequest, Start-ThreadJob          0.5779788
curl, sequential                            1.9407611
Invoke-WebRequest, sequential               2.3540807
Invoke-WebRequest, ForEach-Object -Parallel       N/A

Sample results from PowerShell (Core) 7.3.4 (on Windows):

Method                                      Duration
------                                      --------
curl, parallel                                  0.27
Invoke-WebRequest, ForEach-Object -Parallel     0.42
Invoke-WebRequest, Start-ThreadJob              0.52
curl, sequential                                1.89
Invoke-WebRequest, sequential                   2.05

Source code:

# Sample URLs
$urls = @(
  'http://www.example.org'
  'http://www.example.com'
  'https://en.wikipedia.org'
  'https://de.wikipedia.org'
  'https://fr.wikipedia.org'
  'https://it.wikipedia.org'
  'https://es.wikipedia.org'
  'https://ru.wikipedia.org'
  'https://ru.wikipedia.org'
  'https://als.wikipedia.org'
)

# Code that implements various approaches.
$scriptBlock = {
  param(
    [switch] $UseCurl,
    [switch] $Parallel,
    [switch] $UseThreadJobs
  )
    
  if ($useCurl) {
    # use curl.exe
    $curlExe = if ($IsCoreCLR) { 'curl' } else { 'curl.exe' }
    $urlArgs = foreach ($url in $urls) { $url, '-o', '/dev/null' }
    $parallelArgs = @()
    if ($Parallel) { $parallelArgs = '--parallel', '--parallel-max', $numParallelTransfers }
    & $curlExe -s -w '%{url} = %{http_code}\n' $parallelArgs -L $urlArgs
  }
  else {
    # use Invoke-WebRequest
    $ProgressPreference = 'SilentlyContinue'
    if ($Parallel) {
      if ($UseThreadJobs) {
        $urls | ForEach-Object {
          Start-ThreadJob -ThrottleLimit $numThreads { "$using:_ = " + (Invoke-WebRequest $using:_).StatusCode } 
        } | Receive-Job -Wait -AutoRemoveJob
      }
      else { # ForEach-Object -Parallel
        $urls | ForEach-Object -ThrottleLimit $numThreads -Parallel { "$_ = " + (Invoke-WebRequest $_).StatusCode }
      }
    }
    else { # sequential
      $urls | ForEach-Object {  "$_ = " + (Invoke-WebRequest $_).StatusCode }
    }
  }
} 

# Set the desired number of parallel threads / transfers:
$numThreads = 10 # for ForEach-Object -Parallel, whose default is 5
$numParallelTransfers = 50 # For curl.exe: 50 is the default, and lowering it hurts performance

# Run benchmarks
@(
  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, sequential'
    Duration = (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest sequential solution:'; & $scriptBlock | Out-Host }).TotalSeconds
  }

  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, ForEach-Object -Parallel'
    Duration =
      if ($PSVersionTable.PSVersion.Major -lt 7) { 'N/A' }
      else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with ForEach-Object -Parallel:'; & $scriptBlock -Parallel | Out-Host }).TotalSeconds }
  }

  [pscustomobject] @{
    Method   = 'Invoke-WebRequest, Start-ThreadJob'
    Duration =
      if (-not (Get-Command -ErrorAction Ignore Start-ThreadJob)) { 'N/A' }
      else { (Measure-Command { Write-Verbose -Verbose 'Invoke-WebRequest with Start-ThreadJob:'; & $scriptBlock -Parallel -UseThreadJobs | Out-Host }).TotalSeconds }
  }

  [pscustomobject] @{
    Method   = 'curl, sequential'
    Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe sequential solution:'; & $scriptBlock -UseCurl | Out-Host }).TotalSeconds
  }

  [pscustomobject] @{
    Method   = 'curl, parallel'
    Duration = (Measure-Command { Write-Verbose -Verbose 'curl.exe parallel solution:'; & $scriptBlock -UseCurl -Parallel | Out-Host }).TotalSeconds
  } 

) | 
  ForEach-Object -Begin {
    Write-Verbose -Verbose "Timing in seconds for $($urls.Count) URLs, based on $numThreads simultaneous threads running Invoke-WebRequest / up to $numParallelTransfers parallel curl.exe transfers:"
  } -Process {
    $_
  } |
  Sort-Object Duration
mklement0