As stated in my comment, my approach would be to use Poolboy to manage the workers. However, you cannot simply check out N workers at once (N being the number of requested URLs), because that quickly exceeds the pool's worker limit and causes the checkout requests to time out. Instead, you need a loop that checks whether a worker is available and, if so, crawls the URL asynchronously. If no workers are free, it should sleep for a while and then retry.
For this purpose, Poolboy provides the :poolboy.checkout/2 function, whose second parameter specifies whether the call should block. If no workers are available, it returns :full; otherwise you get back a worker pid.
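To make the difference concrete, here is a minimal sketch contrasting the blocking and non-blocking checkout modes. It assumes a pool registered under the name Crawler is already running; the pool name is illustrative, and 5000 ms is Poolboy's default checkout timeout.

# Blocking checkout (the default): if all workers are busy, this call waits
# and exits with a timeout error once the default 5000 ms have elapsed.
worker = :poolboy.checkout(Crawler)
:poolboy.checkin(Crawler, worker)

# Non-blocking checkout: returns :full immediately instead of waiting.
case :poolboy.checkout(Crawler, false) do
  :full -> :no_worker_available
  worker_pid -> :poolboy.checkin(Crawler, worker_pid)
end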
Example:
def crawl_parallel(urls) do
  urls
  |> Enum.map(&crawl_task/1)
  |> Enum.map(&Task.await/1)
end

defp crawl_task(url) do
  case :poolboy.checkout(Crawler, false) do
    :full ->
      # No free workers, wait a bit and retry
      :timer.sleep(100)
      crawl_task(url)

    worker_pid ->
      # We have a worker, asynchronously crawl the url
      Task.async(fn ->
        result = Crawler.Worker.crawl(worker_pid, url)
        # Return the worker to the pool, then hand back the crawl result
        # so Task.await receives it rather than the checkin's return value
        :poolboy.checkin(Crawler, worker_pid)
        result
      end)
  end
end
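The example assumes a pool named Crawler has already been started. As a rough sketch, such a pool could be wired into an application's supervision tree like this; the application module name, the pool size, and the assumption that Crawler.Worker is a GenServer exposing crawl/2 are all illustrative:

defmodule Crawler.Application do
  use Application

  @impl true
  def start(_type, _args) do
    pool_args = [
      name: {:local, Crawler},        # the pool name used in the example above
      worker_module: Crawler.Worker,  # assumed GenServer implementing crawl/2
      size: 20,                       # workers kept in the pool (illustrative)
      max_overflow: 0                 # no temporary extra workers
    ]

    children = [
      :poolboy.child_spec(Crawler, pool_args, [])
    ]

    Supervisor.start_link(children, strategy: :one_for_one, name: Crawler.Supervisor)
  end
end

Also note that Task.await/1 uses a default timeout of 5000 ms, so for slow pages you may want to pass a longer timeout, e.g. Enum.map(&Task.await(&1, 30_000)).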