I need to download a ton of images in one month.
I've written a script that downloads small JSON responses at about 200 requests/sec on my personal machine; eventually I will run it on a server. (I know image downloads will unfortunately be much slower.) The script, shown below, makes asynchronous calls in parallel, which is about three times as fast as making the same asynchronous calls serially.
require(crul)
require(tidyverse)
require(tictoc)
require(furrr)

asyncCalls <- function(i) {
  urls_to_call <- all_urls[i:min(i + 99, length(all_urls))]
  cc <- Async$new(urls = urls_to_call)      # ready the requests
  res <- cc$get()                           # make the requests
  lapply(res, function(z) z$parse("utf-8")) # parse the crul results
}

all_urls <- paste0("http://placehold.it/640x440&text=image", seq(1, 200))

plan(multisession) # use multiple cores ("multiprocess" is deprecated)
tic()
metadata <- unlist(future_map(seq(1, length(all_urls), by = 100), ~ asyncCalls(.x)))
toc()
As one would expect, running these image URLs through asyncCalls() returns NA for every element, since the binary image bodies can't be parsed as UTF-8 text.
How do I modify the script to quickly download the images from those URLs? I can't find a file download function in crul, and I'm not sure how to asynchronously use something like download.file(). Thanks!
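For concreteness, here's a sketch of the kind of thing I'm imagining: keep the Async batch request, but instead of z$parse("utf-8"), write each response's raw bytes to disk with writeBin(). I'm assuming crul response objects expose the raw body as $content; the downloadBatch() helper name, directory, and file naming are just placeholders:

```r
require(crul)

# Sketch only: fetch one batch of image URLs asynchronously and write
# the raw response bytes to disk instead of parsing them as text.
downloadBatch <- function(urls, dir = "imgs") {
  dir.create(dir, showWarnings = FALSE)
  cc <- Async$new(urls = urls)  # ready the requests
  res <- cc$get()               # make the requests
  for (j in seq_along(res)) {
    dest <- file.path(dir, paste0("image", j, ".png"))
    writeBin(res[[j]]$content, dest)  # raw bytes, not parsed text
  }
}
```

I don't know whether pulling $content like this is the idiomatic crul approach, or whether there's a built-in way to stream responses straight to disk, so corrections welcome.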