I'm fitting a linear regression to COVID-19 data for each of the 3,000+ US counties, and the code runs pretty slowly. Are there options to parallelize this?
I've tried `furrr::future_map()`, but it doesn't speed things up much. CPU usage stays around 26% with and without `furrr::future_map()`, and only one process is running.
Example code:
```r
library(dplyr)
library(tidyr)   # for nest()
library(furrr)   # also attaches future (plan(), multisession, multicore)

future::plan(multisession, workers = 6)
# Runtimes observed with different plans:
#   multisession, workers = 6:      8.5 min
#   multicore,    workers = 6:      3.5 min
#   multicore,    default workers:  5.5 min

# The other regression functions (deathsmodel, absdeathsmodel,
# abscasesmodel) look very similar
casesmodel <- function(tbl) {
  lm(casesper100k ~ time, data = tbl)
}

uscases_twoweeks <-
  casesdeaths %>%
  filter(date >= twoweeksago) %>%
  filter(!is.na(population)) %>%
  filter(population > min_country_population) %>%
  mutate(countyid = paste(county, state, sep = ", ")) %>%
  arrange(countyid, date) %>%
  group_by(countyid) %>%
  nest() %>%
  mutate(deathmodel    = future_map(data, deathsmodel),
         casemodel     = future_map(data, casesmodel),
         absdeathmodel = future_map(data, absdeathsmodel),
         abscasemodel  = future_map(data, abscasesmodel))
```
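For anyone who wants to reproduce the timing without the real data, here is a minimal sketch of the same nest-then-`future_map()` pipeline on simulated data. The table `sim`, its Poisson-distributed case rates, and the choice of 2 workers are hypothetical stand-ins for illustration only; they are not the actual COVID-19 dataset or my real settings.

```r
# Hypothetical reproducible example: simulated data stands in for the
# real `casesdeaths` table, which is not included here.
library(dplyr)
library(tidyr)   # for nest()
library(furrr)   # attaches future, which provides plan() and multisession

set.seed(1)

# Hypothetical: 3,000 counties x 14 days of simulated case rates
n_counties <- 3000
n_days <- 14
sim <- tibble(
  countyid = rep(sprintf("County %04d, State", seq_len(n_counties)), each = n_days),
  time = rep(seq_len(n_days), times = n_counties),
  casesper100k = rpois(n_counties * n_days, lambda = 20)
)

casesmodel <- function(tbl) {
  lm(casesper100k ~ time, data = tbl)
}

# Hypothetical worker count; the real runs used 6
plan(multisession, workers = 2)

timing <- system.time({
  models <- sim %>%
    group_by(countyid) %>%
    nest() %>%
    ungroup() %>%
    mutate(casemodel = future_map(data, casesmodel))
})

plan(sequential)  # shut down the background R sessions
print(timing)
```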