
Overview

I am writing a program (in R) that makes API calls at certain designated times. The API calls take a while, but I need the timer (main loop) to continue counting while the API call is made. To do so, I need to "outsource" the API call to another CPU thread. I believe this is possible and have looked into the future and promises packages, but haven't found a solution yet.

Reproducible Example

Let's run a for loop that counts from 1 to 100. When the counter (i) gets to 50, it has to complete a resource-intensive process (calling the function sampler, which draws a random permutation of 1 million integers 10,000 times, purely to take up computation time). The goal is for the counter to keep counting while sampler() does its work on another thread.

#Something to take up computation space
sampler <- function(){
  for(s in 1:10000) sample(1000000)
}

#Get this counter to continue while sampler() runs on another thread
for(i in 1:100){
  message(i)
  if(i == 50){
    sampler()
  }
}

What I have tried (unsuccessfully)

library(future)

sampler <- function(){
  for(s in 1:10000) sample(1000000)
}

for(i in 1:100){
  message(i)
  if(i == 50){
    mySamples <- future({ sampler() }) %plan% multiprocess
  }
}
Brigadeiro
  • Perhaps RStudio's "jobs" function would help? https://blog.rstudio.com/2019/03/14/rstudio-1-2-jobs/ – Jon Spring Jun 29 '19 at 00:29
  • Looks interesting - thanks for the suggestion. Ideally I would like this to be independent of the IDE. – Brigadeiro Jun 29 '19 at 00:42
  • In my experience with `future`, splitting a task into separate processes (R doesn't do threads) is great, but you don't return immediately to your primary/main REPL while the other processes work. I've discussed this behavior with its author ([future#293](https://github.com/HenrikBengtsson/future/issues/293)), and it is not in the immediate plan to enable this functionality. (I don't know if/how `promises` can be brought to bear here.) – r2evans Jun 29 '19 at 05:12
  • What should happen with the result of the async process? Does it have to be included somehow in the loop? – SeGa Jul 01 '19 at 11:07
  • Regarding @r2evans's comment: the `future()` function is indeed _non-blocking_, that is, it returns immediately here _as long as there is another worker available_. What is discussed in https://github.com/HenrikBengtsson/future/issues/293 is a feature request to have futures also not block when all workers are already occupied. – HenrikB Jul 01 '19 at 14:18
  • Thanks for piping in, Henrik, I think I mis-stated (even mis-understood) the underlying mechanisms a little. – r2evans Jul 01 '19 at 15:02

1 Answer


It seems to me that your call only blocks while the workers are being created, not for the duration of the actual work. For example, if you call plan() first, the counter will not block:

library(future)

sampler <- function(){
  for(s in 1:10000) sample(1000000)
}

plan(multiprocess)

for(i in 1:100){
  message(i)
  if(i == 50){
    mySamples <- future({ sampler() })
  }
}

Also note that the runtime of sampler() is much longer than the duration of the blocking call in your code, and that after your code has finished, mySamples still reports resolved: FALSE and CPU usage is still high. Both indicate that the work is continuing in the background.
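To pick up the result without stalling the counter, one option is to poll the future with resolved() inside the loop and call value() only once it reports TRUE. The following is a minimal sketch, not a definitive implementation. It makes several assumptions that are not in the original code: plan(multisession) is used in place of multiprocess (which newer releases of future no longer support), sampler() is shortened and returns a placeholder string so the example finishes quickly, seed = TRUE is passed because the future draws random numbers, and the names job and result are purely illustrative.

```r
library(future)

# Create the background workers up front, so the loop does not pay
# the start-up cost at i == 50.
# Assumption: multisession stands in for the multiprocess plan.
plan(multisession, workers = 2)

# Shortened stand-in for sampler() (assumed sizes); returns a marker value.
sampler <- function() {
  for (s in 1:20) sample(1000000)
  "done"
}

job <- NULL     # will hold the future once it is launched
result <- NULL  # will hold the value once it has been collected

for (i in 1:100) {
  message(i)
  if (i == 50) {
    # Returns immediately as long as a worker is free.
    job <- future(sampler(), seed = TRUE)
  }
  # Non-blocking check; collect the value a single time, when it is ready.
  if (!is.null(job) && is.null(result) && resolved(job)) {
    result <- value(job)
    message("sampler() finished at i = ", i)
  }
  Sys.sleep(0.05)  # simulated timer tick
}

# If the loop finished first, value() blocks here until the worker is done.
if (is.null(result)) result <- value(job)

plan(sequential)  # shut the background workers down
```

Polling with resolved() keeps the main loop in control of when the result is consumed. Calling value() directly inside the loop would block until the worker finishes.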

AEF