I have been able to run 20 models simultaneously using an r6a.48xlarge Amazon Web Services instance (192 vCPUs, 1536.00 GiB memory) and this R code:
setwd('/home/ubuntu/')
library(doParallel)
detectCores()
my.AWS.n.cores <- detectCores()
my.AWS.n.cores <- my.AWS.n.cores - 92
my.AWS.n.cores
registerDoParallel(my.cluster <- makeCluster(my.AWS.n.cores))
folderName <- 'model000222'
files <- list.files(folderName, full.names=TRUE)
start.time <- Sys.time()
foreach(file = files, .errorhandling = "remove") %dopar% {
source(file)
}
stopCluster(my.cluster)
end.time <- Sys.time()
total.time.c <- end.time-start.time
total.time.c
However, the above R code did not run until I reduced the number of cores to 100 from 192 with this line:
my.AWS.n.cores <- my.AWS.n.cores - 92
If I tried running the code with all 192 vCPUs or 187 vCPUs, I got this error message:
> my.AWS.n.cores <- detectCores()
> my.AWS.n.cores <- my.AWS.n.cores - 5
> my.AWS.n.cores
[1] 187
>
> registerDoParallel(my.cluster <- makeCluster(my.AWS.n.cores))
Error in socketConnection("localhost", port = port, server = TRUE, blocking = TRUE, :
all connections are in use
Calls: registerDoParallel ... makePSOCKcluster -> newPSOCKnode -> socketConnection
I had never seen that error message and could not locate it with an internet search. Could someone explain this error message? I do not know why my solution worked or whether a better solution exists. Can I easily determine the maximum number of connections I can use without getting this error? I suppose I could rerun the code, incrementing the number of cores from 100 toward 187 until it fails.
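Rather than probing by trial and error, the worker count could be derived from the number of free connection slots. The following is a minimal sketch, not a definitive answer: it assumes a stock R build with 128 connection slots, three of which are taken by stdin, stdout and stderr, which would leave at most 125 for socket workers. `safe.worker.count`, `max.connections` and `headroom` are hypothetical names introduced for illustration.

```r
# Assumptions (not from the original post): a stock R build with 128
# connection slots, 3 of them taken by stdin, stdout and stderr.
safe.worker.count <- function(requested, max.connections = 128L, headroom = 2L) {
  # getAllConnections() lists every slot in use; ids 0, 1 and 2 are the
  # standard streams, so anything above 2 is a user connection.
  in.use <- sum(getAllConnections() > 2L)
  available <- max.connections - 3L - in.use - headroom
  # Never return fewer than one worker or more than requested.
  as.integer(max(1L, min(requested, available)))
}

safe.worker.count(192L)
```

The result could replace the manual subtraction, for example `my.AWS.n.cores <- safe.worker.count(detectCores())`. The small `headroom` leaves room for other connections, such as a file being read by `source()`. If R was started with a larger limit (R 4.4.0 and later accept a `--max-connections` startup option, as far as I know), that value can be passed as `max.connections`.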
I installed R on this instance with the lines below in PuTTY. R could not be located on the instance until I used the last line below: apt install r-base-core.
sudo su
echo "deb http://cran.rstudio.com/bin/linux/ubuntu trusty/"
sudo apt-get update
sudo apt-get install r-base
sudo apt install dos2unix
apt install r-base-core
I used this AMI:
Ubuntu Server 18.04 LTS (HVM), SSD Volume Type
EDIT
Apparently, R has a hardwired limit of 128 connections. Apparently, you can increase the number of PSOCK workers manually if you are willing to rebuild R from source, but I have not found an answer showing how to do that. Ideally, I would like an answer showing how to do that with Ubuntu and AWS. See also these previous related questions.
Errors in makeCluster(multicore): cannot open the connection
Is there a limit on the number of slaves that R snow can create?
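Short of rebuilding R, one option that may sidestep the limit is fork-based parallelism. The sketch below assumes a Unix-alike such as the Ubuntu instance above, and `run.model.files` is a hypothetical helper name. As far as I understand, `parallel::mclapply` talks to its forked children over low-level pipes rather than R connection objects, so the workers should not count against the connection limit, but this has not been verified on the instance described here.

```r
library(parallel)

# Hypothetical helper: source each model file in a forked child process.
run.model.files <- function(files, n.workers) {
  # No point in forking more children than there are files to run.
  n.workers <- max(1L, min(n.workers, length(files)))
  mclapply(files, function(f) {
    # Each file is evaluated in its own environment inside the child.
    source(f, local = new.env())
    f  # return the file name so finished runs can be identified
  }, mc.cores = n.workers)
}
```

A call might look like `run.model.files(list.files('model000222', full.names = TRUE), n.workers = 20)`. Objects created inside a child are not visible in the parent session, so each model file would need to write its own results to disk.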