I am working with a large data set, which I use to make certain calculations. Since it is a huge data set, my machine, I am working on, is doing the job excessively long, for this reason I decided to use the future package in order to distribute the work between several machines and speed up the calculations. So, my problem is that through the future (using putty & ssh) I can connect to those machines (in parallel), but the work itself is doing the main one, without any distribution. Maybe you can advice some solution:
- How to make it work in all machines;
- As well, how to check if the process is working (I mean some function or anything that could help to verify the functionment functionality of those, ofc if it's existing).
My code:
library(future)
workers <- c("000.000.0.000", "111.111.1.111")
plan(remote, envir = parent.frame(), workers= workers, myip = "222.222.2.22")
start <- proc.time()
cl <- makeClusterPSOCK(
c("000.000.0.000", "111.111.1.111"), user = "...",
rshcmd = c("plink", "-ssh", "-pw", "..."),
rshopts = c("-i", "V:\\vbulavina\\privatekey.ppk"),
homogeneous = FALSE))
setwd("V:/vbulavina/r/inversion")
a <- source("fun.r")
f <- future({source("pasos.r")})
l <- future({source("pasos2.R")})
time_elapsed_parallel <- proc.time() - start
time_elapsed_parallel
f and l objects are supposed to be done in parallel, but the master machine is doing all the job, so I'm a bit confused if i can do something concerning it.
PS: I tried plan()
with remote, multiprocess, multisession, cluster
and nothing.
PS2: my local machine is Windows and try to connect to Kubuntu and Debian (firewall is off in all of those).
Thnx in advance.