Questions tagged [snow]

DO NOT USE FOR SNOW ANIMATION. The R package snow (an acronym for Simple Network Of Workstations) provides a high-level interface for using a cluster of workstations for parallel computations. Use with the [r] tag.

The package snow (acronym for Simple Network Of Workstations) provides a high-level interface for using a cluster of workstations for parallel computations in R.

snow implements an interface to three different low-level mechanisms for creating a virtual connection between processes:

  • Socket
  • PVM (Parallel Virtual Machine)
  • MPI (Message Passing Interface)
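As a rough illustration of the socket mechanism, the sketch below starts two local workers with snow and runs a function on them. It assumes snow is installed and that worker processes can be launched on localhost; `scale_factor` and `scale_value` are hypothetical names used only for this example.

```r
# Sketch only: assumes the snow package is installed and that two local
# worker processes can be started over sockets.
library(snow)

# Start a socket ("SOCK") cluster with two workers on the local machine.
cl <- makeCluster(2, type = "SOCK")

# Hypothetical data and helper, defined on the master for illustration.
scale_factor <- 10
scale_value <- function(x) x * scale_factor

# Workers start with empty workspaces, so copy the objects the function
# depends on to every worker before calling it.
clusterExport(cl, "scale_factor")

# Apply the function to each element in parallel; the result is a list.
result <- parLapply(cl, 1:4, scale_value)

# Shut the workers down when finished.
stopCluster(cl)

print(unlist(result))
```

Leaving out the `clusterExport` call is a common cause of "object not found" errors reported by the workers, because variables in the master session are not visible to them automatically.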

The snowfall package is a more recent, easier-to-use wrapper built on top of snow. Its functions can run in either sequential or parallel mode, so the same code works with or without a cluster.
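A minimal sketch of the sequential/parallel switch in snowfall follows. It assumes snowfall (and snow) are installed; `run_parallel`, `offset`, and `add_offset` are hypothetical names chosen for the example.

```r
# Sketch only: assumes the snowfall package is installed.
library(snowfall)

# Hypothetical switch: set to FALSE to run the same code sequentially,
# which can make debugging easier.
run_parallel <- TRUE

# Initialise snowfall; with parallel = TRUE it starts two local workers.
sfInit(parallel = run_parallel, cpus = 2)

# Hypothetical data and helper used for illustration.
offset <- 100
add_offset <- function(x) x + offset

# Copy the variable the function depends on to the workers.
sfExport("offset")

# Parallel (or sequential) counterpart of lapply.
result <- sfLapply(1:4, add_offset)

# Stop the workers and release resources.
sfStop()

print(unlist(result))
```

Because only the `sfInit` call changes between modes, the rest of the script stays the same whether or not a cluster is available.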


127 questions
4 votes, 1 answer

R: making cluster in doParallel / snowfall hangs

I've got two servers on a LAN with fresh installs of CentOS 6.4 minimal and R 3.0.1. Both computers have the doParallel, snow, and snowfall packages installed. The servers can ssh to each other fine. When I attempt to make clusters in either direction,…
dlv

4 votes, 2 answers

Results of workers not returned properly - snow - debug

I'm using the snow package in R to execute a function on a SOCK cluster with multiple machines (3) running Linux. I tried to run the code with both parLapply and clusterApply. In case of any error at the worker level, the results of the worker…

4 votes, 1 answer

Is there a limit on the number of slaves that R snow can create?

I'm trying to build a snow cluster with around 120 processes on 3 different hosts. These are AMD servers with 48 cores each. After building approx the first 90 slaves I get this error: cl = makeSOCKcluster(c(rep("localhost", 44), rep("host2", 46),…
Robert Kubrick

4 votes, 1 answer

How does tm interface with snow?

The high-performance task view notes that tm can use snow for parallel text mining (High-Performance and Parallel Computing with R). However, I have not found any examples demonstrating how this can be done, although I have found some discussion of…
Timothy P. Jurka

3 votes, 1 answer

Parallel computation in R for saving data over loops

My attempt to parallelize the simple code below, which saves outputs with openxlsx over multiple loops, has failed. Can anyone please help me convert this code to run in parallel? This code, on real-size data (over 50 million observations, takes…
Sean

3 votes, 1 answer

Export different subsets of data.tables to each node in a cluster

I am writing a function that processes several very large data.tables and I want to parallelize this function on a Windows machine. I could do this with the snow package using clusterExport to create a copy of each of the data.tables for each node…
orizon

3 votes, 1 answer

Error in checkForRemoteErrors(val): 5 nodes produced an error: object not found

I'm trying to do a 10-fold cross-validation and estimate the performance of a joint model using parallel processing (parLapply). I'm trying to find out why I receive the error message: "Error in checkForRemoteErrors(val): five nodes produced…
Oesj

3 votes, 0 answers

R: set 'Checkpoint' on Worker of Cluster

I use the following code to ... 1. create a parallel cluster 2. source test.R 3. and do some parallel work with functions defined in 'test.R' library(parallel) cl <- makeCluster(4) clusterEvalQ(cl, source("test.R")) ## do some parallel…
Bernd

3 votes, 2 answers

How to increase R processing speed dealing with large raster stacks?

I'm dealing with large raster stacks and I need to resample and clip them. I read a list of TIFF files and create a stack: files <- list.files(path=".", pattern="tif", all.files=FALSE, full.names=TRUE) s <- stack(files) r <- raster("raster.tif") s_re…
Geo-sp

3 votes, 1 answer

How to set up an AWS cluster to work with openCPU?

I have two EC2 machines: master and slave. SSH keys are generated for user ubuntu and saved to ~/.ssh/authorized_keys on both machines. Thus I can use the cluster from master node as ubuntu user like this: library(doSNOW) cluster_options <-…
redmode

3 votes, 1 answer

When do I need to use sfExport (R Snowfall package)

I am using snowfall for parallel computing. I am always on only one machine with multiple CPUs (>20 cores). I am processing a large amount of data (>20 GB). sfExport() takes a very long time. When I run my test code on my laptop and check the CPU usage, it…
kn1g

3 votes, 1 answer

Is it necessary to remove exported variables after a snow computation ends?

Is it necessary to remove the exported variable after the parallel computation in snow ends? I found that the memory use of the 'rsession' process did not change much even after clusterEvalQ was called. I suspect there is a memory problem in my sample code…
YYY

2 votes, 1 answer

foreach/SNOW/doSNOW verbose output with RTerm, but not RGui

Something magical just happened. I used Rterm (launched with R.exe) instead of RGui or RStudio to run a parallel task using foreach/snow/doSNOW. In the command window, I can see the output of the child tasks. This never worked with RGui nor…
Suraj

2 votes, 0 answers

Parallel code terribly slow when inside function, working fine standalone

I am struggling with the parallel package. Part of the problem is that I am quite new to parallel computing and I lack a general understanding of what works and what doesn't (and why). So, apologies if what I am about to ask doesn't make sense from…
Allrounder

2 votes, 0 answers

Attaching the `snow` package in `R` creates a `.Random.seed` in the `.GlobalEnv`

I noticed when creating a PSOCK cluster via parallel that the child processes were by default populated with a .Random.seed. This confused me because there is nothing in the documentation to indicate that this should be the case. More specifically,…
Mihai