Questions tagged [cluster-computing]


A computer cluster consists of a set of loosely connected computers that work together so that in many respects they can be viewed as a single system. Cluster management is centralized, as opposed to a grid's decentralized approach. (Wikipedia)
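To make the "single system, centrally managed" idea concrete, here is a toy Python sketch. `ToyCluster` and `submit` are hypothetical names, round-robin placement is an assumption chosen for brevity (real schedulers also weigh load, memory and other factors), and tasks run locally as a stand-in for remote execution.

```python
from itertools import cycle
from typing import Any, Callable, List, Tuple


class ToyCluster:
    """Hypothetical sketch: one central manager fronting several worker nodes.

    Callers see a single submit() method (the "single system" view); only
    the manager decides which node gets each task (centralized management).
    """

    def __init__(self, node_names: List[str]) -> None:
        if not node_names:
            raise ValueError("a cluster needs at least one node")
        self.nodes = list(node_names)
        # Assumption: simple round-robin placement, purely for illustration.
        self._next_node = cycle(self.nodes)
        # Record of (node, result) pairs so placement decisions are visible.
        self.placements: List[Tuple[str, Any]] = []

    def submit(self, task: Callable[..., Any], *args: Any) -> Any:
        node = next(self._next_node)  # chosen centrally, never by the caller
        # Stand-in for remote execution: the task simply runs locally here.
        result = task(*args)
        self.placements.append((node, result))
        return result


if __name__ == "__main__":
    cluster = ToyCluster(["node-1", "node-2"])
    squares = [cluster.submit(pow, i, 2) for i in range(4)]
    print(squares, [node for node, _ in cluster.placements])
```

The caller never names a node; swapping the placement policy would change only the manager, not any calling code.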

5527 questions
12 votes, 3 answers

Is Apache Spark good for lots of small, fast computations and a few big, non-interactive ones?

I'm evaluating Apache Spark to see if it's a good platform for the following requirements: cloud computing environment; commodity hardware; distributed DB (e.g. HBase) with possibly a few petabytes of data; lots of simultaneous small computations…
Jan Żankowski
12 votes, 3 answers

Remove markers from markerClusterer

I am trying to create an interactive map with clusters that need to be displayed when the user checks a box and removed when the box is unchecked again. So far everything is working well, the clusters work and everything, but I have noticed a strange…
Zoé de Moffarts
12 votes, 1 answer

How to list all nodes on SGE cluster?

I am trying to list all nodes on the cluster, but don't know the command. From searching, I found that qhost lists only some of the nodes. Any idea how to list all nodes?
truelies
12 votes, 3 answers

Create a cluster of co-workers' Windows 7 PCs for parallel processing in R?

I am running the termstrc yield curve analysis package in R across 10 years of daily bond price data for 5 different countries. This is highly compute-intensive: it takes 3200 seconds per country with a standard lapply, and if I use foreach and…
Thomas Browne
12 votes, 2 answers

Cloud virtual machines available for free for open source testing?

Does anyone know of places in the cloud where you can create (virtual) machines (like Amazon EC2) to use for your computing tasks, places that offer a number (at least 5-10) of free machines if they are used for testing open source projects? I'm not…
12 votes, 1 answer

Difference between pool and cluster

From a purist perspective, they kind of feel like identical concepts. Both manage sets of resources/nodes and control access to them from or by external components. With a pool, you borrow these resources/nodes from the pool and return them to it. With a…
IAmYourFaja
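The borrow/return behaviour described for a pool can be sketched as a small thread-safe Python class. `ResourcePool`, `borrow`, `give_back` and `lease` are hypothetical names, not taken from any particular library. The point is only that the caller explicitly checks a resource out and must hand it back; one common way to draw the contrast is that a cluster instead accepts work and decides placement itself.

```python
import threading
from contextlib import contextmanager
from typing import Any, Iterable, Iterator, Optional


class ResourcePool:
    """Hypothetical fixed-size pool: callers borrow a resource and must return it."""

    def __init__(self, resources: Iterable[Any]) -> None:
        self._free = list(resources)
        self._cond = threading.Condition()

    def borrow(self, timeout: Optional[float] = None) -> Any:
        with self._cond:
            # Block until some resource is free, or give up after `timeout` seconds.
            if not self._cond.wait_for(lambda: self._free, timeout=timeout):
                raise TimeoutError("no resource available")
            return self._free.pop()

    def give_back(self, resource: Any) -> None:
        with self._cond:
            self._free.append(resource)
            self._cond.notify()  # wake one waiter blocked in borrow()

    @contextmanager
    def lease(self, timeout: Optional[float] = None) -> Iterator[Any]:
        # Borrow for the duration of a `with` block; always return afterwards.
        resource = self.borrow(timeout)
        try:
            yield resource
        finally:
            self.give_back(resource)

    def available(self) -> int:
        with self._cond:
            return len(self._free)


if __name__ == "__main__":
    pool = ResourcePool(["conn-a", "conn-b"])
    with pool.lease() as conn:
        print("using", conn, "free:", pool.available())
    print("free after return:", pool.available())
```

Using `lease()` as a context manager is a design choice that makes forgetting to return a resource harder; `borrow`/`give_back` remain available for callers that need manual control.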
11 votes, 2 answers

How to remove dead node out of the Cassandra cluster?

I have a Cassandra cluster of 12 nodes on EC2. Because of some failure we lost one of the nodes completely; I mean that machine does not exist anymore. So I have created a new EC2 instance with a different IP and the same token as that of the dead node…
samarth
11 votes, 2 answers

Solving SLURM "sbatch: error: Batch job submission failed: Requested node configuration is not available" error

We have 4 GPU nodes with two 36-core CPUs and 200 GB of RAM available at our local cluster. When I try to submit a job with the following configuration: #SBATCH --nodes=1 #SBATCH --ntasks=40 #SBATCH --cpus-per-task=1 #SBATCH…
11 votes, 1 answer

Does Kubernetes support persistent volumes shared between multiple nodes in a cluster?

I need to build an application that has many bare-metal nodes joined in a Kubernetes cluster, and I need a shared persistent file system between those nodes. The nodes should be able to read and write to this file system simultaneously. Bonus: is there a…
Michael Pacheco
11 votes, 3 answers

What are the different approaches for Java EE session replication?

I am working on a project that requires very high availability, and my team is currently upgrading some infrastructure and software for a future release. One of the features we would like to enable is session replication across…
Pablo
11 votes, 1 answer

Training a Keras model using distributed TensorFlow

I have two GPUs installed on two different machines. I want to build a cluster that allows me to train a Keras model using the two GPUs together. The Keras blog shows two code snippets in its Distributed training section and links to the official TensorFlow…
Alessandro
11 votes, 3 answers

How to hold up a script until a SLURM job (started with srun) is completely finished?

I am running a job array with SLURM, using the following job array script (which I run with sbatch job_array_script.sh [args]): #!/bin/bash #SBATCH ... other options ... #SBATCH --array=0-1000%200 srun ./job_slurm_script.py $1 $2 $3 $4 echo 'open'…
Marses
11 votes, 3 answers

Elasticsearch 5.0.0 cluster node not joining

OK, this shouldn't be this hard. I'm trying to run 2 nodes in an Elasticsearch cluster and I get an exception when trying to start node-1 (node-2, which is the master, is already started). I'm using Elasticsearch v5.0.0 for both instances. Exception: failed…
Arslan Mehboob
11 votes, 1 answer

QSUB: Specify output and error files for each task in Job Array

Hopefully this is not a duplicate and also not just a problem with our cluster's configuration... I am submitting a job array to a cluster using qsub with the following command: qsub -q QUEUE -N JOBNAME -t 1:10 -e ${ERRFILE}_$SGE_TASK_ID…
niak
11 votes, 3 answers

Apache Spark: "failed to launch org.apache.spark.deploy.worker.Worker" or Master

I have created a Spark cluster on OpenStack running on Ubuntu 14.04 with 8 GB of RAM. I created two virtual machines with 3 GB each (keeping 2 GB for the parent OS). Further, I created a master and 2 workers from the first virtual machine and 3 workers from…
jsingh13