I have one machine on which to deploy Spark, Hadoop, and Tachyon. Will Spark operations reading from HDFS/Tachyon be faster on a single node that uses all the cores and RAM, or on several VM nodes that divide those resources evenly? Total RAM is under 200 GB.

"Performance and Scalability of Broadcast in Spark" is quite old, but it suggests that the increased network traffic could be a significant drawback of the multi-VM setup compared with a single node.

Asked by SpmP; edited by dtolnay.

1 Answer

It's probably better to run multiple worker instances. While there is an increase in network overhead, JVM performance with a really large heap isn't great, mainly because garbage-collection pauses tend to get longer as the heap grows.
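To make the heap-size trade-off concrete, here is a minimal sketch that computes how many equal-sized worker JVMs are needed so that no single heap exceeds a cap. The 31 GB default cap is an assumption based on the common rule of thumb of staying under roughly 32 GB so the HotSpot JVM can keep using compressed object pointers. The 176 GB figure is likewise a hypothetical example (192 GB minus 16 GB left for the OS and the HDFS/Tachyon daemons), and both function names are illustrative, not part of Spark.

```python
import math


def min_worker_instances(usable_mem_gb: float, max_heap_gb: float = 31.0) -> int:
    """Smallest number of equal-sized worker JVMs whose heaps each stay at or
    below max_heap_gb (hypothetical helper; the 31 GB cap is an assumption)."""
    if usable_mem_gb <= 0 or max_heap_gb <= 0:
        raise ValueError("memory values must be positive")
    return math.ceil(usable_mem_gb / max_heap_gb)


def heap_per_instance(usable_mem_gb: float, instances: int) -> float:
    """Memory each instance gets when usable memory is divided evenly."""
    if instances <= 0:
        raise ValueError("instances must be positive")
    return usable_mem_gb / instances


if __name__ == "__main__":
    # Assumed example: 192 GB host with 16 GB reserved for OS/HDFS/Tachyon.
    usable = 176.0
    n = min_worker_instances(usable)
    print(f"{n} instances, about {heap_per_instance(usable, n):.1f} GB heap each")
```

In Spark's standalone mode, the instance count and per-instance memory would map onto `SPARK_WORKER_INSTANCES` and `SPARK_WORKER_MEMORY` in `conf/spark-env.sh`; check the documentation for your Spark version, since these settings have changed over time.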

Answered by Holden
Comment: Thank you. Can you add anything quantitative to this? If I have a total of 64 cores and 192 GB of RAM, would 4, 3, or 2 nodes be best? – SpmP May 21 '15 at 19:43
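As a starting point for the 64-core / 192 GB question, the arithmetic for an even split into 2, 3, or 4 nodes can be sketched as follows. The reserve of 2 cores and 16 GB for the host OS, hypervisor, and HDFS/Tachyon daemons is a hypothetical figure, not a Spark default, and `even_split` / `NodeShare` are illustrative names. Floor division means any remainder is simply left unassigned.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeShare:
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int


def even_split(total_cores: int, total_mem_gb: int, nodes: int,
               reserved_cores: int = 2, reserved_mem_gb: int = 16) -> NodeShare:
    """Divide what is left after a (hypothetical) host reserve evenly across nodes."""
    usable_cores = total_cores - reserved_cores
    usable_mem_gb = total_mem_gb - reserved_mem_gb
    if nodes <= 0 or usable_cores < nodes or usable_mem_gb <= 0:
        raise ValueError("not enough resources for that many nodes")
    # Integer division: leftover cores/GB are not assigned to any node.
    return NodeShare(nodes, usable_cores // nodes, usable_mem_gb // nodes)


if __name__ == "__main__":
    for n in (2, 3, 4):
        share = even_split(64, 192, n)
        print(f"{n} nodes: {share.cores_per_node} cores, "
              f"{share.mem_gb_per_node} GB each")
```

With these assumptions, 2 nodes get 31 cores and 88 GB each, 3 nodes get 20 cores and 58 GB, and 4 nodes get 15 cores and 44 GB. If the nodes are VMs, each guest OS also needs its own slice, so the real per-node figures would be somewhat lower, and every option still leaves more than about 32 GB per node, so the large-heap concern would apply unless each node runs more than one executor.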