
How do I make storm-nimbus restart a worker on the same machine?

To test fault tolerance, I do a `kill -9` on a worker process, expecting the worker to be restarted on the same machine, but on one of the machines Nimbus launches the worker on another machine! The Nimbus log does not show retries, errors, or anything unusual.

Would appreciate any help, Thanks!

Behzad Pirvali

1 Answer


You shouldn't need to. Workers should be able to switch to an open slot on any supervisor. If you have a bolt that can't accommodate this because it reads data available only on a particular supervisor, that is a design problem.

Additionally, Storm's fault tolerance is intended to handle not only worker failures but also supervisor failures, in which case you won't be able to restart a worker on the same supervisor. You shouldn't need to worry about where a worker runs: that's a feature of Storm.

Gordon Seidoh Worley
  • Well, I have to disagree, as you should always be concerned about CPU, I/O, and memory utilization if you want to meaningfully utilize your underlying hardware. Yes, I agree that having to read a file would be a design problem, but that is not the case here. Also, from a performance perspective, an evenly utilized farm gives you the best performance. And I do see it as a design flaw in Storm if Storm does not care about utilizing a farm optimally. – Behzad Pirvali Nov 17 '13 at 18:07
  • Could you please share what you mean by `Storm does not care about utilizing a farm optimally.`? Just trying to understand. – user2720864 Nov 18 '13 at 07:53
  • When you create a topology and set the number of workers, in Storm terminology you provide Storm with an initial "parallelism hint". Instead of saving this hint to use in case of a worker restart, Storm seems to simply forget about it. Forgetting is only OK if Storm could do a hardware analysis of the farm to figure out the number of cores, the memory, the I/O throughput, and the network bandwidth of the farm. Otherwise, how can you be sure the farm is utilized evenly? This knowledge has to come from the programmer who creates the topology, in which case Storm should honor it as much as possible! – Behzad Pirvali Nov 18 '13 at 08:04
  • Or Storm should at least be aware of the number of CPU cores in the farm and try to make sure that the farm is utilized evenly! – Behzad Pirvali Nov 18 '13 at 08:09
  • Sorry, maybe I am missing something, but I am not able to relate this to your problem, as the `parallelism hint` in Storm specifies the initial number of executors in the topology. `Instead of saving this hint to use it in case of worker restart`... how would you expect Storm to use this in case of a worker failure? – user2720864 Nov 18 '13 at 10:25
  • For example, when you launch a topology on a 2-node cluster and set the number of workers to 4, Storm knows to launch 2 workers on each node. This is the information that should be persisted for future use. Then, if a worker dies, Storm knows that the machine with only one worker left is the machine on which to restart a new worker. Instead, Storm launches a worker randomly on any machine with an available slot. – Behzad Pirvali Nov 19 '13 at 01:57
  • `..on a 2-nodes cluster and you set the no of workers to 4, storm knows to launch 2 workers on each node`... this is not always the case. Storm tries to distribute the workers as evenly as possible. However, you can try Storm rebalancing (reconfigure workers and/or executors without restarting the cluster). – user2720864 Nov 19 '13 at 08:10
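
For reference, the worker count and parallelism hints discussed in the comments are set when the topology is built and submitted. A minimal sketch using the pre-1.0 `backtype.storm` API (current for 2013); `MySpout` and `MyBolt` are placeholder component classes, and `my-topology` is a placeholder name:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();
// Parallelism hints: the initial number of executors per component.
builder.setSpout("spout", new MySpout(), 2);
builder.setBolt("bolt", new MyBolt(), 4).shuffleGrouping("spout");

Config conf = new Config();
// Number of worker processes; Nimbus assigns these to free supervisor
// slots and does not pin them to particular machines.
conf.setNumWorkers(4);

StormSubmitter.submitTopology("my-topology", conf, builder.createTopology());
```

The rebalancing mentioned in the last comment is done from the CLI, e.g. `storm rebalance my-topology -n 4`, which changes the worker count of a running topology without redeploying it.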