2

By default, when a Storm spout or bolt encounters an exception, it restarts the spout or bolt and tries again. Is there any configuration option to make it stop the topology, perhaps after N repeated attempts? (For example, Hadoop tries 4 times before giving up.)

I had a Storm topology run for 77 days with one bolt raising an exception on every tuple. In situations like that, I'd rather it fail so that I notice that something's wrong.

Saurabh
Jim Pivarski
  • This sounds kind of strange to me because our topologies do stop if an exception occurs unless we wrap them in a FailedException (in which case it will retry). – bridiver Mar 28 '14 at 04:26
  • I'm not explicitly wrapping them in my bolt code, but it sounds like there's some system-wide setting that is different between our two Storm installations. My system administrator doesn't know about it, but I'm hoping someone on SO does. – Jim Pivarski Mar 28 '14 at 16:03
  • You don't want to stop a topology; a topology is meant to process events in real time. If a tuple yields a fatal error, it should be discarded and potentially logged somewhere, but the topology should not be blocked from processing the remaining tuples. – Svend Mar 30 '14 at 18:35

2 Answers

2

There is currently no option for halting the topology. And honestly, killing the whole topology just because of a single exception is brute force, IMHO.

In your scenario, those exceptions should be handled in the application layer.

Is there any configuration option to make it stop the topology, perhaps after N repeated attempts?

There is no ready-made solution for that, but you can implement it yourself by keeping track of retried tuples in the Spout. Once a tuple hits the retry threshold, log it or send it to a messaging queue.
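For illustration, here is a minimal sketch of that approach, assuming a pre-1.0 Storm release (backtype.storm packages); the names (RetryLimitedSpout, MAX_RETRIES, retryCounts) are invented for this example:

```java
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.util.HashMap;
import java.util.Map;

public class RetryLimitedSpout extends BaseRichSpout {
    private static final int MAX_RETRIES = 4;  // hypothetical threshold, like Hadoop's 4 attempts

    private SpoutOutputCollector collector;
    private Map<Object, Integer> retryCounts;  // message ID -> failures so far
    private Map<Object, Values> pending;       // message ID -> tuple, kept around for replay
    private long sequence = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
        this.retryCounts = new HashMap<Object, Integer>();
        this.pending = new HashMap<Object, Values>();
    }

    @Override
    public void nextTuple() {
        // Placeholder source; replace with reads from your real input.
        Object msgId = sequence++;
        Values tuple = new Values("record-" + msgId);
        pending.put(msgId, tuple);
        collector.emit(tuple, msgId);  // emitting WITH a message ID enables ack()/fail() callbacks
    }

    @Override
    public void ack(Object msgId) {
        retryCounts.remove(msgId);  // fully processed downstream; forget it
        pending.remove(msgId);
    }

    @Override
    public void fail(Object msgId) {
        Integer previous = retryCounts.get(msgId);
        int attempts = (previous == null) ? 1 : previous + 1;
        if (attempts >= MAX_RETRIES) {
            // Threshold met: log the poisonous tuple (or push it to a queue) and stop retrying.
            System.err.println("Giving up on " + msgId + " after " + attempts + " attempts");
            retryCounts.remove(msgId);
            pending.remove(msgId);
        } else {
            retryCounts.put(msgId, attempts);
            collector.emit(pending.get(msgId), msgId);  // replay the same tuple
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record"));
    }
}
```

Note that this keeps retry state in plain in-memory maps, so it is lost whenever the worker restarts; persist it externally if that matters for you.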

I had a Storm topology run for 77 days with one bolt raising an exception on every tuple.

Then maybe there is a bug in your bolt's code?

One strategy is to send failed tuples to a message queue or an event bus (such as HornetQ, Apache Kafka, or Redis) and attach a listener to it, so you are notified immediately about a poisonous tuple.
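A minimal sketch of that pattern (same pre-1.0 backtype.storm package assumptions; DeadLetterPublisher is a hypothetical stand-in for whatever HornetQ/Kafka/Redis client you actually use):

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// Hypothetical interface; back it with your actual queue or event-bus producer.
interface DeadLetterPublisher extends java.io.Serializable {
    void publish(String payload, String error);
}

public class DeadLetteringBolt extends BaseBasicBolt {
    private final DeadLetterPublisher deadLetters;

    public DeadLetteringBolt(DeadLetterPublisher deadLetters) {
        this.deadLetters = deadLetters;
    }

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String payload = input.getString(0);
        try {
            collector.emit(new Values(payload.toUpperCase()));  // stand-in for real work
        } catch (Exception e) {
            // Divert the poisonous tuple instead of blocking the stream;
            // a listener on the queue can then alert you immediately.
            deadLetters.publish(payload, e.toString());
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("result"));
    }
}
```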

Chiron
  • Yes, there's a bug in my bolt code; that's what I wanted to be informed about. Is it really necessary to set up a message queue for every little analytic you want to run? You're making it sound like a major production. Doesn't anyone do quick one-offs with Storm? In an ad hoc job, if it fails, it should die, noticeably. – Jim Pivarski Mar 28 '14 at 16:02
  • @JimPivarski Calling System.exit() could help in this case assuming you aren't running Storm under supervision (auto-restarting). – Chiron Mar 28 '14 at 16:07
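For what it's worth, a minimal sketch of the fail-fast idea from the last comment (the class name and threshold are invented; assumes backtype.storm packages and that nothing supervises and auto-restarts the worker):

```java
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class FailFastBolt extends BaseBasicBolt {
    private static final int MAX_CONSECUTIVE_FAILURES = 4;  // hypothetical threshold
    private int consecutiveFailures = 0;

    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        try {
            collector.emit(new Values(input.getString(0).trim()));  // stand-in for real work
            consecutiveFailures = 0;
        } catch (Exception e) {
            e.printStackTrace();
            if (++consecutiveFailures >= MAX_CONSECUTIVE_FAILURES) {
                // Die loudly so a broken ad hoc job gets noticed,
                // instead of silently failing on every tuple for 77 days.
                System.exit(1);
            }
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("cleaned"));
    }
}
```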
0

As far as I have seen, Storm won't by itself retry a tuple that caused an exception. By default, it just continues to process the next tuple. The same tuple won't be retried unless the Spout implements a fail method.
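To illustrate the default: a spout that emits without a message ID is fire-and-forget, so Storm never calls ack() or fail() for its tuples, and a downstream exception simply drops the tuple while processing moves on. A minimal sketch, again assuming pre-1.0 backtype.storm packages:

```java
import backtype.storm.spout.SpoutOutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichSpout;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Values;

import java.util.Map;

public class FireAndForgetSpout extends BaseRichSpout {
    private SpoutOutputCollector collector;
    private long sequence = 0;

    @Override
    public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void nextTuple() {
        // No message ID: Storm never tracks this tuple, so a failure
        // downstream just means it is dropped and processing continues.
        collector.emit(new Values("record-" + sequence++));
    }

    // ack()/fail() inherited from BaseRichSpout are empty no-ops; without
    // overriding fail() (and emitting with a message ID) there is no retry.

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("record"));
    }
}
```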

Binita Bharati