I have created a small unreliable topology consisting of one spout that reads lines from a file containing, among other things, lat/long coordinates. One of the downstream bolts calls an external reverse geocoding service to determine the country. Because this specific bolt processes tuples at a very slow rate, after a while the whole topology halts (it stops producing output).

(1) I would like to know what happens when a bolt cannot handle the incoming rate of tuples. As far as I understand, Storm is push-based, meaning that the spout emits tuples continuously in a loop and they get stored in the downstream send and receive buffers/queues of each worker/executor. What happens when these buffers/queues fill up entirely? Does the spout stop emitting new tuples? Has this behavior changed with the transition from the ZeroMQ (0mq) to the Netty transport layer?

(2) It has been mentioned that the only way to do flow control in an unreliable Storm topology is to use the acking system together with the max spout pending parameter: the spout emits tuples with an id but does nothing in its ack/fail methods. This is said to be because of some limitations in the 0mq transport layer. Now that newer Storm versions (> 0.9) use the Netty transport layer, is there any other way to do flow control in an unreliable topology?
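To make the workaround in (2) concrete, here is a minimal plain-Java sketch of the idea as I understand it. It is a model only, not Storm code: the class `PendingCapModel` and its methods are hypothetical names made up for illustration. The cap plays the role of max spout pending (set in Storm via `Config.setMaxSpoutPending`), and the ids stand in for the message ids passed to `SpoutOutputCollector.emit`. The ack only frees a slot; nothing is replayed on failure.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model (not the Storm API): a spout throttled by a cap on un-acked tuples.
class PendingCapModel {
    private final int maxPending;                              // stands in for max spout pending
    private final Set<Long> pending = new HashSet<>();         // ids emitted but not yet acked
    private final Queue<Long> downstream = new ArrayDeque<>(); // input queue of the slow bolt
    private long nextId = 0;

    PendingCapModel(int maxPending) {
        this.maxPending = maxPending;
    }

    // Emits one tuple unless the cap is reached; returns true if a tuple was emitted.
    boolean tryEmit() {
        if (pending.size() >= maxPending) {
            return false; // throttled: too many tuples are still in flight
        }
        long id = nextId++;
        pending.add(id);
        downstream.add(id);
        return true;
    }

    // The slow bolt finishes one tuple and acks it; returns false if its queue is empty.
    boolean processOne() {
        Long id = downstream.poll();
        if (id == null) {
            return false;
        }
        pending.remove(id); // the ack only frees a slot, there is no replay logic
        return true;
    }

    int pendingCount() {
        return pending.size();
    }

    int queued() {
        return downstream.size();
    }

    public static void main(String[] args) {
        PendingCapModel model = new PendingCapModel(3);
        int emitted = 0;
        for (int i = 0; i < 10; i++) {
            if (model.tryEmit()) {
                emitted++;
            }
        }
        System.out.println("emitted=" + emitted + " queued=" + model.queued());
        model.processOne(); // the bolt catches up by one tuple
        System.out.println("can emit again: " + model.tryEmit());
    }
}
```

In this model the bolt's queue can never hold more than the cap, however fast the spout loops. That bounded-queue property is what I am hoping to get in a real topology without going through the acker.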

Thank you in advance

Matthias J. Sax