How to reduce ingestion rate of Kafka Spout and enable Back pressure?

Question

I am using storm-kafka-client 1.1.1 and storm-core 1.1.0.

I have tuned the following parameters, but I am still unable to enable backpressure or reduce the ingestion rate of the Kafka spout.

The spout consumes 2000 messages per second.

The downstream bolt takes 50 ms to process a message, i.e. it processes 20 messages per second.

The lag between spout-emitted tuples and bolt-executed tuples keeps increasing over time.

**How can I make the spout read, say, 20 messages per second, so that its rate of consumption matches the bolt's rate of execution?**

   **Topology**

   topology.max.spout.pending= **5** , 
   topology.message.timeout.secs= **600** , 
   topology.executor.send.buffer.size=**64** , 
   topology.executor.receive.buffer.size=**64** , 
   topology.transfer.buffer.size=**64**

   **KafkaSpoutConfig**

   setPollTimeoutMs(**200**) , 
   setFirstPollOffsetStrategy(latest) , 
   setMaxUncommittedOffsets(**2_000_000**) , 
   setGroupId(groupName) , 
   setProp("fetch.max.wait.ms",**1000**) , 
   setProp("max.poll.records", **100**) , 
   setMaxPartitionFectchBytes(**512**)  , 
   setProp("send.buffer.bytes", **512**) , 
   setProp("receive.buffer.bytes", **512**) , 
   setPartitionRefreshPeriodMs(30_000).setProp("enable.auto.commit", "true") , 
   setProp("session.timeout.ms", "**60000**") , 

   KafkaSpoutRetryExponentialBackoff.TimeInterval.microSeconds(**50**) ,
   KafkaSpoutRetryExponentialBackoff.TimeInterval.milliSeconds(**5**) , 1 ,
   KafkaSpoutRetryExponentialBackoff.TimeInterval.seconds(**1**) ) ;

I am not sure what values should be set for TOPOLOGY_SPOUT_WAIT_STRATEGY and BACKPRESSURE_DISRUPTOR_HIGH_WATERMARK.

So what combination of the above parameters and values can help control the spout ingestion rate?

Any suggestions will be highly appreciated.

Thanks, Kaniska

score 3 · Answer 1 · answered Feb 23 '18 at 14:10

TOPOLOGY_SPOUT_WAIT_STRATEGY is only used when the spout is asked to emit a new tuple but doesn't emit anything (i.e. when there are no new messages). It shouldn't have any effect on backpressure.

I'm not too familiar with the current backpressure implementation, but I'm pretty sure you need to explicitly enable it with TOPOLOGY_BACKPRESSURE_ENABLE.

BACKPRESSURE_DISRUPTOR_HIGH_WATERMARK is a ratio, so if you set it to e.g. 0.9 it will throttle the spout when the bolt's input queue is 90% full. You can find the documentation for the settings in https://github.com/apache/storm/blob/1.1.x-branch/storm-core/src/jvm/org/apache/storm/Config.java, and the default values at https://github.com/apache/storm/blob/1.1.x-branch/conf/defaults.yaml.
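To make the backpressure settings concrete, here is a minimal sketch that builds the relevant entries as a plain map (the Storm `Config` class is itself a map, so the same keys could be put into it). The class and method names are made up for illustration. The low watermark key is my own addition and is not discussed in this thread, and `throttleThreshold` only approximates how the ratio relates to a queue size; I have not checked Storm's exact rounding.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of Storm: it only assembles config entries.
class BackpressureSettings {

    // Builds the backpressure entries. The keys are the string values behind
    // Config.TOPOLOGY_BACKPRESSURE_ENABLE, Config.BACKPRESSURE_DISRUPTOR_HIGH_WATERMARK
    // and Config.BACKPRESSURE_DISRUPTOR_LOW_WATERMARK in Storm 1.x.
    static Map<String, Object> backpressureConf(double highWatermark, double lowWatermark) {
        if (!(0.0 < lowWatermark && lowWatermark < highWatermark && highWatermark <= 1.0)) {
            throw new IllegalArgumentException("expected 0 < low < high <= 1");
        }
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.backpressure.enable", true);
        conf.put("backpressure.disruptor.high.watermark", highWatermark);
        conf.put("backpressure.disruptor.low.watermark", lowWatermark);
        return conf;
    }

    // Approximate number of queued entries at which throttling would begin
    // for a queue of the given size (illustrative arithmetic only).
    static int throttleThreshold(int queueSize, double highWatermark) {
        return (int) Math.floor(queueSize * highWatermark);
    }

    public static void main(String[] args) {
        Map<String, Object> conf = backpressureConf(0.9, 0.4);
        System.out.println(conf);
        // With the receive buffer size of 64 from the question:
        System.out.println(throttleThreshold(64, 0.9));
    }
}
```

With the buffer size of 64 used in the question and a ratio of 0.9, throttling would start at roughly 57 queued entries.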

In order to avoid too many emitted tuples at a time, I think you should just set topology.max.spout.pending to some reasonable number of tuples (maybe a few hundred?). Make sure acking is enabled in your topology (i.e. topology.acker.executors is not 0 and topology.enable.message.timeouts is true). Otherwise max spout pending has no effect.
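As a rough way to pick a max spout pending value, the sketch below uses the numbers from the question (50 ms per tuple, one bolt executor assumed) to estimate how long a full batch of pending tuples takes to drain, and compares that with the message timeout. The class and method names are hypothetical, the value 300 is just one reading of "a few hundred", and the `topology.acker.executors` entry is included because, as far as I know, setting it to 0 disables acking.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sizing helper, not part of Storm.
class SpoutPendingSizing {

    // Steady-state bolt throughput in tuples per second, assuming each
    // executor processes one tuple at a time.
    static double boltThroughputPerSec(double latencyMs, int executors) {
        return executors * 1000.0 / latencyMs;
    }

    // Seconds needed to work through `pending` in-flight tuples at the given rate.
    static double drainSeconds(int pending, double throughputPerSec) {
        return pending / throughputPerSec;
    }

    // Config entries as plain string keys; topology.max.spout.pending
    // applies per spout task.
    static Map<String, Object> pendingConf(int maxSpoutPending, int messageTimeoutSecs) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.max.spout.pending", maxSpoutPending);
        conf.put("topology.message.timeout.secs", messageTimeoutSecs);
        conf.put("topology.acker.executors", 1); // must not be 0 for acking
        return conf;
    }

    public static void main(String[] args) {
        double rate = boltThroughputPerSec(50.0, 1); // 20 tuples per second
        double drain = drainSeconds(300, rate);      // 15 seconds
        int timeoutSecs = 600;
        System.out.println("bolt rate/s: " + rate);
        System.out.println("drain time s: " + drain);
        System.out.println("fits in timeout: " + (drain < timeoutSecs));
        System.out.println(pendingConf(300, timeoutSecs));
    }
}
```

With these numbers, 300 pending tuples drain in about 15 seconds, far below the 600 second timeout, so tuples should not time out simply from waiting in the queue.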

Not sure why you're changing the executor buffer sizes.

You should also consider upgrading Storm and storm-kafka-client to at least 1.1.2. There have been a lot of fixes to storm-kafka-client recently, and you might have an easier time with it if you upgrade.

I'm not sure what the stars in your code mean?

Thanks a lot for your explanation. I will set the flags you mentioned. I changed the executor buffer sizes after reading this blog post: http://jobs.one2team.com/apache-storms/. It says: "The reason why the back pressure didn’t work out of the box with the default parameters is that we have a bolt that has a pretty long processing time, and usually takes 0.1 second to process a single message .. the spout was fast enough to fill the buffer of these slow bolts... *The main parameter we had to tune was the buffer size*". - kaniska Mandal, Feb 23 '18 at 18:31
