
I am building a Java Spring application using Storm 1.1.2 and Kafka 0.11 to be launched in a Docker container.

Everything in my topology works as planned, but under high load from Kafka the Kafka lag keeps increasing over time.

My KafkaSpoutConfig:

KafkaSpoutConfig<String, String> spoutConf =
    KafkaSpoutConfig.builder("kafkaContainerName:9092", "myTopic")
        .setProp(ConsumerConfig.GROUP_ID_CONFIG, "myGroup")
        .setProp(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, MyObjectDeserializer.class)
        .build();

Then my topology is as follows:

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("stormKafkaSpout", new KafkaSpout<String,String>(spoutConf), 25);

builder.setBolt("routerBolt", new RouterBolt(),25).shuffleGrouping("stormKafkaSpout");

Config conf = new Config();
conf.setNumWorkers(10);
conf.put(Config.STORM_ZOOKEEPER_SERVERS, ImmutableList.of("zookeeper"));
conf.put(Config.STORM_ZOOKEEPER_PORT, 2181);

conf.put(Config.NIMBUS_SEEDS, ImmutableList.of("nimbus"));
conf.put(Config.NIMBUS_THRIFT_PORT, 6627);

System.setProperty("storm.jar", "/opt/storm.jar");

StormSubmitter.submitTopology("topologyId", conf, builder.createTopology());

The RouterBolt (which extends BaseRichBolt) does one very simple switch statement and then uses a local KafkaProducer object to send a new message to another topic. Like I said, everything compiles and the topology runs as expected, but under a high load (3,000 messages/s) the Kafka lag just piles up, which means low throughput for the topology.
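For reference, the bolt looks roughly like this; the field names, target topics, routing rule, and producer settings below are simplified placeholders for illustration, not the actual code:

import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;
import org.apache.storm.task.OutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.base.BaseRichBolt;
import org.apache.storm.tuple.Tuple;

public class RouterBolt extends BaseRichBolt {

    private transient KafkaProducer<String, String> producer;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        // One local producer per bolt instance.
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafkaContainerName:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class);
        producer = new KafkaProducer<>(props);
    }

    @Override
    public void execute(Tuple input) {
        // The KafkaSpout emits "topic", "partition", "offset", "key", "value" by default.
        String value = String.valueOf(input.getValueByField("value"));

        // Stand-in for the "very simple switch statement".
        switch (routeOf(value)) {
            case "A":
                producer.send(new ProducerRecord<>("outputTopicA", value));
                break;
            default:
                producer.send(new ProducerRecord<>("outputTopicB", value));
        }
        collector.ack(input);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // Nothing is emitted to downstream bolts; output goes straight to Kafka.
    }

    private String routeOf(String value) {
        // Placeholder routing rule.
        return value.startsWith("A") ? "A" : "B";
    }
}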

I've tried disabling acking with

conf.setNumAckers(0);

and

conf.put(Config.TOPOLOGY_ACKER_EXECUTORS, 0);

but I guess it's not an acking issue.
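As far as I can tell those two settings should be equivalent anyway, since setNumAckers just writes the same config key; a minimal sketch of what I mean:

Config conf = new Config();

// setNumAckers(0) is a convenience wrapper that puts 0 under
// Config.TOPOLOGY_ACKER_EXECUTORS, so both attempts disable the acker executors.
conf.setNumAckers(0);
conf.put(Config.TOPOLOGY_ACKER_EXECUTORS, 0);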

I've seen in the Storm UI that the RouterBolt has an execution latency of 1.2 ms and a process latency of 0.03 ms under the high load, which leads me to believe the spout is the bottleneck. Also, the parallelism hint is 25 because there are 25 partitions of "myTopic". Thanks!

NaptownCSC

1 Answer


You may be affected by https://issues.apache.org/jira/browse/STORM-3102, which causes the spout to do a pretty expensive call on every emit. Please try upgrading to one of the fixed versions.

Edit: The fix isn't actually released yet. You might still want to try it out by building the spout from source, e.g. from https://github.com/apache/storm/tree/1.1.x-branch, to build a 1.1.4 snapshot.

Stig Rohde Døssing
  • Thank you for your input! If I understand that issue correctly, it would be resolved by changing my processing guarantee to at-most-once processing. I just tried changing it on the spout config to see if it had any effect, but it doesn't seem to have made one. Would that not fix the blocking issue, or slow down the spout's consuming in some other way? If so, I'll try checking out the 1.1.4 snapshot – NaptownCSC Oct 09 '18 at 19:47
  • Although this may have some effect, I don't believe it's the issue I'm seeing. My Kafka spouts are only able to consume ~1,600 messages/s and the lag grows exponentially (curiously, it grows much faster on some partitions, while on others it stays low) – NaptownCSC Oct 09 '18 at 21:13
  • I think it affects everyone. The expensive call is `kafkaConsumer.committed(tp)` in the first line. – Stig Rohde Døssing Oct 10 '18 at 15:06
  • That was it! This was absolutely crippling the throughput on my topology. Thank you! – NaptownCSC Oct 11 '18 at 14:27
  • This feels like a lifetime ago, but IIRC moving to 1.1.4 fixed this issue. @Stig Rohde Døssing was right about `kafkaConsumer.committed(tp)` being the expensive call. – NaptownCSC Aug 12 '21 at 22:13