
I am a novice in Apache Storm.

I am trying to build a real-time stream processing system using Apache Kafka, Storm, and the Esper CEP engine.

I have one KafkaSpout that emits tuples to bolts, which run my CEP queries to filter the stream.

I have already created a topology and am trying to run it on a local cluster.

The problem is that the CEP queries running in my bolts require batches of tuples to perform window operations on the stream. In my topology, the KafkaSpout sends only one tuple at a time to the bolts, so my CEP query does not work as expected.

I am using the default KafkaSpout in Storm. Is there any way to send multiple different tuples to the bolts at once? Can this be done by tuning the configuration, or do I need to write a custom KafkaSpout?

Please help!!

My topology:

TopologyBuilder builder = new TopologyBuilder();

builder.setSpout("KafkaSpout", new KafkaSpout<>(KafkaSpoutConfig.builder("localhost:" + 9092, "weatherdata").setProp(ConsumerConfig.GROUP_ID_CONFIG, "weather-consumer-group").build()),4);

builder.setBolt("A", new FeatureSelectionBolt(), 2).globalGrouping("KafkaSpout");

builder.setBolt("B", new TrendDetectionBolt(), 2).shuffleGrouping("A");

I am using two bolts and one spout.

The Esper query running in bolt A is:

select first(e), last(e) from weatherEvent.win:length(3) as e

Here I am trying to get the first and last events from a window of length three over the event stream. But I get the same event for both first and last, because the KafkaSpout sends only one tuple at a time.

tank

1 Answer

The spout can't do it, but you can either use Storm's windowing support (https://storm.apache.org/releases/2.0.0-SNAPSHOT/Windowing.html) or write an aggregation bolt and put it between the spout and the rest of the topology.

So your topology should be spout -> aggregator -> feature selection -> trend detection.

I'd recommend you try out the built-in windowing support, but if you would rather write your own aggregation, your bolt really just needs to receive some number of tuples (e.g. 3), and emit a new tuple containing all the values.

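The aggregator bolt should do something like this:

private final List<Tuple> buffered = new ArrayList<>();

@Override
public void execute(Tuple input) {
  // Hold tuples until the third one arrives.
  if (buffered.size() < 2) {
    buffered.add(input);
    return;
  }
  Tuple first = buffered.get(0);
  Tuple second = buffered.get(1);
  Values aggregate = new Values(first.getValues(), second.getValues(), input.getValues());
  // Anchor the aggregate to all three inputs so a downstream failure replays them.
  List<Tuple> anchors = Arrays.asList(first, second, input);
  collector.emit(anchors, aggregate);
  // ack() takes a single tuple, so ack each input separately.
  for (Tuple t : anchors) {
    collector.ack(t);
  }
  buffered.clear();
}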

This way you end up with one tuple containing the contents of the 3 input tuples.
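If you want to check the buffering logic on its own before wiring it into a bolt, here is a minimal sketch in plain Java. BatchBuffer is a hypothetical helper I made up for illustration, not part of Storm, and it only covers the "collect N items, then hand them over" part. Anchoring and acking would still have to happen in the bolt as shown above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper, not part of Storm: groups items into fixed-size batches.
class BatchBuffer<T> {
    private final int batchSize;
    private final List<T> buffered = new ArrayList<>();

    BatchBuffer(int batchSize) {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be at least 1");
        }
        this.batchSize = batchSize;
    }

    // Adds one item. Returns the full batch once batchSize items have arrived,
    // otherwise an empty Optional.
    Optional<List<T>> add(T item) {
        buffered.add(item);
        if (buffered.size() < batchSize) {
            return Optional.empty();
        }
        List<T> batch = new ArrayList<>(buffered);
        buffered.clear();
        return Optional.of(batch);
    }
}
```

Inside execute you would call add(input) and only emit when the returned Optional is present. Note that this produces non-overlapping batches of three, which is not the same as a sliding window that moves forward one event at a time.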

Stig Rohde Døssing
  • I am specifying the window length and the sliding interval like this: builder.setBolt("FeatureSelectionBolt", new FeatureSelectionBolt().withWindow(new BaseWindowedBolt.Count(3), new BaseWindowedBolt.Count(5)), 2) But I am still getting the exception java.lang.IllegalArgumentException: Window length is not specified org.apache.storm.topology.WindowedBoltExecutor.validate(WindowedBoltExecutor.java:126) ~[storm-core-1.2.1.jar:1.2.1] org.apache.storm.topology.WindowedBoltExecutor.initWindowManager(WindowedBoltExecutor.java:200) ~[storm-core-1.2.1.jar:1.2.1] What is the problem here? – tank Mar 05 '19 at 12:30
  • Not sure, it looks fine to me. I'm assuming FeatureSelectionBolt extends BaseWindowedBolt? And also that you don't extend BaseWindowedBolt with any bolts you don't call .withWindow on? – Stig Rohde Døssing Mar 05 '19 at 14:17
  • Yes, FeatureSelectionBolt extends BaseWindowedBolt. Also, I don't have any bolts that don't extend BaseWindowedBolt and still call .withWindow(). So I am not sure what is causing the problem. FYI, I have two more bolts (which extend BaseRichBolt, not BaseWindowedBolt) that take the tuples from FeatureSelectionBolt. Also, I am submitting my topology to a local cluster. I hope that is not causing the problem. – tank Mar 05 '19 at 16:45
  • I'm not sure what the issue is, but if you're running with a localcluster, you can debug it pretty easily. The exception is coming from https://github.com/apache/storm/blob/v1.2.1/storm-core/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java#L126 because windowLengthCount is null. This only happens if https://github.com/apache/storm/blob/v1.2.1/storm-core/src/jvm/org/apache/storm/topology/WindowedBoltExecutor.java#L154 is false. – Stig Rohde Døssing Mar 05 '19 at 17:10
  • The config is added to the stormConf map when you call .withWindow here https://github.com/apache/storm/blob/v1.2.1/storm-core/src/jvm/org/apache/storm/topology/base/BaseWindowedBolt.java#L144. Try debugging to see if you can spot why that configuration is not in stormConf. I'll see if I can run our example windowing topology in LocalCluster. – Stig Rohde Døssing Mar 05 '19 at 17:11
  • I am not seeing an issue running https://github.com/apache/storm/blob/v1.2.1/examples/storm-starter/src/jvm/org/apache/storm/starter/SlidingWindowTopology.java. There's probably an issue somewhere in your setup. Could you try pastebinning your topology wiring (the TopologyBuilder stuff)? – Stig Rohde Døssing Mar 05 '19 at 17:18
  • The issue got resolved by removing getComponentConfiguration() which was returning null. I am not sure how that resolved the problem. – tank Mar 16 '19 at 22:46
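A likely explanation for the last comment: in Storm 1.2.1, withWindow stores the window settings in a map on the bolt, and BaseWindowedBolt.getComponentConfiguration() returns that map so the settings reach the component config. An override that returns null would drop them, which matches the "Window length is not specified" error. The sketch below is a simplified plain-Java model of that mechanism, not Storm code. WindowedBoltModel, BrokenBolt, FixedBolt, and the config keys are all hypothetical names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for a windowed base bolt; hypothetical, not Storm code.
class WindowedBoltModel {
    protected final Map<String, Object> windowConfiguration = new HashMap<>();

    // Records the window settings, the way withWindow does in this model.
    WindowedBoltModel withWindow(int lengthCount, int slidingCount) {
        windowConfiguration.put("model.window.length", lengthCount);
        windowConfiguration.put("model.window.sliding", slidingCount);
        return this;
    }

    // The framework reads the window settings through this method.
    Map<String, Object> getComponentConfiguration() {
        return windowConfiguration;
    }
}

// Overriding and returning null hides everything withWindow stored.
class BrokenBolt extends WindowedBoltModel {
    @Override
    Map<String, Object> getComponentConfiguration() {
        return null;
    }
}

// If extra settings are needed, start from the parent's map instead.
class FixedBolt extends WindowedBoltModel {
    @Override
    Map<String, Object> getComponentConfiguration() {
        Map<String, Object> conf = new HashMap<>(super.getComponentConfiguration());
        conf.put("model.extra.setting", true);
        return conf;
    }
}
```

So in a real bolt, either leave getComponentConfiguration() alone or build on the value returned by super.getComponentConfiguration().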