
I have a Spring Cloud Data Flow stream deployed in PCF using Rabbit as the binder. I have multiple processors in the pipeline. Occasionally I see issues wherein a partitioned consumer does not consume messages from Rabbit until the consumer is restarted. For instance, within my stream I have a processor, foo, that has partitioned input with 10 partitions. All partitions consume messages without issues 99% of the time. On rare occasions, one partition is not drained. When the instance listening to that partition is terminated and recreated, everything works well again. Is there a mechanism to detect these issues? Will listening to ListenerContainerConsumerFailedEvent help in detecting them? Is there a preferred way to recover from such issues?
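If that event is the right hook, this is roughly how I would register for it. This is only a minimal sketch; I am assuming the Rabbit binder's listener container publishes the event into the application context, and the class name is a placeholder:

import org.springframework.amqp.rabbit.listener.ListenerContainerConsumerFailedEvent;
import org.springframework.context.event.EventListener;
import org.springframework.stereotype.Component;

@Component
public class ConsumerFailureMonitor {

    // Fires when a listener container reports that one of its consumers has failed.
    // fatal == true means the container has given up trying to restart that consumer.
    @EventListener
    public void onConsumerFailed(ListenerContainerConsumerFailedEvent event) {
        System.err.println("Rabbit consumer failed (fatal=" + event.isFatal()
                + "), reason: " + event.getReason());
        if (event.getThrowable() != null) {
            event.getThrowable().printStackTrace();
        }
        // A real implementation could publish an alert or stop/start the binding here.
    }
}

I am not sure whether this event would even fire for the stuck-partition scenario above, which is part of the question.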

The sample stream definition is as follows:

Source | foo | bar | Sink

Deployment properties:

app.Source.spring.cloud.stream.bindings.output.producer.partition-key-expression=headers['PARTITION_KEY']
app.Source.spring.cloud.stream.bindings.output.producer.partition-count=10
app.foo.spring.cloud.stream.bindings.input.consumer.partitioned=true
app.foo.spring.cloud.stream.instanceCount=10
deployer.foo.count=10
  • Generally, problems like that are caused by the listener thread being "stuck" somewhere in user code. Next time it happens, take a thread dump to see what the listener container thread is doing. – Gary Russell Oct 26 '22 at 18:02
  • Thank you @GaryRussell for your suggestion. Is there a way to capture these issues and possibly restart the instance and/or retry connecting to the queue? – Aravind1986 Oct 27 '22 at 14:55
  • The first step is to find the root cause; the framework does not provide anything for this out of the box, except that you could enable Micrometer metrics and monitor the message processing count. – Gary Russell Oct 27 '22 at 15:14
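Update: following the Micrometer suggestion in the comments, this is the kind of per-instance counter I am considering adding to the foo processor. It is only a sketch; it assumes a function-based processor with Micrometer on the classpath (via the Boot actuator starter), and the metric and class names are placeholders:

import java.util.function.Function;

import io.micrometer.core.instrument.Counter;
import io.micrometer.core.instrument.MeterRegistry;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FooMetricsConfig {

    // Wraps the processing logic and counts every message, tagged with the
    // instance index (one instance per partition), so an external monitor can
    // alert when the count for a single partition stops increasing.
    @Bean
    public Function<String, String> foo(MeterRegistry registry,
            @Value("${spring.cloud.stream.instanceIndex:0}") int instanceIndex) {
        Counter processed = Counter.builder("foo.messages.processed")
                .tag("instanceIndex", String.valueOf(instanceIndex))
                .register(registry);
        return payload -> {
            processed.increment();
            return payload.toUpperCase(); // placeholder for the real processing
        };
    }
}

The idea is that an external monitor can alert, and possibly trigger a restart of that instance, when foo.messages.processed for one instanceIndex stops increasing while the others keep moving.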

0 Answers