
I have a job that has to read from a rabbitmq queue and write it to a data-store. Currently I am using AmqpItemReader for reading messages from the queue.

The data I read is in JSON format, and all my ItemProcessor does is deserialize the JSON into a Java object.

My single-threaded solution's throughput is very low: I am only able to consume about 12 messages per second. I have around 10 million records to process, which at that rate would take more than nine days. So I tried changing it to a multi-threaded step, but I still did not see a significant improvement in throughput (it was around 50 messages per second).

How can I speed up my job? I am starting to suspect that the route I'm taking is not the right one. Any light on this would be appreciated. Thanks in advance.

Edit: Included code/configurations for further clarity on what I'm trying to achieve.

Rabbit server configuration: 3-node cluster on AWS, each node with 0.5 GB of memory.

Message details: Each payload is around 1 KB of JSON.

I'm running the spring batch job on my development machine (Macintosh).

System configuration:

  Processor Name:   Intel Core i7
  Processor Speed:  2.5 GHz
  Number of Processors: 1
  Total Number of Cores:    4
  L2 Cache (per Core):  256 KB
  L3 Cache: 6 MB
  Memory:   16 GB

My ItemReader:

import java.io.IOException;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.amqp.core.Message;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.batch.item.amqp.AmqpItemReader;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Component;

@Component
public class RabbitMQItemReader extends AmqpItemReader<Message> {

  private final Logger logger = LoggerFactory.getLogger(RabbitMQItemReader.class);

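  // Injected through the constructor, so the field can stay final.
  private final RabbitTemplate template;

  @Autowired
  public RabbitMQItemReader(RabbitTemplate rabbitTemplate) throws IOException {
    super(rabbitTemplate);
    template = rabbitTemplate;
  }

  // Receives one raw message from the template's default receive queue.
  @Override
  public Message read() {
    return template.receive();
  }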
}

My step:

private Step step() throws Exception {
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(100);
    executor.setMaxPoolSize(100);
    executor.setThreadNamePrefix("SThread");
    executor.setWaitForTasksToCompleteOnShutdown(true);

    executor.initialize();
    return stepBuilderFactory.get("queueToCassandraStep")
        .<Message, Vendor>chunk(100)
        .reader(itemReader)
        .listener(new QueueReaderListener<>())
        .processor(asyncItemProcessor())
        .writer(asyncItemWriter())
        .taskExecutor(executor)
        .build();
}

Rabbit Config:

import lombok.Setter;
import org.springframework.amqp.rabbit.connection.CachingConnectionFactory;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.core.RabbitTemplate;
import org.springframework.amqp.support.converter.Jackson2JsonMessageConverter;
import org.springframework.amqp.support.converter.MessageConverter;
import org.springframework.boot.context.properties.ConfigurationProperties;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@ConfigurationProperties("art.com.service.product.config.rabbitmq")
public class RabbitConfig {

  @Setter
  private String host;
  @Setter
  private Integer port;
  @Setter
  private String username;
  @Setter
  private String password;
  @Setter
  private String exchangeName;
  @Setter
  private String queueName;

  @Bean
  ConnectionFactory rabbitConnectionFactory() {
    CachingConnectionFactory connectionFactory = new CachingConnectionFactory(host);
    connectionFactory.setPort(port);
    connectionFactory.setUsername(username);
    connectionFactory.setPassword(password);
    return connectionFactory;
  }

  @Bean
  public MessageConverter jsonMessageConverter() {
    return new Jackson2JsonMessageConverter();
  }

  @Bean
  RabbitTemplate rabbitTemplate(ConnectionFactory rabbitConnectionFactory,
      MessageConverter jsonMessageConverter) {

    RabbitTemplate rabbitTemplate = new RabbitTemplate(rabbitConnectionFactory);
    rabbitTemplate.setQueue(queueName);
    rabbitTemplate.setExchange(exchangeName);
    rabbitTemplate.setMessageConverter(jsonMessageConverter);

    return rabbitTemplate;
  }


}

Let me know if any other configuration or code would help. I'll be happy to share it as well.

Wizard
  • You've written up a nice complaint; however, in order for anyone to help you, it would be nice if you provided at least a few technical details. These might include the machine configuration, Rabbit config, message size, and any other pertinent details. – theMayer Nov 30 '17 at 03:06
  • @theMayer I added the configs and a little bit of code if that would help. – Wizard Nov 30 '17 at 19:30

1 Answer


Well, `amqpTemplate.receive()` is definitely very slow. It is based on `Channel.basicGet()`, which is not as performant as a long-lived consumer. I would suggest abandoning the `AmqpItemReader` in favor of a `MessageListenerContainer` from Spring AMQP with a good prefetch count.
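A minimal sketch of what such a container could look like, assuming Spring AMQP's `SimpleMessageListenerContainer`. The class name `ListenerContainerConfig`, the `messageBuffer` bean, the queue name `my-queue`, and the prefetch, concurrency, and capacity values are all placeholders chosen for illustration, not settings taken from the question.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

import org.springframework.amqp.core.Message;
import org.springframework.amqp.core.MessageListener;
import org.springframework.amqp.rabbit.connection.ConnectionFactory;
import org.springframework.amqp.rabbit.listener.SimpleMessageListenerContainer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class ListenerContainerConfig {

  // Hypothetical bounded hand-off buffer; the capacity is an arbitrary choice.
  @Bean
  BlockingQueue<Message> messageBuffer() {
    return new LinkedBlockingQueue<>(1000);
  }

  @Bean
  SimpleMessageListenerContainer listenerContainer(ConnectionFactory rabbitConnectionFactory) {
    BlockingQueue<Message> buffer = messageBuffer();

    SimpleMessageListenerContainer container =
        new SimpleMessageListenerContainer(rabbitConnectionFactory);
    container.setQueueNames("my-queue");   // placeholder queue name
    container.setPrefetchCount(250);       // assumed value; tune against the broker
    container.setConcurrentConsumers(4);   // assumed value; one consumer per core here

    // Long-lived consumers push each delivery into the buffer.
    // put() blocks when the buffer is full, which throttles the consumers.
    MessageListener listener = message -> {
      try {
        buffer.put(message);
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
        throw new IllegalStateException("Interrupted while buffering message", e);
      }
    };
    container.setMessageListener(listener);
    return container;
  }
}
```

One caveat with this design: under the container's default acknowledge mode, a message is acknowledged once the listener returns, that is, once it is buffered rather than once it is written to the data store, so messages still in the buffer would be lost if the process died.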

Artem Bilan
  • I need to clarify something here - a Basic.Get basically sends the same data as a Basic.Deliver at the protocol level - so this must be an artifact of the client implementation if it's slower. In practice, it is far wiser to *pull* messages than to have them pushed, unless you're just using RMQ as a fancy router. – theMayer Nov 30 '17 at 04:51
  • The `AmqpItemReader` is based on `RabbitTemplate.receive()`. That one opens a `Channel`, creates a `DefaultConsumer`, performs `basicConsume()`, waits for the `Delivery`, and closes everything in reverse order. So, yes, technically we do the same, but in the case of a `ListenerContainer` we do that constantly. With `RabbitTemplate.receive()` we open and close the resources for each call. That's where the performance bottleneck is. – Artem Bilan Nov 30 '17 at 16:28
  • OK, so this isn't `Basic.Get` at all, but it is a fine implementation if the connection stays open. A `channel` is just an integer added in to the protocol, so in this implementation there would be one extra packet sent over the wire vs. a `Basic.Get` with the trade-off being that polling the server is not needed. I don't see why this would be terribly expensive but I could see how throughput would potentially be limited. A consumer would probably be a better choice, but there are [very important caveats](https://stackoverflow.com/questions/45139668#45148326). – theMayer Nov 30 '17 at 20:58
  • @ArtemBilan Thanks for your response. Is there any example I can look at for heading down this route? I still want to use my existing ItemProcessor and ItemWriters. All the examples I come across have their own tasklet to perform the work. Is there any way I could use the chunk-oriented processing that Spring Batch provides along with the MessageListenerContainer? – Wizard Nov 30 '17 at 21:35
  • Right, that's an idea, but I don't know (yet) how to get to that solution. We may use something like `ItemStream` and perform `RabbitOperations.execute()` with the mentioned `Channel.basicGet()`. That way we will open a `Channel` in `ItemStream.open()` and won't close it until `ItemStream.close()`. But I need to figure out how the flow in the class should look... – Artem Bilan Nov 30 '17 at 21:53
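
One possible way to keep the existing ItemProcessor and ItemWriter in a chunk-oriented step is to have a long-lived consumer push messages into a bounded in-memory buffer, and have the reader poll that buffer. The following is a plain-Java sketch of only the hand-off, not a tested Spring Batch integration. The class name `BufferedMessageReader`, the capacity, and the idle timeout are hypothetical. In a real job the class would implement `ItemReader<Message>`, whose `read()` returns `null` to signal the end of input, and `onMessage` would be called from the container's listener.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical bridge between a push-style consumer and a pull-style reader. */
class BufferedMessageReader<T> {

  private final BlockingQueue<T> buffer;
  private final long idleTimeoutMillis;

  BufferedMessageReader(int capacity, long idleTimeoutMillis) {
    this.buffer = new LinkedBlockingQueue<>(capacity);
    this.idleTimeoutMillis = idleTimeoutMillis;
  }

  /** Called from the consumer thread; blocks while the buffer is full. */
  void onMessage(T message) {
    try {
      buffer.put(message);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException("Interrupted while buffering message", e);
    }
  }

  /**
   * Mirrors the ItemReader.read() contract: returns the next item, or null
   * if nothing arrives within the idle timeout (treated as end of input).
   */
  T read() {
    try {
      return buffer.poll(idleTimeoutMillis, TimeUnit.MILLISECONDS);
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      return null;
    }
  }
}
```

Because `BlockingQueue` is thread-safe, `read()` can be called concurrently from a multi-threaded step. The idle timeout is a design choice: too short and the step ends while the queue still has messages in flight, too long and the job lingers after the queue is drained.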