Why, when I have a lot of messages on a queue (1200), are my messages being requeued even though my code processes them successfully and "acks" them?
AND
How can I fix this?
..
I have an application that uses rails amqp gem to make use of RabbitMQ. We put messages on a queue with information about emails that need to be sent, and the subscriber takes them off and sends them.
Sometimes hundreds of messages will be placed on the queue in quick succession.
We use acknowledgements to be sure that messages are not lost.
It was working very well until recently when I found that there were 1200 messages on the queue and they were not being consumed.
So why was my consumer not consuming them?
Looking at the logs I found that yes, it had consumed them and the emails were sent. I restarted the consumer and it reconsumed them, meaning we sent multiples of the same email to users. Yikes! But what I noticed by watching the RabbitMQ UI was that when I restarted the consumer, it took all 1200 messages off the queue at once. Then after a few minutes, these messages were requeued, even though my consumer was still going through them and sending the emails. In our code, the consumer does ack the message after each email is sent (message processed).
So my best guess at what is happening is that when there are lots of messages on the queue, the consumer takes them all off, but does not ack each one separately and instead waits until all the messages have been processed before doing a mass ack. As this takes a long time, 10 minutes, something is happening on the RabbitMQ side which says, hey this is taking too long, lets requeue all those messages, even while my consumer is still processing them successfully.
I have looked around a lot and found something called a heartbeat, but I cannot find any clear explanation of what this is and how to use it, if I need to use it at all. But it sounds like it could be related to communication between the queue and the consumer and could be the key to not having all those messages requeued while they are being processed.
Another thing I tried was using prefetch: 1. Described here . Though it does not seem appropriate because I only have one consumer. But it sounded hopeful because it looked as though it might force one by one acknowledgement of messages.
Should I consider multiple consumers given that we could get hundreds of messages placed on the queue in quick succession?
Here is my rake task to subscribe to the queue
task :subscribe_basic => :environment do |task_name|
begin # make sure any exception is logged
log = Rails.logger
routing_key = "send_letter"
tcp_connection_settings =
{:host=>"localhost",
:port=>5672,
:vhost=>"dev_vhost",
:user=>"dev_user",
:pass=>"abc123",
:timeout=>0.3,
:ssl=>false,
:on_tcp_connection_loss=>
handle_conn_loss,
:logging=>true}
begin
::AMQP.start(tcp_connection_settings) do |connection|
channel = ::AMQP::Channel.new(connection, :prefetch => 1)
binding.pry
channel.auto_recovery = true
cons = SendLetterConsumer.new channel, log
queue = channel.queue(routing_key, exclusive: false, durable: true)
consumer1 = AMQP::Consumer.new(channel, queue, nil, exclusive = false, no_ack = false)
consumer1.consume.on_delivery(&cons.method(:handle_message))
log.info "subscribed to queue #{routing_key}, config_key #{config_key} (#{Process.pid})"
Signal.trap 'INT' do # kill -s INT <pid> , kill -2 <pid>, Ctrl+C
log.info "#{task_name} stopping(#{Process.pid})..."
channel.close { EventMachine.stop } # otherwise segfault
end
end
rescue StandardError => ex
# 2015-03-20 02:52:49 UTC MQ raised EventMachine::ConnectionError: unable to resolve server address
log.error "MQ raised #{ex.class.name}: #{ex.message} Backtrace: #{ex.backtrace}"
end
rescue Exception => ex
log.error "#{ex.class.name}: #{ex.message} -- #{ex.backtrace.inspect}"
raise ex
end
end
Here is the consumer code we use to handle the message (called in above code: consumer1.consume.on_delivery(&cons.method(:handle_message))
) :
def handle_message(metadata, payload)
logger.info "*** SendLetterConsumer#handle_message start #{Time.now}"
logger.info payload
begin
# {course_app: aCourseApplication, errors:[]}
# {course_app: aFaultyCourseApplication, errors: ['error1', 'error2']}
msg = JSON.parse(payload)
ca = CourseApplication.find(msg['course_application_id'])
am = AutomatedMessage.find(msg['automated_message_id'])
user_name = msg['user_name']
if am.present?
raise "Cannot send a letter for Automated message with id #{am.id} because it does not have an associated message template" if am.message_template.nil?
logger.info "attempt to send letter for Automated Message with id #{am.id}"
result = LetterSender::send_letter a_course_application: ca, a_message_template: am.message_template, user_name: user_name
elsif msg.message_template_id
mt = MessageTemplate.find(msg.message_template_id)
result = LetterSender::send_letter a_course_application: ca, a_message_template: mt, user_name: user_name
end
if result
metadata.ack #'ack'-ing will remove the message from the queue - do this even if we created a faultyCourseApp
else
logger.error "Could not ack for #{msg}"
end
rescue StandardError => e
logger.error "#{e.message} #{e.backtrace}"
# do not 'ack' - must be a programming mistake so leave message on queue - keep connection open to cont processing other messages
# fix bug and restart the rake task to redeliver the unacknowledged messages
end
logger.info "*** SendLetterConsumer#handle_message end #{Time.now}"
end