
Stackers,

I have a lot of messages in a RabbitMQ queue (running on localhost in my dev environment). Each message's payload is a JSON string that I want to load directly into Elasticsearch (also running on localhost for now). I wrote a quick Ruby script to pull the messages from the queue and load them into ES, as follows:

#! /usr/bin/ruby
require 'bunny'
require 'json'
require 'elasticsearch'

# Connect to RabbitMQ to collect data
mq_conn = Bunny.new
mq_conn.start
mq_ch = mq_conn.create_channel
mq_q  = mq_ch.queue("test.data")

# Connect to ElasticSearch to post the data
es = Elasticsearch::Client.new log: true

# Main loop - collect the message and stuff it into the db.
mq_q.subscribe do |delivery_info, metadata, payload|
  begin
    es.index index: "indexname",
             type:  "relationship",
             body:  payload
  rescue
    puts "Received #{payload} - #{delivery_info} - #{metadata}"
    puts "Exception raised"
    exit
  end
end
mq_conn.close

There are around 4,000,000 messages in the queue.

When I run the script, I see a bunch of messages, say 30, being loaded into Elasticsearch just fine. However, I see around 500 messages leaving the queue.

root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data    4333080
...done.
root@beep:~# rabbitmqctl list_queues
Listing queues ...
test.data    4332580
...done.

The script then exits silently without reporting an exception. The begin/rescue block never triggers, so I don't know why the script finishes early or loses so many messages. Any clues as to how I should debug this next?


andyd

2 Answers


I've added a simple, working example here:

https://github.com/elasticsearch/elasticsearch-ruby/blob/master/examples/rabbitmq/consumer-publisher.rb

It's hard to debug your example without a sample of the test data.

The Elasticsearch "river" feature is deprecated and will eventually be removed. You should definitely invest the time in writing your own custom feeder if RabbitMQ and Elasticsearch are a central part of your infrastructure.
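For what it's worth, the symptom in the question is consistent with Bunny's defaults: `subscribe` returns immediately (it is non-blocking unless you pass `block: true`), and messages are auto-acknowledged on delivery unless you pass `manual_ack: true` and ack after indexing. That would explain both the silent exit and the batch of messages leaving the queue unprocessed. Here is a stdlib-only sketch of that failure mode — no RabbitMQ needed; the queue is scaled down, and the prefetch and progress numbers (500 delivered, 30 indexed) are hypothetical values matching the question:

```ruby
require 'json'

# Queue scaled down from 4,000,000 to 4,000 messages.
broker = Array.new(4_000) { |i| %({"id": #{i}}) }

# With auto-ack, the broker deletes a message the moment it is delivered
# to the client, so a prefetched batch leaves the queue immediately.
delivered = broker.shift(500)

# The consumer callback only indexes ~30 of them before the script falls
# through the non-blocking subscribe and the process exits.
indexed = delivered.take(30).map { |p| JSON.parse(p) }

puts "left on queue:  #{broker.size}"                    # 3500
puts "indexed:        #{indexed.size}"                   # 30
puts "acked but lost: #{delivered.size - indexed.size}"  # 470
```

The acked-but-lost messages are gone for good; with `manual_ack: true` they would be redelivered after the consumer dies.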

karmi

Answering my own question: I have since learned that this is a crazy and stupid way to load a queue of index instructions into Elasticsearch. I created a river instead and can drain instructions much faster than I could with a ropey script. ;-)
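One detail worth noting with the river approach: the RabbitMQ river consumes messages formatted for Elasticsearch's bulk API (a newline-delimited action line followed by the document), so plain JSON payloads like the ones in the question need wrapping first. A sketch — `to_bulk` is a hypothetical helper, and the index/type names are taken from the question:

```ruby
require 'json'

# Wrap a raw JSON document in a bulk-API "index" instruction:
# one action line, then the document, each newline-terminated.
def to_bulk(payload, index: "indexname", type: "relationship")
  action = { index: { _index: index, _type: type } }.to_json
  "#{action}\n#{payload}\n"
end

msg = to_bulk(%({"from":"a","to":"b"}))
puts msg
# {"index":{"_index":"indexname","_type":"relationship"}}
# {"from":"a","to":"b"}
```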

andyd