
There is a FastCGI example of a binary health check on the HAProxy blog. How would I construct a similar check for MongoDB, so that I get a more robust health check: one that verifies that the server is actually there and responding, rather than just checking that a port is open?

It would be useful if the health check was generic enough to work with the various MongoDB sharding components (config server, mongos, mongod).

Adam C

2 Answers


First off, note that you need HAProxy 1.5 or later to use the tcp-check feature (as of this writing, 1.5.3 is the current stable release). Unfortunately, Ubuntu 14.04 (for example) ships with version 1.4, so you will need to install from another source. Personally, I used the packages from here so that I could keep everything installed via apt.

The example listed on the blog is a good starting point. Using it as a template, all we need to do is pick an appropriate command to run, break that command down into hex, and construct the corresponding check for MongoDB. The MongoDB wire protocol is documented and published, so in theory you could build the message up from the spec, but there are easier ways to deconstruct such a command. Wireshark has built-in dissectors that let you inspect MongoDB traffic, and it provides a handy view of the hex with highlighting to aid us in our efforts here.

The command we will use here is the ping command. As you might expect, it is intended to be lightweight and to return even from a server under heavy load, which makes it well suited for a health check. If you wish to use a different command, the same methodology applies, but always be wary of any command that takes a lock of any sort or could add load to your database.
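To make "break it down into hex" concrete, here is a minimal Python sketch that builds the same legacy OP_QUERY message programmatically with the standard `struct` module, for a command document of the form `{<command>: 1.0}`. The helper names `bson_double_doc` and `encode_command_query` are hypothetical (not part of any MongoDB driver), and the sketch assumes the legacy OP_QUERY opcode (2004) and the `test.$cmd` namespace used in the check below. For `ping` with request ID `0xEEEEEEEE` it should reproduce the 57 bytes that the check sends.

```python
import struct

OP_QUERY = 2004  # legacy query opcode (d4070000 little-endian)


def bson_double_doc(key: str, value: float) -> bytes:
    """Encode a one-field BSON document {key: value} with a double value."""
    # Element: type 0x01 (double), NUL-terminated key, 8-byte little-endian value.
    element = b"\x01" + key.encode("ascii") + b"\x00" + struct.pack("<d", value)
    body = element + b"\x00"  # trailing NUL ends the document
    # The int32 length prefix counts itself, the element and the trailing NUL.
    return struct.pack("<i", 4 + len(body)) + body


def encode_command_query(command: str, request_id: int, db: str = "test") -> bytes:
    """Build an OP_QUERY that runs {command: 1.0} against <db>.$cmd."""
    collection = (db + ".$cmd").encode("ascii") + b"\x00"  # cstring
    payload = (
        struct.pack("<i", 0)              # query flags (none)
        + collection                      # fullCollectionName
        + struct.pack("<i", 0)            # numberToSkip
        + struct.pack("<i", -1)           # numberToReturn (FFFFFFFF)
        + bson_double_doc(command, 1.0)   # the command document
    )
    # MsgHeader: messageLength, requestID, responseTo, opCode. Packed unsigned
    # so that any 32-bit request ID (such as 0xEEEEEEEE) fits.
    header = struct.pack("<IIII", 16 + len(payload), request_id, 0, OP_QUERY)
    return header + payload


if __name__ == "__main__":
    msg = encode_command_query("ping", 0xEEEEEEEE)
    print(len(msg), msg.hex())
```

The same function would produce the bytes for another lightweight command (for example `encode_command_query("isMaster", 1)`), which you could then split into `tcp-check send-binary` lines; the caution above about locks and load still applies.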

To illustrate how to get from the command you run to the hex, here is a screenshot of the command I constructed, decoded and highlighted in Wireshark:

[Screenshot: the ping command decoded in Wireshark]

Based on that information, let's create our TCP health check. I will comment on the various pieces to explain where they come from, and each should be easy enough to find in the screenshot above:

```
option tcp-check
 # MongoDB Wire Protocol: OP_QUERY for {ping: 1} against test.$cmd
 tcp-check send-binary 39000000 # Message Length (57)
 tcp-check send-binary EEEEEEEE # Request ID (arbitrary fixed value)
 tcp-check send-binary 00000000 # Response To (nothing)
 tcp-check send-binary d4070000 # OpCode (Query)
 tcp-check send-binary 00000000 # Query Flags
 tcp-check send-binary 746573742e # fullCollectionName (test.$cmd)
 tcp-check send-binary 24636d6400 # continued, including the NUL terminator
 tcp-check send-binary 00000000 # NumToSkip
 tcp-check send-binary FFFFFFFF # NumToReturn (-1)
 # Start of Document
 tcp-check send-binary 13000000 # Document Length (19)
 tcp-check send-binary 01 # Type (Double)
 tcp-check send-binary 70696e6700 # Key: "ping" (NUL-terminated)
 tcp-check send-binary 000000000000f03f # Value: 1.0
 tcp-check send-binary 00 # Document terminator

 tcp-check expect string ok
```
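Before wiring this into HAProxy, it can help to send exactly the same bytes by hand and see what comes back. The Python sketch below (the `mongo_ping_ok` and `_recv_exact` helpers and their parameters are hypothetical, not part of HAProxy or MongoDB) concatenates the hex from the `send-binary` lines, reads one reply using its leading length field, and mirrors `tcp-check expect string ok` with a plain substring test. Since ping is an ordinary command, the same probe should apply to mongod, mongos and config servers alike. Note that a failed command reply such as `{ok: 0, errmsg: ...}` would also contain the bytes "ok", which is one reason the stricter binary match discussed below is worth having.

```python
import socket

# The exact bytes from the tcp-check send-binary lines above, concatenated.
PING_QUERY = bytes.fromhex(
    "39000000" "EEEEEEEE" "00000000" "d4070000" "00000000"
    "746573742e" "24636d6400" "00000000" "FFFFFFFF"
    "13000000" "01" "70696e6700" "000000000000f03f" "00"
)


def _recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes, or raise if the peer closes the connection first."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the full reply arrived")
        buf += chunk
    return buf


def mongo_ping_ok(host: str, port: int, timeout: float = 2.0) -> bool:
    """Send the hand-built ping; return True if b"ok" appears in the reply."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(PING_QUERY)
            # Every reply starts with its total length as a little-endian int32.
            header = _recv_exact(sock, 4)
            length = int.from_bytes(header, "little")
            reply = header + _recv_exact(sock, max(length - 4, 0))
    except OSError:  # refused, timed out, reset or closed early
        return False
    # Same idea as `tcp-check expect string ok`: a plain substring search.
    return b"ok" in reply
```

For example, `mongo_ping_ok("127.0.0.1", 27017)` against a local mongod should return `True`, while a closed port returns `False`.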

It would be nice to match the entire response in binary as well, but there is no way to predict the request ID the server generates for each response, so such a full match will fail (tcp-check offers no way to selectively ignore pieces of a binary match).

EDIT (Sep 8th 2014): Thanks to comments on this Q&A from Baptiste and Felix, I went back and re-tested the partial binary match, which initially seemed to fail. It turns out I had simply transcribed the binary for the response incorrectly, so I have amended the answer to reflect that.

The "ok" string match is only a basic check: any response containing it shows that the server in question is still responding, but such a limited check is somewhat unsatisfying. While a full response match is not possible, everything after the request ID is predictable and can be matched.

Here is the working binary check for that part of the response, broken down field by field (again using Wireshark to tease out the pieces, as above):

```
# Check for response (starting after the server-generated request ID)
tcp-check expect binary EEEEEEEE # Response To (the request ID sent above)
tcp-check expect binary 01000000 # OpCode (Reply)
tcp-check expect binary 00000000 # Reply Flags (none)
tcp-check expect binary 0000000000000000 # Cursor ID (0)
tcp-check expect binary 00000000 # Starting From (0)
tcp-check expect binary 01000000 # Number Returned (1)
tcp-check expect binary 11000000 # Document Length (17)
tcp-check expect binary 01 # Type (Double)
tcp-check expect binary 6f6b00 # Key: "ok" (NUL-terminated)
tcp-check expect binary 000000000000f03f # Value: 1.0
tcp-check expect binary 00 # Document terminator
```
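To see which bytes those rules refer to, here is a minimal Python sketch that splits a legacy OP_REPLY into its header and reply fields with `struct`, plus a helper that emulates a list of `tcp-check expect binary` rules. `parse_op_reply` and `expect_binary_all` are hypothetical helper names. The emulation rests on an assumption you should verify against your HAProxy version: that each rule is an independent search for its hex string anywhere in the received buffer, which would mean the rules confirm each field's bytes are present but do not enforce their order.

```python
import struct
from typing import List

OP_REPLY = 1  # legacy reply opcode (01000000 little-endian)


def parse_op_reply(buf: bytes) -> dict:
    """Split a legacy OP_REPLY into the fields annotated in the check above."""
    # MsgHeader: messageLength, requestID, responseTo, opCode (16 bytes).
    length, request_id, response_to, opcode = struct.unpack_from("<iIIi", buf, 0)
    # Reply fields: responseFlags, cursorID (int64), startingFrom, numberReturned.
    flags, cursor_id, starting_from, number_returned = struct.unpack_from("<iqii", buf, 16)
    return {
        "length": length,
        "request_id": request_id,    # chosen by the server, so not predictable
        "response_to": response_to,  # echoes the request ID we sent
        "opcode": opcode,
        "flags": flags,
        "cursor_id": cursor_id,
        "starting_from": starting_from,
        "number_returned": number_returned,
        "documents": buf[36:length],  # BSON documents follow the 36-byte prefix
    }


def expect_binary_all(buf: bytes, hex_patterns: List[str]) -> bool:
    """Emulate several `tcp-check expect binary` rules.

    ASSUMPTION: each rule is checked independently as a substring search over
    the whole reply buffer; verify this against your HAProxy version.
    """
    return all(bytes.fromhex(p) in buf for p in hex_patterns)
```

Feeding it a synthetic reply whose Response To is `EEEEEEEE` should pass the patterns above, while the same reply with any other Response To fails the first rule, which is what makes that rule the useful part of the check.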

All of the above was tested successfully with MongoDB 2.6.4 and HAProxy 1.5.3.

Adam C

Adam's answer is the right one, and the procedure is the one I use as well. That said, I am not sure the response check is right, since the server is supposed to answer with binary strings as well. Adam, can you confirm it works with your example? Otherwise, matching 6f6b should do the trick too:

tcp-check expect binary 6f6b

Baptiste

  • Confirmed that a string check of "ok" does work with my example (checked with mongod and mongos, and made sure HAProxy marked them as UP). "ok" is found in the response as a string; 6f6b is just the hex/ASCII representation of ok in any case, so it's the same match. I thought that "ok" made more sense since it is listed as a string match, not binary (and a binary match fails because it tries to match the whole buffer, I believe) – Adam C Sep 02 '14 at 12:39
  • Retested with `tcp-check expect string 6f6b` and the health check is now failing, so "ok" looks like the correct way to go for the MongoDB case – Adam C Sep 02 '14 at 12:45
  • That makes sense. According to the [docs](http://cbonte.github.io/haproxy-dconv/configuration-1.5.html#tcp-check%20expect), you would need to use `tcp-check expect binary 6f6b`, which is neat if what you are matching is **not** text. – Felix Frank Sep 03 '14 at 08:43
  • My fault, I did a typo! As Felix stated, I meant `tcp-check expect binary 6f6b` – Baptiste Sep 03 '14 at 13:58
  • And I must have had a typo when I did my original partial binary check (which led me to believe it did not work). I have edited (and attributed) to reflect this, and added my successful binary check for good measure - thanks for the help. @Baptiste - the answer probably makes more sense as a comment, so I would suggest removing it - or you can edit to make it correct (s/string/binary/) to make it less confusing if someone does read through in the future. – Adam C Sep 08 '14 at 15:33