
I am working on providing analytics for our web property based on instrumentation data we collect via a simple image beacon. Our data pipeline starts with Flume, and I need the fastest possible way to parse query string parameters, form a simple text message and shove it into Flume.

For performance reasons, I am leaning towards nginx. Since serving a static image from memory is already supported, my task is reduced to handling the query string and forwarding a message to Flume. Hence, the question:

What is the simplest reliable way to integrate nginx with Flume? I am thinking about using syslog (Flume supports syslog listeners), but I struggle with how to configure nginx to forward custom log messages to a syslog (or just TCP) listener running on a remote server and on a custom port. Is it possible with existing 3rd party modules for nginx or would I have to write my own?

Separately, anything existing you can recommend for writing a fast $args parser would be much appreciated.
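To make the parsing step concrete, here is a minimal sketch of turning a beacon's query string into one text line. It is written in Python and assumes the parsing happens outside nginx (for example in whatever process consumes the logged `$args` value); the function name `beacon_message` and the field names `uid`, `event` and `page` are hypothetical placeholders, not part of any existing setup.

```python
from urllib.parse import parse_qsl


def beacon_message(query_string, fields=("uid", "event", "page")):
    """Turn a raw query string into one tab-separated text line.

    The field names are illustrative assumptions; use whatever
    parameters the beacon actually sends.
    """
    # parse_qsl percent-decodes names and values; keep_blank_values keeps
    # parameters such as "event=" instead of silently dropping them.
    params = dict(parse_qsl(query_string, keep_blank_values=True))

    def clean(value):
        # Tabs and newlines would break the one-line, tab-separated format.
        return value.replace("\t", " ").replace("\n", " ").replace("\r", " ")

    # Missing fields become "-" so every line has the same number of columns.
    return "\t".join(clean(params.get(name, "-")) for name in fields)
```

A fixed column order keeps the message trivial to split downstream, at the cost of ignoring any parameters that are not listed in `fields`.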

If you think I am on a completely wrong path and can recommend something better performance-wise, feel free to let me know.

Thanks in advance!

Dmitry Frenkel

1 Answer


You should parse the nginx log file the way tail -f does and then pass the results to Flume. That is the simplest and most reliable approach. The problem with syslog is that it blocks nginx and can get completely stuck under high load or if something goes wrong (this is why nginx doesn't support it).
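As a rough illustration of the tail-style approach, the sketch below reads only the lines appended to a log file since the previous call and keeps its byte offset in a small side file, so a restart does not re-read the log from the top. It is written in Python; the function name `read_new_lines` and both paths are hypothetical, and handing the lines to Flume is left out.

```python
import os


def read_new_lines(log_path, offset_path):
    """Return the complete lines appended to log_path since the last call."""
    # Load the byte offset saved by the previous run, if there is one.
    try:
        with open(offset_path) as f:
            offset = int(f.read().strip() or 0)
    except (FileNotFoundError, ValueError):
        offset = 0

    # A file shorter than the saved offset was rotated or truncated.
    if os.path.getsize(log_path) < offset:
        offset = 0

    lines = []
    with open(log_path, "rb") as log:
        log.seek(offset)
        for raw in log:
            if not raw.endswith(b"\n"):
                break  # partial line still being written; read it next time
            lines.append(raw.rstrip(b"\r\n").decode("utf-8", "replace"))
            offset += len(raw)

    # Persist the new position so the next call resumes from here.
    with open(offset_path, "w") as f:
        f.write(str(offset))
    return lines
```

Calling this in a loop with a short sleep approximates `tail -f`. Detecting rotation by file size alone is a simplification; comparing inode numbers would be more robust.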

VBart
  • OK, what about a TCP proxy? Flume already listens on a port; dumping the log file into it using tail seems redundant and won't survive a server restart (tailing would restart from the top of the file) – Dmitry Frenkel Nov 18 '12 at 20:02
  • You can store the position of the last line read and resume from that line after a restart. There is only a 3rd-party module that allows logging over UDP without blocking nginx: http://www.grid.net.ru/nginx/udplog.en.html – VBart Nov 18 '12 at 22:13
  • I'll give UDP a try. Given that I can define a custom log format and that Flume supports the syslogUdp source, this avenue looks promising. – Dmitry Frenkel Nov 19 '12 at 01:08
  • OK, http://www.grid.net.ru/nginx/udplog.en.html worked like a charm. Thank you! I will do more stress testing and if I discover any useful information, I will post it here. – Dmitry Frenkel Nov 19 '12 at 22:42
  • If you encounter any problems, feel free to contact the module author. He is a well-known and active developer of nginx modules. – VBart Nov 19 '12 at 23:01
  • grid.net.ru is now down, but you can find the plugin on github: https://github.com/vkholodkov/nginx-udplog-module – Colin Curtin Feb 01 '16 at 17:27
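For anyone who wants to exercise a UDP syslog listener before wiring up nginx, the sketch below sends one syslog-style datagram from Python. It is only a test helper, not how the udplog module works internally. The facility (local0), severity (info), the tag `nginx`, and the host and port defaults are all assumptions. The message carries only the priority prefix and a tag, without the timestamp and hostname fields, so whether a given listener accepts it as-is should be checked.

```python
import socket


def syslog_datagram(message, facility=16, severity=6, tag="nginx"):
    """Build a minimal BSD-syslog-style payload: <PRI>tag: message."""
    # PRI is facility * 8 + severity; 16 is local0 and 6 is informational.
    pri = facility * 8 + severity
    return f"<{pri}>{tag}: {message}".encode("utf-8")


def send_udp(message, host="127.0.0.1", port=5140):
    """Fire-and-forget send; host and port here are placeholder assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(syslog_datagram(message), (host, port))
```

UDP gives no delivery guarantee, which is the trade-off for never blocking the sender.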