Fluentd: using multiple sources vs. splitting in mongo

Question

We load logs from an Apache access log file with Fluentd's in_tail plugin and write them to MongoDB with the out_mongo plugin. The log file contains about 10 different kinds of log messages. Ideally we would like to keep them in separate MongoDB collections so that the TTL (or capped collection size) can be set separately for each one. I want to know which of these two approaches is better:

1. Keep separate <source> sections in the Fluentd config file, all of which tail the same log file but use different format regexes. Each one can then be matched to a different MongoDB collection. (I believe it is not possible to specify multiple format regexes for multiple tags within one <source> element?)
2. Store all the logs in a single "raw" MongoDB collection and then write my own code to split the different types of log messages out into separate collections. I believe this option is best for performance, but I am not sure whether the first approach is really bad.
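
To make the second option concrete, here is a minimal Python sketch of the "split it myself" step. It only shows the classification logic: each raw line is tested against a list of regexes and grouped under a target collection name. The collection names (access_errors, access_not_found, access_ok, access_other) and the patterns are hypothetical placeholders, not the real message types in our log, and the actual MongoDB inserts are left out.

```python
import re

# Hypothetical routing table: (target collection name, regex). The real
# patterns would depend on the ~10 message types present in the log file.
ROUTES = [
    ("access_errors", re.compile(r'" (?P<status>5\d\d) ')),
    ("access_not_found", re.compile(r'" (?P<status>404) ')),
    ("access_ok", re.compile(r'" (?P<status>2\d\d) ')),
]
FALLBACK = "access_other"  # assumed catch-all for lines that match nothing


def route(line):
    """Return (collection_name, extracted_fields) for one raw log line."""
    for name, pattern in ROUTES:
        match = pattern.search(line)
        if match:
            return name, match.groupdict()
    return FALLBACK, {}


def split_batch(lines):
    """Group raw lines by target collection, one list of documents each."""
    batches = {}
    for line in lines:
        name, fields = route(line)
        batches.setdefault(name, []).append({"raw": line, **fields})
    return batches
```

Each list returned by split_batch could then be written to its own collection in one bulk insert, and each collection could carry its own TTL index or capped size.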

You could possibly do this in near real time by implementing a tailable cursor on your "raw" input. But performance is always going to be relative to the language implementation and how you actually code it. You could look at C, C++ or possibly Go for the coding side of performance, depending on how abstract you want to be. That still raises the question of whether the possible performance gain is outweighed by the development time, and whether a tool (or tools) may already exist. So this is very subjective. - Neil Lunn, Apr 11 '14 at 02:34
This is a very specific question about what works best with Fluentd. - arun, Apr 11 '14 at 03:02


0 Answers