I am looking into building a syslog/logging infrastructure and am wondering about architecture best practices. As I see it, a syslog system needs to support two conflicting workloads:
- Log collection. Potentially massive streams of data need to be written to disk quickly and indexed.
- Log querying. Logs will be queried both by fixed fields, such as date and source, and by full-text search.
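To make the two workloads concrete, here is a toy in-memory sketch in Python. It is only an illustration of the write path versus the read path, not how Graylog2 or ElasticSearch work internally; the class `ToyLogStore` and its methods are hypothetical names made up for this example.

```python
from collections import defaultdict
from datetime import datetime


class ToyLogStore:
    """Hypothetical toy store: append-heavy ingest plus indexed queries."""

    def __init__(self):
        self.records = []                   # append-only list of (timestamp, source, message)
        self.by_source = defaultdict(list)  # fixed-field index: source -> record ids
        self.by_token = defaultdict(set)    # text index: lowercase word -> record ids

    def ingest(self, timestamp, source, message):
        # Write path: one sequential append plus index updates per record.
        doc_id = len(self.records)
        self.records.append((timestamp, source, message))
        self.by_source[source].append(doc_id)
        for token in message.lower().split():
            self.by_token[token].add(doc_id)
        return doc_id

    def query(self, source=None, since=None, text=None):
        # Read path: narrow by fixed field, then by every search word, then by date.
        if source is not None:
            ids = set(self.by_source.get(source, []))
        else:
            ids = set(range(len(self.records)))
        if text is not None:
            for token in text.lower().split():
                ids &= self.by_token.get(token, set())
        results = [self.records[i] for i in sorted(ids)]
        if since is not None:
            results = [r for r in results if r[0] >= since]
        return results


store = ToyLogStore()
store.ingest(datetime(2012, 1, 1, 12, 0), "web01", "Connection refused by backend")
store.ingest(datetime(2012, 1, 1, 12, 5), "db01", "Slow query detected")
store.ingest(datetime(2012, 1, 2, 9, 30), "web01", "Connection reset by peer")

matches = store.query(source="web01", text="connection", since=datetime(2012, 1, 2))
```

Even in this toy version, every ingested record triggers several index updates, while each query touches those same indexes for reads. On a real server that contention lands on the disks, which is the reason for the hardware question below.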
What is the best disk/system setup, assuming I'd like to keep it to a single server for now? Should I use SSDs or a RAM disk to off-load some of the processing? Should some disks be striped (RAID 0) and others in RAID 5?
I am particularly eyeing Graylog2 with ElasticSearch/MongoDB.