
I'm building a log-viewing utility that will handle real-time search over terabytes of logs. I have decided to store the logs in Solr and use it as the search engine. I will use Django as the framework for the project. To connect Solr to Django, I found Haystack. My architecture will look like this:

             Store  Index         Search             Show
Log Stream ----------------> Solr --------> Haystack ------> Django

My logs are ordinary Linux server logs (network, operational, error, etc.), sent via syslog. I will allow filtering on any part of a log line, and sorting by columns, for example the IP column, the date column, etc.
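A minimal sketch of this filtering and sorting, expressed as a Solr select request built with Python's standard library. The core URL (`http://localhost:8983/solr/logs/select`) and the field names `src` and `timestamp` are hypothetical; `q`, `fq`, `sort`, `rows`, `start` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical core URL; adjust host, port and core name to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/logs/select"


def build_log_query(text="*:*", filters=None, sort_field=None,
                    descending=False, rows=50, start=0):
    """Return a Solr select URL that searches, filters and sorts log documents.

    `filters` maps (hypothetical) field names such as "src" to exact values;
    each one becomes its own fq parameter.
    """
    params = [("q", text), ("wt", "json"), ("rows", rows), ("start", start)]
    for field, value in (filters or {}).items():
        # Quote the value so characters such as ':' are not parsed as query syntax.
        params.append(("fq", f'{field}:"{value}"'))
    if sort_field:
        direction = "desc" if descending else "asc"
        params.append(("sort", f"{sort_field} {direction}"))
    return SOLR_SELECT_URL + "?" + urlencode(params)


# Lines from 192.168.9.11 matching "UDP", newest first.
print(build_log_query("UDP", {"src": "192.168.9.11"}, "timestamp", descending=True))
```

Each filter goes into its own `fq` parameter, which Solr caches separately from the main query. An IP stored as a plain string sorts lexicographically (`192.168.11.29` before `192.168.9.11`), so numeric IP ordering would need a different field type.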

Example log:

Dec 11 13:24:03 2012 [firewall] R0 SRC=192.168.9.11 DST=192.168.11.29 LEN=83 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=36904 DPT=161 LEN=63 
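A sketch, assuming every line follows the layout of the example above (timestamp, bracketed tag, then space-separated tokens), of turning a line into a document with separate fields that can be filtered and sorted on. The field names are hypothetical; the example line contains `LEN` twice, and this sketch simply keeps the first value.

```python
import re
from datetime import datetime

# Assumed layout: "Mon DD HH:MM:SS YYYY [tag] TOKEN KEY=VALUE ..."
HEADER_RE = re.compile(
    r"^(?P<ts>\w{3} +\d{1,2} \d{2}:\d{2}:\d{2} \d{4}) \[(?P<tag>[^\]]+)\] (?P<rest>.*)$"
)


def parse_log_line(line):
    """Split one log line into fields; return None if it doesn't match the layout."""
    line = line.strip()
    m = HEADER_RE.match(line)
    if m is None:
        return None
    doc = {
        "timestamp": datetime.strptime(m.group("ts"), "%b %d %H:%M:%S %Y"),
        "tag": m.group("tag"),
        "message": line,  # full text, for free-text search
        "flags": [],      # bare tokens without "=", e.g. "R0", "DF"
    }
    for token in m.group("rest").split():
        key, sep, value = token.partition("=")
        if sep:
            # Keep the first value when a key repeats (LEN appears twice in the example).
            doc.setdefault(key.lower(), value)
        else:
            doc["flags"].append(token)
    return doc


example = ("Dec 11 13:24:03 2012 [firewall] R0 SRC=192.168.9.11 DST=192.168.11.29 "
           "LEN=83 TOS=0x00 PREC=0x00 TTL=64 ID=0 DF PROTO=UDP SPT=36904 DPT=161 LEN=63")
print(parse_log_line(example))
```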

Is it better to use MongoDB for storing, filtering and searching the logs, or will Solr do this better? Elasticsearch also comes to mind. What would you choose in this case?

Thanks in advance.

denizeren
  • Why do you need MongoDB? Solr is already storing the logs for you. – D_K Dec 11 '12 at 09:55
  • For example, in http://highscalability.com/how-rackspace-now-uses-mapreduce-and-hadoop-query-terabytes-data they use HDFS for storing logs; I replaced HDFS with MongoDB. Solr is only indexing. – denizeren Dec 11 '12 at 10:04
  • It seems I won't need MongoDB, as shown at http://graylog2.org/about; a search engine alone will be enough. – denizeren Dec 11 '12 at 10:23
  • Basically, Solr can index AND store your data, depending on the configuration. Just make sure that the fields in the Solr schema whose data you want to be retrievable (not only searchable) are marked as stored. – D_K Dec 11 '12 at 10:37
  • One more thing: HDFS can be used for storage (replicated, etc.), as you said, and for running MapReduce jobs to discover something new in the data. The outcome can then be stored in either MongoDB or Solr; it is your choice and will depend on the task(s) you are trying to solve. – D_K Dec 11 '12 at 10:41
  • I am trying to do filtering and search. Which fits better? – denizeren Dec 11 '12 at 10:58
  • Can you elaborate a bit, or update the question with the kind of filtering you intend to do? – D_K Dec 11 '12 at 11:08
  • For integrating Django with Solr, you can try https://github.com/sophilabs/django-solr. Its advantage is that it emulates the classic Django ORM interface, in case you're used to working with it. – martincho Dec 11 '12 at 12:05
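D_K's point about stored fields can be illustrated with a small Python helper that renders `<field/>` entries for Solr's `schema.xml`. The field list is hypothetical, and the type names (`date`, `string`, `int`, `text_general`) are assumed to be defined in your schema, as they are in the example schema shipped with Solr 4.x. `indexed="true"` makes a field searchable; only fields with `stored="true"` can be returned in results.

```python
import xml.etree.ElementTree as ET

# Hypothetical fields: (name, type, indexed, stored).
LOG_FIELDS = [
    ("timestamp", "date", True, True),
    ("src", "string", True, True),
    ("dpt", "int", True, True),
    ("message", "text_general", True, True),          # searchable and returned
    ("message_search", "text_general", True, False),  # searchable only, e.g. a copyField target
]


def schema_fields(fields):
    """Render one schema.xml <field/> element per (name, type, indexed, stored) tuple."""
    rendered = []
    for name, ftype, indexed, stored in fields:
        el = ET.Element("field", {
            "name": name,
            "type": ftype,
            "indexed": str(indexed).lower(),  # searchable?
            "stored": str(stored).lower(),    # retrievable in results?
        })
        rendered.append(ET.tostring(el, encoding="unicode"))
    return rendered


for line in schema_fields(LOG_FIELDS):
    print(line)
```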

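The map-reduce idea from the HDFS comment can be illustrated in a single process: a map step emits a key per log line and a reduce step aggregates per key. This is only a plain-Python sketch of the pattern, not Hadoop; counting lines per `SRC=` address is a hypothetical example of discovering something in the data, the sample lines are made up for illustration, and the resulting counts could then be stored in MongoDB or Solr.

```python
from collections import Counter
from functools import reduce


def map_phase(lines):
    """Map: emit (source_address, 1) for each line that has a SRC=... token."""
    for line in lines:
        for token in line.split():
            if token.startswith("SRC="):
                yield token[len("SRC="):], 1
                break  # one key per line


def reduce_phase(pairs):
    """Reduce: sum the emitted counts per key, as a reducer does for each key group."""
    def add(totals, pair):
        key, count = pair
        totals[key] += count
        return totals
    return reduce(add, pairs, Counter())


# Hypothetical sample lines in the format from the question.
logs = [
    "Dec 11 13:24:03 2012 [firewall] R0 SRC=192.168.9.11 DST=192.168.11.29 PROTO=UDP",
    "Dec 11 13:24:05 2012 [firewall] R0 SRC=192.168.9.11 DST=192.168.11.30 PROTO=TCP",
    "Dec 11 13:24:09 2012 [firewall] R0 SRC=192.168.9.12 DST=192.168.11.29 PROTO=UDP",
]
print(reduce_phase(map_phase(logs)).most_common())
```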
1 Answer


Why reinvent the wheel? There's Logstash, with an amazing interface: Kibana. You can feed it using rsyslog. However, if you really want or need to implement your own log server, note that Logstash uses Elasticsearch. I would go with it.
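A sketch of the kind of query you might send to Elasticsearch once Logstash has indexed the logs: filter on one source address and sort newest first. `@timestamp` is the field Logstash adds to every event, and `logstash-*` matches its default daily index names; the `src` field is hypothetical and assumed to be mapped as a `keyword` or `ip` field so that a `term` filter matches exactly. The `bool`/`filter` syntax is that of Elasticsearch 2.x and later.

```python
import json

# Hypothetical field name: depends on how your Logstash filters extract SRC=...
SOURCE_FIELD = "src"


def build_es_search(src_ip, size=50):
    """Build an Elasticsearch _search body: exact match on source IP, newest first."""
    return {
        "query": {
            "bool": {
                # Filter context: no relevance scoring, results can be cached.
                "filter": [{"term": {SOURCE_FIELD: src_ip}}]
            }
        },
        # @timestamp is set by Logstash on every event.
        "sort": [{"@timestamp": {"order": "desc"}}],
        "size": size,
    }


if __name__ == "__main__":
    # Body for: GET /logstash-*/_search
    print(json.dumps(build_es_search("192.168.9.11"), indent=2))
```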

vad