13

I need to monitor some large, noisy log files (~500 MB/day) from a Java application (log4j). Right now I manually look at the files, grepping for "ERROR" and so on. However, it should be possible for a tool to spot repeating patterns in the file, count them, and provide drill-down into the details of individual entries. Does anyone know of such a tool? A text- or web-based UI would be nice.
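For reference, the kind of bucketing I have in mind looks roughly like this Python sketch (the masking rules are just guesses at what would suit a typical log4j PatternLayout):

    import re
    import sys
    from collections import defaultdict

    # Mask the parts that vary between otherwise-identical entries
    # (timestamps, hex ids, numbers) so that repeats collapse into one bucket.
    MASKS = [
        (re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[,.]\d+'), '<TS>'),
        (re.compile(r'0x[0-9a-fA-F]+'), '<HEX>'),
        (re.compile(r'\d+'), '<N>'),
    ]

    def normalise(line):
        for rx, token in MASKS:
            line = rx.sub(token, line)
        return line

    buckets = defaultdict(list)   # pattern -> original lines, for drill-down
    with open(sys.argv[1], errors='replace') as f:
        for line in f:
            buckets[normalise(line)].append(line)

    # Most frequent patterns first, with one raw example each.
    for pattern, lines in sorted(buckets.items(), key=lambda kv: -len(kv[1]))[:20]:
        print(f'{len(lines):8d}  {pattern.rstrip()}')
        print(f'          e.g. {lines[0].rstrip()}')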

David Tinker
    To me this question absolutely screams `perl`. – John Gardeniers Dec 19 '11 at 09:38
  • Hmm, it's starting to look like I will have to write a bash script with lots of greps. I was hoping to have something figure out the patterns automatically. – David Tinker Dec 22 '11 at 06:43
  • Seriously, this is exactly what perl was created for. You can write a self-learning script for those patterns, although that's obviously out of scope here. – John Gardeniers Dec 22 '11 at 20:43
  • http://stackoverflow.com/questions/2590251/is-there-a-log-file-analyzer-for-log4j-files has a solution called Chainsaw. – John K. N. Dec 14 '16 at 11:53
  • https://www.datadoghq.com/blog/log-patterns/ <-- highly recommend; while not crazy expensive, it's not super cheap either. – neoakris Jul 28 '19 at 21:03

9 Answers

6

Splunk works wonders for this sort of stuff. I use it internally to gather all the logs and do quick searches via its excellent browser-based interface.

Burhan Khalid
3

syslog-ng has a feature named patterndb. You can create patterns, match log entries against them in real time, and then route the matching entries to separate log files.
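To give a feel for it (the real rules are XML files loaded by syslog-ng, so this is only a toy Python analogue with made-up patterns and file names):

    import re

    # Toy stand-ins for patterndb rules: each known pattern has a name, a
    # matcher, and a destination file. In real syslog-ng the rules live in
    # an XML ruleset and the routing is done in syslog-ng.conf.
    RULES = [
        ('auth-failure', re.compile(r'authentication failure'), 'auth-failures.log'),
        ('oom',          re.compile(r'OutOfMemoryError'),       'oom.log'),
    ]

    def route(line):
        """Append the entry to the file of the first matching rule."""
        for name, rx, dest in RULES:
            if rx.search(line):
                with open(dest, 'a') as out:
                    out.write(line)
                return name
        return None  # unmatched entries would go to a catch-all

    print(route('java.lang.OutOfMemoryError: Java heap space\n'))  # -> oom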

Stone
3

I've heard of people applying Bayesian filtering to log files to separate the interesting entries from the routine ones. They used spam filters: routine, uninteresting entries were treated as "good" (ham) while the unusual ones were treated as "spam", and with that scoring they were able to sift through the noise.

It sounds a lot like machine learning stuff to me, but then again I've not seen it in action, only heard of it over beers.
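To make the idea concrete, here is a toy sketch of that approach (not what those people actually ran, just the shape of it): train a naive Bayes scorer on hand-labelled lines, where "ham" is routine noise and "spam" is interesting.

    import math
    import re
    from collections import Counter

    TOKEN = re.compile(r'[A-Za-z]{3,}')

    def tokens(line):
        return set(t.lower() for t in TOKEN.findall(line))

    class ToyBayes:
        """Minimal naive Bayes: 'ham' = routine entries, 'spam' = interesting."""
        def __init__(self):
            self.counts = {'ham': Counter(), 'spam': Counter()}
            self.totals = {'ham': 0, 'spam': 0}

        def train(self, label, line):
            self.counts[label].update(tokens(line))
            self.totals[label] += 1

        def spamminess(self, line):
            # Sum of Laplace-smoothed per-token log-odds of being interesting.
            score = 0.0
            for t in tokens(line):
                p_spam = (self.counts['spam'][t] + 1) / (self.totals['spam'] + 2)
                p_ham = (self.counts['ham'][t] + 1) / (self.totals['ham'] + 2)
                score += math.log(p_spam / p_ham)
            return score

    bayes = ToyBayes()
    bayes.train('ham', 'INFO request completed in 12 ms')
    bayes.train('spam', 'ERROR connection pool exhausted')
    print(bayes.spamminess('ERROR connection refused'))  # positive = interesting

A real spam filter like spamprobe (mentioned in the comments below) does essentially this with far better statistics.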

adamo
  • This seems perfectly reasonable to me, and you could even have very strong prior assumptions (in the Bayesian sense) about certain words that _always_ show up in server logs. – DrewConway Dec 19 '11 at 19:21
  • Yep, this would do the job. Anyone know of an implementation that I could train? – David Tinker Dec 22 '11 at 06:44
  • One could start with [CRM114](http://en.wikipedia.org/wiki/CRM114_(program)) I guess. Or wait until Drew Conway publishes his [Machine Learning for Hackers](http://shop.oreilly.com/product/0636920018483.do). I am still working to find the original reference to what I proposed. – adamo Dec 22 '11 at 08:36
  • Yep! I read it back in 2005 in [this sage-members thread](http://www.sage.org/lists/sage-members-archive/2005/msg00933.html) . The author of the email mentions [spamprobe](http://spamprobe.sourceforge.net/). – adamo Dec 22 '11 at 08:45
2

While looking into syslog-ng and patterndb (+1 to that answer, above), I came across a web-based tool called ELSA: http://code.google.com/p/enterprise-log-search-and-archive/. It's F/OSS written in Perl, with a web interface, and is supposed to be really fast.

I haven't tried it yet, but once I'm done filtering using patterndb, I'll be trying ELSA.

EdwardTeach
1

Try out petit.
I'm not sure whether it works with the log4j format, but you might be able to write a custom filter for that.
Petit has no web interface; it displays graphs in your shell (ASCII art FTW!).
It's very useful for quickly seeing repeating messages and figuring out when they happened, or when they started happening more frequently.

faker
0

If you are using Debian Squeeze on your server, have a look at log2mail: http://packages.debian.org/squeeze/log2mail

ThorstenS
0

Glogg is a very good log explorer: you can create string-based filters that colorize matching lines, or retrieve every occurrence of a given string.

http://glogg.bonnefon.org/

Alexandre Roux
0

Splunk is usually a good solution for this, but you mentioned that it is too expensive for you, so I recommend looking at Logstash or Graylog.

Raffael Luthiger
-1

You can try SEQREL's LogXtender, which automatically detects patterns and aggregates similar logs. It does this by creating regular expressions on the fly and using the cached regexes to match subsequent logs. Additional taxonomy detection can add more granularity. A free version can be downloaded from https://try.logxtender.net.
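Roughly, generating regular expressions on the fly means something like this (a simplified illustration, not the actual product code; the real detection is presumably more sophisticated):

    import re

    def line_to_regex(line):
        # Keep literals, generalise runs of digits to \d+ so that lines
        # with the same 'shape' share one pattern.
        parts = re.split(r'(\d+)', line.strip())
        return re.compile(''.join(
            r'\d+' if p.isdigit() else re.escape(p) for p in parts
        ))

    cache = []  # learned patterns, reused for subsequent lines

    def bucket(line):
        for rx in cache:
            if rx.fullmatch(line.strip()):
                return rx.pattern
        rx = line_to_regex(line)
        cache.append(rx)
        return rx.pattern

    print(bucket('worker 17 finished in 250 ms'))
    print(bucket('worker 3 finished in 9 ms'))  # reuses the cached pattern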

Mihnea