13

I need to monitor some large, noisy log files (~500 MB/day) from a Java application (log4j). Right now I manually look at the files, grepping for "ERROR" and so on. However, it should be possible for a tool to spot repeating patterns in the file, count them, and provide drill-down into the details of individual entries. Does anyone know of such a tool? A text- or web-based UI would be nice.
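For reference, the kind of bucketing I have in mind looks roughly like this Python sketch (the masking rules are just guesses at what would suit a typical log4j PatternLayout):

    import re
    import sys
    from collections import defaultdict

    # Mask the parts that vary between otherwise-identical entries
    # (timestamps, hex ids, numbers) so that repeats collapse into one bucket.
    MASKS = [
        (re.compile(r'\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[,.]\d+'), '<TS>'),
        (re.compile(r'0x[0-9a-fA-F]+'), '<HEX>'),
        (re.compile(r'\d+'), '<N>'),
    ]

    def normalise(line):
        for rx, token in MASKS:
            line = rx.sub(token, line)
        return line

    buckets = defaultdict(list)   # pattern -> original lines, for drill-down
    with open(sys.argv[1], errors='replace') as f:
        for line in f:
            buckets[normalise(line)].append(line)

    # Most frequent patterns first, with one raw example each.
    for pattern, lines in sorted(buckets.items(), key=lambda kv: -len(kv[1]))[:20]:
        print(f'{len(lines):8d}  {pattern.rstrip()}')
        print(f'          e.g. {lines[0].rstrip()}')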

David Tinker
    To me this question absolutely screams `perl`. – John Gardeniers Dec 19 '11 at 09:38
  • Hmm, it's starting to look like I will have to write a bash script with lots of greps. I was hoping to have something figure out the patterns automatically. – David Tinker Dec 22 '11 at 06:43
  • Seriously, this is exactly what perl was created for. You can write a self-learning script for those patterns, although that's obviously out of scope here. – John Gardeniers Dec 22 '11 at 20:43
  • http://stackoverflow.com/questions/2590251/is-there-a-log-file-analyzer-for-log4j-files has a solution called Chainsaw. – John K. N. Dec 14 '16 at 11:53
  • https://www.datadoghq.com/blog/log-patterns/ <-- highly recommend; while not crazy expensive, it's not super cheap either. – neoakris Jul 28 '19 at 21:03

9 Answers

6

Splunk works wonders for this sort of stuff. I use it internally to gather all the logs and do quick searches via its excellent browser-based interface.

Burhan Khalid
3

syslog-ng has a feature named patterndb. You can create patterns, match log entries against them in real time, and then route the matching entries to separate log files.
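To give a feel for it (the real rules are XML files loaded by syslog-ng, so this is only a toy Python analogue with made-up patterns and file names):

    import re

    # Toy stand-ins for patterndb rules: each known pattern has a name, a
    # matcher, and a destination file. In real syslog-ng the rules live in
    # an XML ruleset and the routing is done in syslog-ng.conf.
    RULES = [
        ('auth-failure', re.compile(r'authentication failure'), 'auth-failures.log'),
        ('oom',          re.compile(r'OutOfMemoryError'),       'oom.log'),
    ]

    def route(line):
        """Append the entry to the file of the first matching rule."""
        for name, rx, dest in RULES:
            if rx.search(line):
                with open(dest, 'a') as out:
                    out.write(line)
                return name
        return None  # unmatched entries would go to a catch-all

    print(route('java.lang.OutOfMemoryError: Java heap space\n'))  # -> oom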

Stone
3

I've heard of people applying Bayesian filtering to log files to separate the interesting entries from the routine ones. They used spam filters: routine, uninteresting entries were treated as "good" (ham) while the unusual ones were treated as "spam", and with that scoring they were able to sift through the noise.

It sounds a lot like machine learning stuff to me, but then again I've not seen it in action, only heard of it over beers.
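To make the idea concrete, here is a toy sketch of that approach (not what those people actually ran, just the shape of it): train a naive Bayes scorer on hand-labelled lines, where "ham" is routine noise and "spam" is interesting.

    import math
    import re
    from collections import Counter

    TOKEN = re.compile(r'[A-Za-z]{3,}')

    def tokens(line):
        return set(t.lower() for t in TOKEN.findall(line))

    class ToyBayes:
        """Minimal naive Bayes: 'ham' = routine entries, 'spam' = interesting."""
        def __init__(self):
            self.counts = {'ham': Counter(), 'spam': Counter()}
            self.totals = {'ham': 0, 'spam': 0}

        def train(self, label, line):
            self.counts[label].update(tokens(line))
            self.totals[label] += 1

        def spamminess(self, line):
            # Sum of Laplace-smoothed per-token log-odds of being interesting.
            score = 0.0
            for t in tokens(line):
                p_spam = (self.counts['spam'][t] + 1) / (self.totals['spam'] + 2)
                p_ham = (self.counts['ham'][t] + 1) / (self.totals['ham'] + 2)
                score += math.log(p_spam / p_ham)
            return score

    bayes = ToyBayes()
    bayes.train('ham', 'INFO request completed in 12 ms')
    bayes.train('spam', 'ERROR connection pool exhausted')
    print(bayes.spamminess('ERROR connection refused'))  # positive = interesting

A real spam filter like spamprobe (mentioned in the comments below) does essentially this with far better statistics.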

adamo
  • This seems perfectly reasonable to me, and you could even have very strong prior assumptions (in the Bayesian sense) about certain words that _always_ show up in server logs. – DrewConway Dec 19 '11 at 19:21
  • Yep, this would do the job. Anyone know of an implementation that I could train? – David Tinker Dec 22 '11 at 06:44
  • One could start with [CRM114](http://en.wikipedia.org/wiki/CRM114_(program)) I guess. Or wait until Drew Conway publishes his [Machine Learning for Hackers](http://shop.oreilly.com/product/0636920018483.do). I am still working to find the original reference to what I proposed. – adamo Dec 22 '11 at 08:36
  • Yep! I read it back in 2005 in [this sage-members thread](http://www.sage.org/lists/sage-members-archive/2005/msg00933.html) . The author of the email mentions [spamprobe](http://spamprobe.sourceforge.net/). – adamo Dec 22 '11 at 08:45
2

While looking into syslog-ng and patterndb (+1 to that answer, above), I came across a web-based tool called ELSA: http://code.google.com/p/enterprise-log-search-and-archive/. It's F/OSS written in Perl, with a web interface, and is supposed to be really fast.

I haven't tried it yet, but once I'm done filtering using patterndb, I'll be trying ELSA.

EdwardTeach
1

Try out petit.
I'm not sure whether it works with the log4j format, but you might be able to write a custom filter for that.
Petit has no web interface; it displays graphs in your shell (ASCII art FTW!).
It's very useful for quickly seeing repeating messages and figuring out when they happened, or when they started happening more frequently.

faker
0

If you are using Debian Squeeze on your server, have a look at log2mail: http://packages.debian.org/squeeze/log2mail

ThorstenS
0

Glogg is a very good log explorer: you can create string-based filters that colorize matching lines, or retrieve every occurrence of a given string.

http://glogg.bonnefon.org/

Alexandre Roux
0

Splunk is usually a good solution for this, but you mentioned that it is too expensive for you, so I recommend looking at Logstash or Graylog.

Raffael Luthiger
-1

You can try SEQREL's LogXtender, which automatically detects patterns and aggregates similar logs. It does this by creating regular expressions on the fly and using the cached regexes to match subsequent logs. Additional taxonomy detection can add more granularity. A free version can be downloaded from https://try.logxtender.net.
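Roughly, generating regular expressions on the fly means something like this (a simplified illustration, not the actual product code; the real detection is presumably more sophisticated):

    import re

    def line_to_regex(line):
        # Keep literals, generalise runs of digits to \d+ so that lines
        # with the same 'shape' share one pattern.
        parts = re.split(r'(\d+)', line.strip())
        return re.compile(''.join(
            r'\d+' if p.isdigit() else re.escape(p) for p in parts
        ))

    cache = []  # learned patterns, reused for subsequent lines

    def bucket(line):
        for rx in cache:
            if rx.fullmatch(line.strip()):
                return rx.pattern
        rx = line_to_regex(line)
        cache.append(rx)
        return rx.pattern

    print(bucket('worker 17 finished in 250 ms'))
    print(bucket('worker 3 finished in 9 ms'))  # reuses the cached pattern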

Mihnea