
I'm looking for a tool where I can list the servers to check and the location of the log file, and it would return the most common errors across those servers (say 2 or 3 servers, for brevity in the report) as something like this:

Server.A     Server.B     Server.C
--------     --------     --------
42 error.X   39 error.X   61 error.X
21 error.Y   7  error.Y   5  error.A
17 error.B   6  error.A   4  error.Y
4  error.A   2  error.R   3  error.S
3  error.R   1  error.S   1  error.R

Of course, it would exclude timestamps and other per-event details, grep out the common substrings, and list them as above. I'd be able to look at the table, see that error.B is unique to Server.A, and conclude that something is wrong with Server.A. Does something like this already exist, or is this something I'll have to code myself?

I'm not necessarily looking for this specific report, just the functionality to find unique errors across a set of error logs.
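In case it helps, here is a minimal sketch in Python of the kind of tool described above. It assumes the logs have already been copied to the local machine, one file per server; the paths in `LOGS`, the timestamp regex, and the top-5 cutoff are all placeholders you would adapt to your own log format.

```python
#!/usr/bin/env python3
"""Report the most common error lines per server, side by side.

Assumes logs are already local, one file per server; the paths,
the timestamp regex, and the top-N cutoff are placeholders.
"""
import re
from collections import Counter

# Hypothetical mapping of server name -> local copy of its log file.
LOGS = {
    "Server.A": "logs/server-a/error.log",
    "Server.B": "logs/server-b/error.log",
    "Server.C": "logs/server-c/error.log",
}

# Strip a leading timestamp so identical errors collapse to the same
# key. Adjust this pattern to match your actual log format.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2}[ T][\d:.,]+\s*")

def normalize(line: str) -> str:
    """Drop the timestamp and surrounding whitespace from one line."""
    return TIMESTAMP.sub("", line).strip()

def top_errors(path: str, n: int = 5) -> list[tuple[str, int]]:
    """Return the n most common normalized lines in one log file."""
    counts = Counter()
    with open(path, errors="replace") as f:
        for line in f:
            key = normalize(line)
            if key:
                counts[key] += 1
    return counts.most_common(n)

def main() -> None:
    columns = {server: top_errors(path) for server, path in LOGS.items()}
    width = 30
    # Header row and separator, one fixed-width column per server.
    print("".join(server.ljust(width) for server in columns))
    print("".join(("-" * 8).ljust(width) for _ in columns))
    rows = max(len(v) for v in columns.values())
    for i in range(rows):
        cells = []
        for server in columns:
            if i < len(columns[server]):
                err, count = columns[server][i]
                cells.append(f"{count:<4}{err[:width - 5]}".ljust(width))
            else:
                cells.append(" " * width)
        print("".join(cells).rstrip())

if __name__ == "__main__":
    main()
```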

1 Answer


It sounds like you need event correlation. See, for example, the free Simple Event Correlator (SEC).

Splunk and logstash may also provide some of the log analysis and correlation you need. Splunk is free only for very limited use, while logstash is open source.

No matter which route you take, you should investigate centralized logging and collect all your logs on a dedicated log server, to facilitate this sort of analysis without having to connect to remote systems first.

Phil Hollenback
  • OK, I do have log collection running every hour. I guess I'll have to build this tool myself. The idea is that I can spot a single misbehaving server in a fleet by its unique logged errors (see the sketch after these comments) before it starts causing availability problems. – neuroelectronic Jan 10 '11 at 17:50
  • Well, still sounds like central logging is the first step in finding these problems. Then you can run queries with things like logstash to track down misbehaving systems. – Phil Hollenback Jan 10 '11 at 18:05
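A rough sketch of the "unique logged errors" check the comment describes, assuming you already have per-server error counts (for instance, from a collector like the script above): flag any error string that appears on exactly one server. The sample data is hypothetical and mirrors the table in the question.

```python
from collections import Counter

def unique_errors(per_server: dict[str, Counter]) -> dict[str, list[str]]:
    """Return, for each server, the error strings seen on that server only.

    per_server maps server name -> Counter of normalized error lines.
    """
    # Count how many servers each error string appears on.
    seen_on = Counter()
    for counts in per_server.values():
        for err in counts:
            seen_on[err] += 1
    return {
        server: [err for err in counts if seen_on[err] == 1]
        for server, counts in per_server.items()
    }

# Example: error.B shows up only on Server.A, so it gets flagged.
if __name__ == "__main__":
    data = {
        "Server.A": Counter({"error.X": 42, "error.B": 17}),
        "Server.B": Counter({"error.X": 39, "error.Y": 7}),
        "Server.C": Counter({"error.X": 61, "error.Y": 4}),
    }
    print(unique_errors(data))
    # -> {'Server.A': ['error.B'], 'Server.B': [], 'Server.C': []}
```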