Monitoring Processes in Detail - not summarized

Question

I want to monitor the running processes of a Linux machine over time.

My do-it-yourself solution would be:

Dump ps aux --forest every minute to a file.
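
A minimal sketch of that do-it-yourself approach, assuming a procps-style `ps` that supports `--forest`. The function name `snapshot_ps` and the directory argument are made up for illustration:

```shell
#!/bin/sh
# Sketch only: write one timestamped process-tree snapshot per call.
# snapshot_ps is a hypothetical helper name, not an existing tool.

snapshot_ps() {
    dir="$1"
    mkdir -p "$dir" || return 1
    # One file per minute, named like 2015-06-17T0800.txt
    out="$dir/$(date +%Y-%m-%dT%H%M).txt"
    ps aux --forest > "$out" || return 1
    # Print the path so the caller knows where the snapshot went
    printf '%s\n' "$out"
}

# When run as a script with a directory argument, take one snapshot.
if [ "$#" -gt 0 ]; then
    snapshot_ps "$1"
fi
```

Saved as an executable script, this could be called once a minute from cron with a target directory as its only argument. Old snapshots would still need to be rotated or deleted by some other means.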

Background: If I get a message "something was wrong, yesterday around 8 o'clock" I want to see what was going on.

There are plenty of tools which summarize the load and io values, but I need more details.

I can do makeshift plumbing like the above dump of ps, but I guess there are better solutions.

Which tool could help me?

closed

This question was closed because I asked for a tool. I asked the same question here:

https://softwarerecs.stackexchange.com/questions/20459/monitoring-processes-in-detail-not-summarized

Of course `ps` gives only a limited set of information about the server. A better approach might be to identify the important services on the box and monitor specific data points about those services, using a tool such as Nagios to collect and possibly graph the data. — uSlackr, Jun 17 '15 at 13:51
I want to see which processes were running at a datetime in the past. I can't imagine a graph which displays the process name, the parameters and the process tree. The graphs I know are summarized. But maybe I am missing something. — guettli, Jun 17 '15 at 20:21
@peterh a script from crontab is "plumbing" for me. I want to avoid this. — guettli, Jun 18 '15 at 06:37
@guettli my point is that a list of processes gives you very little information about what was happening on the box at a moment in time. Why not monitor data points about the actual services running on the server, as well as general performance stats about the box? — uSlackr, Jun 18 '15 at 13:47
@uSlackr I don't get what you mean with "data points". Please explain this. — guettli, Jun 18 '15 at 15:04
Say you had Apache running on the server: you could monitor the number of connections, page requests, etc. For MySQL, you could monitor RAM use, cache hits, queries/sec, etc. — uSlackr, Jun 18 '15 at 16:11
@uSlackr yes, I could do this. But summarized values don't show me what's going on. There are endless tools which deal with total connection count, lock count, disk I/O, ... These tools only tell me: there was something wrong. But what is the root of it? Where does "something wrong" come from? — guettli, Jun 19 '15 at 06:58
The question is not on hold because no one knows the answer. It is on hold because "Requests for product, service, or learning material recommendations are off-topic". You are searching for a product that does what you want, and there have been plenty out there for at least 20 years. So no, you do not need to build it yourself; just search with your favorite search engine. — Dennis Nolte, Jun 19 '15 at 07:58
@DennisNolte Please tell me the keywords which I should type into my favorite search engine. I tried that before asking here, but found only systems which summarize and create charts. — guettli, Jun 19 '15 at 17:14
@DennisNolte I agree with you. I ask for a tool, and this is off-topic here. Is there a way for me to close my own question? I found only a link to delete it. — guettli, Jun 19 '15 at 17:41

score 4 · Answer 1 · answered Jun 18 '15 at 08:52

You have to realise that monitoring at too fine a granularity will have a negative impact on your system's performance. That's the reason you normally monitor the general health of a server and its services, and additionally focus on specific performance indicators that are relevant for your services.

Then you shouldn't have to deal with "something was wrong last night" anymore; you'll know exactly what is wrong almost as soon as it happens.

But if you do want to monitor your system's short-lived processes instead of the services, one of the more "proper" methods would be to use the audit daemon.

Something like:

auditctl -a exit,always -S execve

which will log any program that gets started (with the execve system call).
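
A persistent variant of that rule might look like the sketch below. It assumes a 64-bit system and a distribution that loads rules from /etc/audit/rules.d/. The file name and the key `proc_exec` are made up here for illustration and do not come from the command above:

```
# /etc/audit/rules.d/exec.rules (hypothetical file name)
# Log every execve call made by 64-bit programs and tag the records
# with the key "proc_exec" (an arbitrary label chosen for this sketch).
-a always,exit -F arch=b64 -S execve -k proc_exec
```

With such a key in place, `ausearch -k proc_exec -ts yesterday -te today -i` should list the programs that were started yesterday. The ausearch man page describes how to narrow the window to specific times.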
