sadc is running on the production server.

When an incident happens, I want the admin to give me all the data (going back as far as possible) so that I can analyze the incident itself and also whatever might have led to it during the previous week.

Analyzing directly on the production server sounds to me like a bad idea because:

  • With every passing hour, past data gets lost due to file rotation.
  • As a consultant I don't have access to the server, so I would have to (1) ask the admin "Please give me the output of this sar command", (2) analyze it, (3) ask "I see, now give me the output of that other sar command", and so on, right at a time when the admin is very busy.
  • Doing things on the production server always carries the risk of making a mistake, so it is better to do as much as possible outside of it.

So:

  • Can I ask the admin to just send me the whole data, so that I can analyze it on my system?
  • Is it as simple as sending me the whole /var/log/sa/ directory? Or do I need the admin to send me other things too?
  • To analyze the data, do I need the exact same OS (Red Hat Enterprise Linux Server 6.3)? Or can I do the same on my Debian? I can install CentOS if necessary. Do I need the exact same sysstat version, or should it work if both are recent (>9.0.4)?
Nicolas Raoul
  • I’ve always used `sar` on a similar OS, if not on the server itself, but I don’t see any reason why analysis wouldn’t work on data copied from RHEL to Debian. That is easily tested (and remedied by copying the data into a VM if it doesn’t) once you have the data. Copying the collected stats from /var/log/sa is all you need (to run `sar`). – HBruijn Mar 23 '18 at 05:47
  • @HBruijn: Thanks! It is easily tested indeed, but since I am writing a full procedure *in case* an incident ever happens, I must be certain that it will work even in exceptional circumstances. Feel free to post your comment as an answer; if upvotes show it is common practice, I will be satisfied with it :-) – Nicolas Raoul Mar 23 '18 at 07:12
  • I rarely use sar data after an incident. I use it a couple of times per day to see how all the systems are doing, more as a way to prevent incidents. – Gerard H. Pille Mar 23 '18 at 08:27

1 Answer

Just ask for:

  • The operating system name and version,
  • The whole content of the /var/log/sa/ directory.

That is all you need to start analyzing the sar data on your own hardware, without affecting production performance or losing data to log rotation.
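To keep the request to the admin down to a single command, the two items above could be bundled into one archive. The following is only a minimal sketch: `bundle_sa`, the `system-info.txt` file name, and the default output path are hypothetical names chosen for illustration, and `/var/log/sa` is assumed to be the data directory as in the question.

```shell
#!/bin/sh
# Sketch only: bundle_sa is a hypothetical helper, not part of sysstat.
# It archives the sar data directory together with basic version information.
bundle_sa() {
    sa_dir="${1:-/var/log/sa}"          # data directory (assumed default)
    out="${2:-/tmp/sa-bundle.tar.gz}"   # archive to hand to the analyst (assumed path)
    tmp="$(mktemp -d)" || return 1

    # Record OS and sysstat versions so the analyst can match them if needed.
    {
        uname -a
        cat /etc/*release 2>/dev/null || true
        sar -V 2>&1 || echo "sar not found in PATH"
    } > "$tmp/system-info.txt"

    # Archive the data directory and the info file without absolute paths.
    tar -czf "$out" \
        -C "$(dirname "$sa_dir")" "$(basename "$sa_dir")" \
        -C "$tmp" system-info.txt
    status=$?

    rm -rf "$tmp"
    return "$status"
}
```

The admin would source this and run `bundle_sa` with no arguments, then send the resulting archive. Since it only reads the files, it should not interfere with ongoing data collection.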

In my experience, having the exact same OS version is not a strict requirement. For instance, I have successfully analyzed CentOS sar data on Ubuntu. So just try on your favorite system first, and only install a different OS if that does not work.
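On the analysis side, a quick way to check whether the local `sar` can read the copied files is to loop over them with `sar -f`. This is a rough sketch, not a tested procedure: `list_sa_files` and `report_all` are hypothetical helper names, `./sa` is an assumed location for the unpacked directory, and the file naming (`saDD` or `saYYYYMMDD` for binary data, `sarDD` for text reports, possibly compressed older files) is what I would expect from a default sysstat setup.

```shell
#!/bin/sh
# Sketch only: list_sa_files and report_all are hypothetical helper names.

# Print the binary data files (saDD or saYYYYMMDD), one per line.
# The sarDD text reports do not match the pattern; compressed files are
# skipped because they would need decompressing before sar can read them.
list_sa_files() {
    dir="${1:-./sa}"
    for f in "$dir"/sa[0-9]*; do
        case "$f" in
            *.gz|*.bz2|*.xz) continue ;;
        esac
        if [ -f "$f" ]; then
            printf '%s\n' "$f"
        fi
    done
}

# Run a full report on every data file and flag the ones sar cannot read.
report_all() {
    dir="${1:-./sa}"
    list_sa_files "$dir" | while IFS= read -r f; do
        echo "== $f =="
        # -A reports all collected activities; -f reads from the given file.
        sar -A -f "$f" || echo "sar could not read $f (version mismatch?)" >&2
    done
}
```

Running `report_all ./sa` once right after receiving the data shows immediately whether every file is readable. If some files are rejected, my understanding is that the binary format depends on the sysstat version rather than on the distribution, so installing a matching sysstat version in a VM would be the fallback.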

Nicolas Raoul