
I have a backup script that:

  1. compresses some files
  2. generates MD5 checksums
  3. copies the compressed file to another server
  4. lets the other server compare the MD5 checksums (to detect copy errors)

Here is the core of the script:

# compress the files; tar -v lists each file name, which is piped on for checksumming
nice -n 15 tar -czvf "$BKP" "$PATH_BKP"/*.* \
| xargs -I '{}' sh -c "test -f '{}' && md5sum '{}'" \
| tee "$MD5"
# copy the archive (bandwidth-limited to 80000 Kbit/s) and the checksum file
scp -l 80000 "$BKP" "$SCP_BKP"
scp "$MD5" "$SCP_BKP"

This routine pushes the CPU to 90% during the gzip step, slowing down the production server. I tried adding nice -n 15, but the server still hangs.

I've already read 1, but that conversation didn't help me.

What is the best approach to solving this issue? I am open to new architectures/solutions :)

Josir

3 Answers


If you use nice, you change the CPU scheduling priority, but this has a noticeable impact only when the CPU is close to 100% usage.

In your case, the server becomes slow not because of CPU usage but because of the I/O on the storage. Use ionice to change the I/O priority, and keep nice for the CPU priority.
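
A sketch of how the tar line could look, reusing the $BKP and $PATH_BKP variables from the question (ionice only takes effect with an I/O scheduler that supports priorities, such as CFQ):

# best-effort I/O class (-c2) at its lowest priority (-n7), plus a low CPU priority
nice -n 15 ionice -c2 -n7 tar -czvf "$BKP" "$PATH_BKP"/*.* \

If that is still too intrusive, ionice -c3 places the process in the idle I/O class, so it gets disk time only when no other process needs it.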

Mircea Vutcovici
  • Thanks for replying, Mircea. But when I run top, gzip is at 95% CPU. Isn't that a sign that the problem is the CPU? How can I measure the I/O? – Josir Jun 29 '12 at 19:21
  • Which CPU do you use? – Spacedust Jun 29 '12 at 20:35
  • It's a KVM/QEMU virtual machine. The host is an 8-core Xeon, but the guest VM has just one core available. The machine is very fast: with 50 active users, the CPU never reaches 10%. The only exception is the damn gzip :( – Josir Jun 29 '12 at 21:01
  • For I/O usage you can use vmstat, iostat, or dstat. I prefer dstat: `dstat -tam --top-io 10` – Mircea Vutcovici Jun 30 '12 at 02:25
  • With KVM, the information reported by top and similar utilities can be inaccurate, especially if you are overcommitting the host CPU. I would also investigate the problem from the host server. – Mircea Vutcovici Jun 30 '12 at 02:27
  • It was an I/O problem! I solved it by moving to a faster disk/controller. – Josir Jul 24 '12 at 22:59
  • Thank you for the feedback. I am glad that you found the problem. – Mircea Vutcovici Jul 25 '12 at 01:33

You could try using chrt to change the scheduling policy of the tar program to SCHED_BATCH.

As per the sched_setscheduler(2) man page:

   SCHED_BATCH: Scheduling batch processes (since Linux 2.6.16)

   SCHED_BATCH can only be used at static priority 0. This policy is
   similar to SCHED_OTHER in that it schedules the process according to
   its dynamic priority (based on the nice value). The difference is that
   this policy will cause the scheduler to always assume that the process
   is CPU-intensive. Consequently, the scheduler will apply a small
   scheduling penalty with respect to wakeup behaviour, so that this
   process is mildly disfavored in scheduling decisions.

   This policy is useful for workloads that are noninteractive, but do not
   want to lower their nice value, and for workloads that want a
   deterministic scheduling policy without interactivity causing extra
   preemptions (between the workload's tasks).

If you're still out of luck, you could try SCHED_IDLE instead. This would make the program run only when there is nothing else to run.

For batch scheduling, this changes the tar line to:

nice -n 15 chrt -b 0 tar -czvf "$BKP" "$PATH_BKP"/*.* \
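
An idle-scheduled variant would look like this (a sketch under the same assumptions; chrt requires an explicit priority argument, which must be 0 for both SCHED_BATCH and SCHED_IDLE):

# run tar only when the CPU would otherwise be idle
chrt -i 0 tar -czvf "$BKP" "$PATH_BKP"/*.* \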
Matthew Ife
  • Thanks, Mlfe. It seems interesting, but the server is always running something. Is it possible that tar/gzip never wakes up and the routine keeps running "ad aeternum"? – Josir Jun 29 '12 at 19:41
  • If your CPU averages are < 100% over a given interval, the server is not always running something, so I wouldn't worry about that. – Matthew Ife Jun 29 '12 at 19:43
  • I tried chrt, but it didn't reduce the slowness. Still, I learned another Linux architecture detail. Thanks anyway. – Josir Jul 24 '12 at 23:01

Have you tried using pigz instead of gzip?
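
pigz is a parallel gzip implementation, so it spreads the compression across all available cores. A sketch of how the tar line from the question could change (note that with only one core in the guest VM, as mentioned in the comments above, the gain may be small):

# tar streams an uncompressed archive to stdout; pigz compresses it in parallel
nice -n 15 tar -cvf - "$PATH_BKP"/*.* | pigz > "$BKP"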

sybreon