
We use NFS through a firewall, which limits its performance.

Occasionally the load on the client host increases whenever a large I/O operation, such as tar, is running.

My understanding is that tar can cause congestion and thus affect other NFS operations.

Because users' home directories are also on NFS, the degraded NFS performance (caused by congestion from the tar command) also slows down routine operations such as ssh, su, and ls. In a production environment these operations can be frequent, so more of them end up waiting at the same time, which increases the load average. We can see this increase in load average in the sar reports.
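One way to test this hypothesis: on Linux, the load average counts not only runnable tasks (state R) but also tasks in uninterruptible sleep (state D), which is where processes waiting on a slow NFS server typically sit. The minimal sketch below, assuming a Linux `/proc` filesystem, prints the load average next to a count of R and D processes; the function names are hypothetical helpers, not part of any tool. If the load rises during tar while most of the extra tasks are in D rather than R, that supports the "waiting on NFS" explanation rather than CPU saturation.

```python
import os


def parse_state(stat_text: str) -> str:
    """Return the one-letter state field from a /proc/<pid>/stat line."""
    # The command name is wrapped in parentheses and may itself contain spaces
    # or ')', so locate the LAST ')' and read the character two positions later.
    return stat_text[stat_text.rindex(")") + 2]


def count_states(proc_root: str = "/proc") -> dict:
    """Count processes per state letter (R, S, D, ...) by scanning /proc."""
    counts: dict = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                state = parse_state(f.read())
        except OSError:
            continue  # the process exited between listdir() and open()
        counts[state] = counts.get(state, 0) + 1
    return counts


if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    with open("/proc/loadavg") as f:
        print("load average (1/5/15 min):", " ".join(f.read().split()[:3]))
    states = count_states()
    print("runnable (R):", states.get("R", 0))
    print("uninterruptible, usually I/O (D):", states.get("D", 0))
```

Running it once while tar is active and once while the host is idle gives a rough before/after comparison.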

What I am not clear on is where the congestion created by tar actually occurs. Is it inside the NFS storage (NetApp in our case) or in the network?

My hypothesis above holds only if the congestion happens on the network, because we don't see any performance impact on other NFS clients at the same time (if the congestion were in the storage server, all clients should be affected).

Also, if my hypothesis is right, I am not sure how to check whether there is network congestion between the client and the server.

poige
GP92

1 Answer


It's impossible to tell from this alone; you need to understand the system better. Start with the 10 commands in "Linux Performance Analysis in 60,000 Milliseconds".

For example, if the vmstat r column is much greater than the number of CPUs, you have processes waiting to run and the CPUs are saturated. In top, look at the process state codes to distinguish tasks waiting on I/O (D) from tasks running on a CPU (R).
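The vmstat check above can be sketched in a few lines. On Linux, the r and b columns of vmstat correspond to the `procs_running` and `procs_blocked` counters in `/proc/stat`. This minimal sketch reads those two counters and compares the run queue with `os.cpu_count()`. "Much greater" is not a fixed number, so `SATURATION_FACTOR = 2` below is an arbitrary, hypothetical threshold, as are the helper names.

```python
import os

# Hypothetical threshold: "much greater than the number of CPUs" has no fixed
# value; 2x is an arbitrary assumption to tune for your environment.
SATURATION_FACTOR = 2


def parse_procs(stat_text: str) -> tuple:
    """Return (procs_running, procs_blocked) from /proc/stat contents.

    These correspond to the r and b columns that vmstat displays.
    """
    values = {}
    for line in stat_text.splitlines():
        key, _, rest = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            values[key] = int(rest)
    return values["procs_running"], values["procs_blocked"]


def diagnose(running: int, blocked: int, cpus: int) -> list:
    """Turn the two counters into human-readable hints."""
    hints = []
    if running > SATURATION_FACTOR * cpus:
        hints.append(f"CPU saturated: {running} runnable tasks on {cpus} CPUs")
    if blocked > 0:
        hints.append(f"{blocked} task(s) blocked, typically waiting on I/O")
    return hints


if __name__ == "__main__" and os.path.exists("/proc/stat"):
    with open("/proc/stat") as f:
        r, b = parse_procs(f.read())
    cpus = os.cpu_count() or 1
    print(f"r={r} b={b} cpus={cpus}")
    for hint in diagnose(r, b, cpus) or ["no saturation signal in this sample"]:
        print(hint)
```

A single sample is noisy; vmstat itself is normally run with an interval (e.g. `vmstat 1`) so you can watch the trend while tar runs.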

On Linux, consider using a tool that will frequently poll a large number of metrics, such as netdata.

Do not limit the investigation to the host only. Look at utilization and error metrics on all network paths from the host to the storage. Check the storage array for utilization and errors.
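On the client side, two cheap signals are NFS RPC retransmissions (the `rpc` line of `/proc/net/rpc/nfs`, which is what `nfsstat -rc` reports as calls and retrans) and the error/drop counters in `/proc/net/dev`. A retransmission means the client timed out waiting for a reply, which can be caused either by loss on the network path or by a slow server, so this narrows the question without settling it. The minimal sketch below assumes Linux with the NFS client loaded; the helper names and the 5-second interval are hypothetical choices.

```python
import os
import time

INTERVAL_S = 5  # sampling interval in seconds (arbitrary choice)


def parse_rpc_counts(text: str) -> tuple:
    """Return (calls, retransmissions) from /proc/net/rpc/nfs.

    The 'rpc' line holds: calls, retrans, authrefresh (what `nfsstat -rc` shows).
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "rpc":
            return int(fields[1]), int(fields[2])
    raise ValueError("no 'rpc' line found")


def parse_netdev_errors(text: str) -> dict:
    """Return per-interface error/drop counters from /proc/net/dev."""
    result = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        f = [int(x) for x in data.split()]
        # Field order: rx bytes, packets, errs (2), drop (3), ...,
        # tx bytes (8), packets, errs (10), drop (11), ...
        result[iface.strip()] = {
            "rx_errs": f[2], "rx_drop": f[3],
            "tx_errs": f[10], "tx_drop": f[11],
        }
    return result


def sample() -> tuple:
    with open("/proc/net/rpc/nfs") as fh:
        calls, retrans = parse_rpc_counts(fh.read())
    with open("/proc/net/dev") as fh:
        errors = parse_netdev_errors(fh.read())
    return calls, retrans, errors


if __name__ == "__main__" and os.path.exists("/proc/net/rpc/nfs"):
    c0, r0, e0 = sample()
    time.sleep(INTERVAL_S)
    c1, r1, e1 = sample()
    print(f"NFS RPC calls: {c1 - c0}, retransmissions: {r1 - r0}")
    for iface, now in e1.items():
        before = e0.get(iface, now)
        delta = {k: now[k] - before[k] for k in now}
        if any(delta.values()):
            print(f"{iface}: {delta}")
```

Running the same sample on the affected client and on an unaffected client during a tar run makes the comparison in the question concrete. These counters only cover the client's own interfaces; the firewall, switches, and the storage array still need to be checked on those devices.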

John Mahowald
  • Hi. Actually, the slowness occurs only when a large operation such as tar is running on the system, and it stops once that operation completes. – GP92 May 04 '18 at 14:22
  • Thanks for sharing the link. I learned the importance of the vmstat command, which I had never used. – GP92 May 04 '18 at 14:25
  • You can and should check every resource: CPU, storage, network bandwidth. Currently you have a symptom, you have not found the root cause. – John Mahowald May 05 '18 at 13:24