I recently set up a standalone Swift cluster with one proxy node and three storage nodes, and put about 100 GB of data on it. Each of the three storage servers has a 1 TB volume attached, and the Swift storage is mounted on those volumes. Every day, at some point, the storage nodes become unresponsive and unreachable, to the point where I have to restart the servers. Monitoring showed that system.cpu.load starts climbing at random times because many processes are waiting on I/O and stuck in the D (uninterruptible sleep) state, which degrades the performance of the machines. Running ps aux | awk '{if ($8 ~ "D") print $0}' showed the processes that were in the D state. I also see errors of this kind in syslog.
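For anyone reproducing the check, here is a slightly more detailed variant of the ps command above, as a rough sketch only. It assumes a Linux procps `ps` that supports the `wchan` output column; the function names `filter_d_state` and `list_d_state` are made up for illustration and are not part of Swift or any tool.

```shell
# Keep the header line plus every row whose STAT column (2nd field)
# starts with D, i.e. processes in uninterruptible sleep.
# Expects input with columns: PID STAT WCHAN CMD.
filter_d_state() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# List D-state processes together with the kernel function they are
# currently waiting in (WCHAN), which can hint at the blocked subsystem.
# Assumes procps ps on Linux.
list_d_state() {
    ps -eo pid,stat,wchan:32,cmd | filter_d_state
}
```

Calling `list_d_state` a few times while the load is rising shows whether the same processes stay blocked and where in the kernel they are waiting.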

Nothing else is running on the machines, and CPU, memory and everything else look normal. The puzzling part is that I have another instance of the cluster with no data on it, and those servers have no such issues. I cannot figure out what is causing the processes to get stuck waiting on I/O, and any help would be greatly appreciated.

ymo
  • 1
