I have a C++ program that continuously records a lot of data to disk over long periods. Because of this, I have a thread that monitors the available disk space and takes action once usage reaches a certain percentage.
This is on a dual quad-core x64 CentOS system, and the recording goes to directly connected SATA disks with an ext3 filesystem that are used solely for the recording. I monitor disk usage by running a "df" command through system() and reading in the result.
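For reference, the "Use%" figure that df prints is derived from the filesystem statistics returned by the POSIX statvfs() call, so the same number can be computed inside the process without starting a shell pipeline at all. The following is a minimal sketch, assuming Linux/glibc; diskUsagePercent() is a hypothetical helper name, and the rounding is intended to mirror GNU df (used / (used + available), rounded up, ignoring blocks reserved for root).

```cpp
#include <sys/statvfs.h>
#include <cassert>
#include <string>

// Hypothetical helper: percentage of space used on the filesystem that
// contains `path`, computed like df's "Use%" column.
// Returns -1 if statvfs() fails (e.g. the path does not exist).
int diskUsagePercent(const std::string& path)
{
    assert(!path.empty());
    struct statvfs st;
    if (statvfs(path.c_str(), &st) != 0)
        return -1;

    // All counts are in units of f_frsize, so the ratio does not depend on it.
    unsigned long long used  = static_cast<unsigned long long>(st.f_blocks - st.f_bfree);
    unsigned long long avail = static_cast<unsigned long long>(st.f_bavail);
    unsigned long long total = used + avail;  // space visible to non-root users
    if (total == 0)
        return 0;

    // Integer ceiling division: any fractional percentage rounds up.
    return static_cast<int>((used * 100 + total - 1) / total);
}
```

Called as diskUsagePercent(m_monitoredDisk) from the timer handler, this would avoid forking sh, df, grep, awk and cut on every check.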
Whilst it was running last night, I noticed in the log files that the command to find the disk usage took a full 39 minutes to complete.
The code that handles the timeout is this:
    int DiskSpaceMonitor::handle_timeout(const ACE_Time_Value& time_, const void* pFunc_)
    {
        LOG4CXX_TRACE(m_logger, "DiskSpaceMonitor timer fired");
        // Query the current usage and pass it on to the recorder.
        ACE_UINT8 usagePercent = m_diskChecker.getDiskSpaceUsagePercentage(m_monitoredDisk);
        m_fileRecorder->notifyDiskUsage(usagePercent);
        return 0;
    }
This calls the following function, which runs the "df":
    ACE_UINT8 DiskSpaceChecker::getDiskSpaceUsagePercentage(std::string diskMountPoint)
    {
        // Run df and write just the "Use%" value (without the '%') to a temporary file.
        std::stringstream usageCommand;
        usageCommand << "df -PH " << diskMountPoint << " | grep -v \"^Filesystem\" | awk '{print $5}' | cut -d'%' -f1 > " << m_mountSpaceFile;
        if (system(usageCommand.str().c_str()) != 0)
        {
            LOG4CXX_WARN(m_logger, "df command failed for disk: " << diskMountPoint);
            return 0;
        }

        // Read the percentage back in.
        std::ifstream inFile(m_mountSpaceFile.c_str(), std::ios::in);
        if (!inFile)
        {
            return 0;
        }
        std::string usageStr;
        inFile >> usageStr;
        int usage = atoi(usageStr.c_str());
        inFile.close();

        // Remove the temporary file.
        std::stringstream rmCmd;
        rmCmd << "rm " << m_mountSpaceFile;
        system(rmCmd.str().c_str());

        LOG4CXX_DEBUG(m_logger, "Disk usage for disk: " << diskMountPoint << " = " << usage << "%");
        return usage;
    }
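The temporary file and the second system("rm ...") call can be avoided by reading the pipeline's output directly through POSIX popen(). A minimal sketch; runAndCapture() and diskUsageViaDf() are hypothetical helpers, not part of the code above, and the command string is the same df pipeline minus the redirection.

```cpp
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <sys/wait.h>

// Hypothetical helper: runs `command` through /bin/sh via popen() and
// captures everything it writes to stdout. Returns false if the command
// could not be started or did not exit with status 0.
bool runAndCapture(const std::string& command, std::string& output)
{
    assert(!command.empty());
    output.clear();
    FILE* pipe = popen(command.c_str(), "r");
    if (pipe == NULL)
        return false;

    char buffer[256];
    while (fgets(buffer, static_cast<int>(sizeof buffer), pipe) != NULL)
        output += buffer;

    // pclose() waits for the command to finish and returns its wait status.
    int status = pclose(pipe);
    return status != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}

// Hypothetical: the question's pipeline, read without a temporary file.
// Returns -1 if the pipeline fails.
int diskUsageViaDf(const std::string& mountPoint)
{
    std::string out;
    if (!runAndCapture("df -PH " + mountPoint +
                       " | grep -v \"^Filesystem\" | awk '{print $5}' | cut -d'%' -f1", out))
        return -1;
    return std::atoi(out.c_str());
}
```

This still forks a shell and every command in the pipeline, so it removes the file I/O but not the process-creation cost.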
So 39 minutes elapsed between the trace logging statement in handle_timeout() and the debug logging statement in getDiskSpaceUsagePercentage(). But the delay really came before the inFile >> usageStr; line, and before df actually took its measurement: the percentage read was higher than expected (it should have gone up by 1% or less, but it jumped by more than 16%), so df must have sampled the disk late in that 39-minute window.
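To pin down which stage the time goes into, each system() call can be timed on its own rather than relying on the two log statements at either end. A minimal sketch, assuming a C++11 compiler for std::chrono; timedSystem() is a hypothetical helper whose result could be logged alongside the existing LOG4CXX calls.

```cpp
#include <cassert>
#include <chrono>
#include <cstdlib>
#include <string>

// Hypothetical helper: runs `command` via system(), stores system()'s return
// value in `status`, and returns the elapsed wall-clock time in seconds.
double timedSystem(const std::string& command, int& status)
{
    assert(!command.empty());
    typedef std::chrono::steady_clock Clock;  // monotonic, unaffected by clock changes
    Clock::time_point start = Clock::now();
    status = std::system(command.c_str());
    std::chrono::duration<double> elapsed = Clock::now() - start;
    return elapsed.count();
}
```

Wrapping the df call and the rm call separately (and timing the file read the same way) would show whether the delay is in starting the pipeline, in df itself, or elsewhere.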
Why the hell should running the command and reading in its result take such a huge amount of time?
Now, I admit that the disks take a bit of a hammering whilst they are being written to, but only one program writes to them, and it writes only one data file and one index file. So I don't see why this should take so long.
Alternatively, is there an easy way to call system() and have it return after a timeout period if the command is taking too long?
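One common way to get a timeout is to replace system() with fork() plus exec of /bin/sh, then poll waitpid() with WNOHANG and kill the child once a deadline passes. The sketch below assumes Linux/POSIX; systemWithTimeout() and its 100 ms polling interval are hypothetical choices. The child is placed in its own process group so that SIGKILL reaches the shell and every command in its pipeline, not just sh.

```cpp
#include <cassert>
#include <cerrno>
#include <csignal>
#include <string>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char** environ;

// Hypothetical replacement for system(): runs `command` with /bin/sh -c and
// waits roughly `timeoutSeconds` for it. Returns the wait status as system()
// would, or -1 if fork()/waitpid() failed or the command timed out
// (in which case timedOut is set to true).
int systemWithTimeout(const std::string& command, unsigned timeoutSeconds, bool& timedOut)
{
    assert(!command.empty());
    timedOut = false;

    // Build argv before fork(): in a multithreaded program the child should
    // only call async-signal-safe functions (setpgid, execve, _exit).
    char* const argv[] = { const_cast<char*>("sh"), const_cast<char*>("-c"),
                           const_cast<char*>(command.c_str()), NULL };

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        setpgid(0, 0);                     // own process group for the pipeline
        execve("/bin/sh", argv, environ);
        _exit(127);                        // only reached if exec failed
    }
    setpgid(pid, pid);                     // repeated in the parent to avoid a race

    const unsigned pollMs = 100;
    for (unsigned waitedMs = 0; ; waitedMs += pollMs) {
        int status = 0;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return status;                 // finished before the deadline
        if (r < 0 && errno != EINTR)
            return -1;
        if (waitedMs >= timeoutSeconds * 1000u) {
            kill(-pid, SIGKILL);           // negative pid: signal the whole group
            waitpid(pid, &status, 0);      // reap the shell; can block if stuck in disk I/O
            timedOut = true;
            return -1;
        }
        usleep(pollMs * 1000);             // sleep ~100 ms between polls
    }
}
```

For the check above it could be called as systemWithTimeout(usageCommand.str(), 60, timedOut). One caveat: a process blocked in uninterruptible disk I/O (state D in ps) does not act on SIGKILL until that I/O completes, so if df is stuck that way the final waitpid() can still block; a monitor that must never stall could leave reaping to a SIGCHLD handler instead.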