I have written a program (using FFTW) to perform Fourier transforms of some data files written in OpenFOAM.
The program first finds the paths to each data file (501 files in my current example), then splits the paths between threads, such that thread0 gets paths 0->61, thread1 gets 62-> 123 or so, etc, and then runs the remaining files in serial at the end.
I have implemented timers throughout the code to try and see where it bottlenecks, since run in serial each file takes around 3.5s and for 8 files in parallel the time taken is around 21s (a reduction from the 28s for 8x3.5 (serial time), but not by so much)
The problematic section of my code is below
if (DIAG_timers) {readTimer = timerNow();}
for (yindex=0; yindex<ycells; yindex++)
{
for (xindex=0; xindex<xcells; xindex++)
{
getline(alphaFile, alphaStringValue);
convertToNumber(alphaStringValue, alphaValue[xindex][yindex]);
}
}
if (DIAG_timers) {endTimerP(readTimer, tid, "reading value and converting", false);}
Here, timerNow() returns the clock value, and endTimerP calculates the time that has passed in ms. (The remaining arguments relate to it running in a parallel thread, to avoid outputting 8 lines for each loop etc, and a description of what the timer measures).
convertToNumber takes the value on alphaStringValue, and converts it to a double, which is then stored in the alphaValue array.
alphaFile is a std::ifstream object, and alphaStringValue is a std::string which stores the text on each line.
The files to be read are approximately 40MB each (just a few lines more than 5120000, each containing only one value, between 0 and 1 (in most cases == (0||1) ), and I have 16GB of RAM, so copying all the files to memory would certainly be possible, since only 8 (1 per thread) should be open at once. I am unsure if mmap would do this better? Several threads on stackoverflow argue about the merits of mmap vs more straightforward read operations, in particular for sequential access, so I don't know if that would be beneficial.
I tried surrounding the code block with a mutex so that only one thread could run the block at once, in case reading multiple files was leading to slow IO via vaguely random access, but that just reduced the process to roughly serial-speed times.
Any suggestions allowing me to run this section more quickly, possibly via copying the file, or indeed anything else, would be appreciated.
Edit:
template<class T> inline void convertToNumber(std::string const& s, T &result)
{
std::istringstream i(s);
T x;
if (!(i >> x))
throw BadConversion("convertToNumber(\"" + s + "\")");
result = x;
}
turns out to have been the slow section. I assume this is due to the creation of 5 million stringstreams per file, followed by the testing of 5 million if conditions? Replacing it with TonyD's suggestion presumably removes the possibility of catching an error, but saves a vast number of (at least in this controlled case) unnecessary operations.