I have a program which analyzes 150,000 files. Valgrind reports no memory leak but the program slows over time.
Some of the problems were related to using std::string too often and mktime taking too much time. (see C++ slows over time reading 70,000 files)
But it still slows down over time. Lotharyx suggested that container usage is causing heap fragmentation.
I read the various flow charts on the pros and cons of the different STL containers but I didn't quite get it.
In the pseudo code below, I'm not sure I've made the right choices to avoid heap fragmentation.
fileList.clear()
scan all disks and build "fileList", a std::set of file paths matching a pattern.
// filepaths are named by date, eg 20160530.051000, so are intrinsically ordered
foreach(filePath in fileList)
{
if (alreadyHaveFileDetails(filePath))
continue;
// otherwise
collect file details into a fileInfoStruct; // like size, contents, mod
fileInfoMap[time_t date] = fileInfoStruct;
}
// fileInfoMap is ordered collection of information structs of about 100,000 files
// traverse the list in order
foreach (fileInfo in fileInfoMap)
{
if (meetsCondition(fileInfo))
{
TEventInfo event = makeEventInfo()
eventList.push_back(event);
}
}
And the above sequence repeats forever.
So for choice of containers, I've used (or need):
fileList
-- list of unique strings containing 150,000 pathnames.
I chose std::set because it it automatically handles duplicates and maintains sort order automatically.
No random access, only add the entries, sort them (manually or automatically), and iterate over them.
fileInfoMap
-- an array of structures keyed by a time_t timestamp corresponding to the date of the file.
I chose std::map. It too would have 150,000 entries so occupies a lot of memory.
No random access, only add the entries to one end. Must iterate over them and, if necessary, delete entries from the middle.
eventList
-- a small list of "event" structures, say 50 items.
I chose std::vector. Not sure why really.
No random access, only add entries to one end and later iterate over the collection.
I'm fairly new to C++. Thanks for your consideration.