I have a program with the general structure shown below. Basically, I have a vector of objects. Each object has member vectors, and one of those is a vector of structs that contain more vectors. Using multiple threads, the objects are operated on in parallel, doing computation that involves a lot of accessing and modifying of member vector elements. Each object is accessed by only one thread at a time, and is copied to that thread's stack for processing.
The problem is that the program fails to scale up to 16 cores. I suspect, and have been advised, that the issue may be false sharing and/or cache invalidation. If this is true, it seems the cause must be the vectors allocating memory too close to each other, since my understanding is that both problems are (in simple terms) caused by nearby memory addresses being accessed simultaneously by different processors. Does this reasoning make sense, and is it likely that this could happen? If so, it seems I could solve the problem by padding the member vectors with .reserve() to add extra capacity, leaving large gaps of unused memory between the vectors' arrays (a rough sketch of what I mean is after the code below). So, does all this make any sense? Am I totally out to lunch here?
#include <vector>
#include <pthread.h>
using namespace std;

struct str{
    vector<float> a; vector<int> b; vector<bool> c;
};

class object{
public:
    vector<str> a; vector<int> b; vector<float> c;
    //more vectors, etc ...
    void DoWork(); //heavy use of vectors
};

void* Consumer(void* argument);

int main(){
    vector<object> objs;
    vector<object>* p_objs = &objs;
    //...make `thread_list` and `attr`
    for(int q=0; q<NUM_THREADS; q++)
        pthread_create(&thread_list[q], &attr, Consumer, p_objs);
    //...
}

void* Consumer(void* argument){
    vector<object>* p_objs = (vector<object>*) argument;
    while(1){
        int index = queued++;           //imagine queued is a thread-safe global counter
        object obj = (*p_objs)[index];  //copy the object to this thread's stack
        obj.DoWork();
        (*p_objs)[index] = obj;         //copy the result back
    }
}