Profiling my code, I see a lot of cache misses and would like to know whether there is a way to improve the situation. Optimization is not strictly needed; I'm more curious whether there are general approaches to this problem (this is a follow-up question).
// class to compute stuff
class A {
    double compute();
    ...
    // depends on other objects
    std::vector<A*> dependencies;
};
I have a container class that stores pointers to all created objects of class A. I do not store copies, as I want shared access. I was using shared_ptr before, but since individual A objects are meaningless without the container, raw pointers are fine.
class Container {
    ...
    void compute_all();
    std::vector<A*> objects;
    ...
};
The vector objects is insertion-sorted in such a way that the full evaluation can be done by simply iterating over it and calling A::compute(); by the time an object is reached, all of its dependencies in A are already resolved.
With a_i objects of class A, the evaluation might look like this:
a_1 => a_2 => a_3 --> a_2 --> a_1 => a_4 => ....
where => denotes iteration in Container and --> iteration over A::dependencies.
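To make the traversal concrete, here is a stripped-down sketch of what I mean (the members value and own_term are just placeholders; the real computation is more involved):

#include <vector>

// Sketch: compute() reads the already-computed values of its
// dependencies (the --> step), compute_all() drives the outer
// iteration (the => step).
class A {
public:
    double compute() {
        double sum = 0.0;
        // --> step: chases pointers into A::dependencies
        for (A* dep : dependencies)
            sum += dep->value;
        value = sum + own_term;   // own_term: assumed per-object input
        return value;
    }
    double value = 0.0;
    double own_term = 1.0;
    std::vector<A*> dependencies;
};

class Container {
public:
    void compute_all() {
        // => step: insertion-sorted order guarantees every dependency
        // has been computed before it is read.
        for (A* obj : objects)
            obj->compute();
    }
    std::vector<A*> objects;
};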
Moreover, the Container class is created only once and compute_all() is called many times, so rearranging the whole structure after creation is an option and wouldn't harm efficiency much.
Now to the observations/questions:
Obviously, iterating over Container::objects is cache-efficient, but accessing the pointees is definitely not. Moreover, each object of type A has to iterate over A::dependencies, which again can produce cache misses.
Would it help to create a separate vector<A*> from all needed objects in evaluation order, such that the dependencies in A are inserted as copies?
Something like this:
a_1 => a_2 => a_3 => a_2_c => a_1_c => a_4 => ....
where a_i_c are copies of a_i.
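In code, building that flat vector might look roughly like this (reusing the A class from the sketch above; ownership of the copies is left out for brevity):

#include <vector>

// Hypothetical helper: flatten the evaluation into a single vector<A*>,
// where each object is followed by freshly created copies (a_i_c) of
// its dependencies, so only => iteration remains.
std::vector<A*> build_flat_order(const std::vector<A*>& objects)
{
    std::vector<A*> flat;
    for (A* obj : objects) {
        flat.push_back(obj);                 // a_i
        for (A* dep : obj->dependencies)
            flat.push_back(new A(*dep));     // a_i_c: copy of a dependency
    }
    return flat;
}

Whether iterating this flat vector actually reduces the misses is exactly what I'm unsure about, since the copies are still allocated individually on the heap.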
Thanks for your help and sorry if this question is confusing, but I find it rather difficult to extrapolate from simple examples to large applications.