What is the best way to profile and optimize a clutter-box2d application on an ARM target? I tried profiling the code with Valgrind on x86 before porting, but it doesn't seem to have helped: the ported application is still considerably slow on the ARM target.
I wasn't able to get Valgrind working properly on the ARM target, so I could not use it there to profile the code and identify bottlenecks.
I also tried OProfile a bit, but it gives only a system-wide snapshot and isn't much help, since it does not produce call graphs.