I'm trying to make a (small) improvement to the leon3 processor (instruction set is SPARC v8) for an academic exercise. Before I decide what to improve, I want to profile a couple of benchmark programs that I want to tailor the improvements to.
I don't have access to a SPARC v8 machine.
Currently, I'm using an evaluation version of 'tsim' (a leon3 simulator) which does profiling at the functional level. Which is not really all that useful.
I have tried weird stuff like compiling with loop unrolling enabled and then counting the interesting instructions in the assembly code, but gcc refuses to unroll the loops, probably because some of them go too deep (e.g. 4 nested 'for' loops).
Ideally, what I'm looking for is a SPARC v8 simulator that runs the benchmark and profiles it at the instruction level (stuff like: 'smul' was executed x times) so that I can decide where to start trying with the improvement. Of course if there are other ways I can do this if not a profiler, I won't mind.
Any ideas?