I ran an experiment on both a GTX760 (Kepler) and a GTX750Ti (Maxwell) using benchmarks from Parboil and Rodinia, then analyzed the results with the NVIDIA Visual Profiler. In most of the applications, the number of global instructions increases dramatically on the Maxwell architecture, by up to 7-10x.
Specs for both graphics cards (memory data rate, memory size, bus width, peak bandwidth):
GTX760: 6.0 Gbps, 2048 MB, 256-bit, 192.2 GB/s
GTX750Ti: 5.4 Gbps, 2048 MB, 128-bit, 86.4 GB/s
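The bandwidth column follows from the other two: peak bandwidth = per-pin data rate × bus width / 8. A minimal Python sanity check is below; the function name `peak_bandwidth_gbs` is hypothetical, and the formula ignores any real-world overheads. The listed 192.2 GB/s for the GTX760 implies a data rate slightly above the rounded 6.0 Gbps (192.2 × 8 / 256 ≈ 6.006).

```python
def peak_bandwidth_gbs(data_rate_gbps: float, bus_width_bits: int) -> float:
    """Theoretical peak DRAM bandwidth in GB/s (hypothetical helper).

    data_rate_gbps: effective per-pin data rate, e.g. 6.0 for "6.0 Gbps"
    bus_width_bits: width of the memory interface in bits
    """
    # Every pin moves data_rate_gbps gigabits per second;
    # dividing by 8 converts gigabits to gigabytes.
    return data_rate_gbps * bus_width_bits / 8


# (data rate in Gbps, bus width in bits), taken from the spec list above
cards = {
    "GTX760": (6.0, 256),
    "GTX750Ti": (5.4, 128),
}

for name, (rate, width) in cards.items():
    print(f"{name}: {peak_bandwidth_gbs(rate, width):.1f} GB/s")
```

With the rounded 6.0 Gbps figure this prints 192.0 GB/s for the GTX760 and 86.4 GB/s for the GTX750Ti, so the Maxwell card has less than half the peak memory bandwidth of the Kepler card.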
Environment:
Ubuntu 14.04
CUDA driver 340.29
CUDA Toolkit 6.5
I compiled the benchmark applications without any modification and collected the results with NVVP 6.5. Under Analyze All > Kernel Memory, in the "From L1/Shared Memory" section, I recorded the global load transaction counts.
I attached screenshots of our profiler results for histo running on Kepler (link) and Maxwell (link).
Does anyone know why the global instruction counts increase on the Maxwell architecture?
Thank you.