
Disclaimer: I have absolutely no experience with assembly programming, hence this question.

I want to write a program that traverses a matrix two ways, row-wise and column-wise, to demonstrate cache effects for a presentation. The problem is that if I write it in C, gcc rewrites my code to be row-wise no matter what I do. What assembly would force it to run column-wise?

Elliot Gorokhovsky
  • Assembly is platform dependent, and it's a bad idea to use for this anyway. Instead, show your code. Also note that if gcc rewrites it, it must guarantee that it works the same. In which case, why do you care? Finally, row-wise is usually faster due to memory locality. – Jester Feb 18 '17 at 20:00
  • @Jester I know it's faster. I'm trying to do an experiment to show the difference in cache usage for my science fair project. – Elliot Gorokhovsky Feb 18 '17 at 20:01
  • Does gcc reorder loops even if you don't use any optimization (no -O flag)? If it does, that's surprising. Please post an example. Anyway, if the inner loop does anything significant, like calling a function in a different module and passing it the array value, it's certain the loops won't be reordered, since there's no way for gcc to determine that whatever the function is doing doesn't depend on the order of traversal. – Gene Feb 18 '17 at 20:03
  • @gene Wow, ya, disabling optimization should do the trick! Hahaha I will delete this – Elliot Gorokhovsky Feb 18 '17 at 20:06
  • You might be able to find the specific optimization option that controls this so you can leave others enabled. – Jester Feb 18 '17 at 20:08
  • @jester Ya, for this it doesn't matter but in general that would be cool – Elliot Gorokhovsky Feb 18 '17 at 20:08
  • @RenéG - On the other hand, showing that performance is bad for unoptimized code is like saying that Usain Bolt doesn't *walk* very fast. If the compiler can optimize the code anyway, perhaps you should try to find a better example... – Bo Persson Feb 18 '17 at 21:23
  • Basically what I'm trying to do is show that an algorithm I'm working on uses less cache memory while maintaining the same # of hits (it's an inline caching thing). So the idea is I would run the cache intensive (row) code concurrently with my algorithm and see if it's faster when I turn on the optimization. The column code is for control. – Elliot Gorokhovsky Feb 18 '17 at 21:26
  • @RenéG It's probably possible to write the inner/outer loop array modification in a complex enough way (or make it depend on older values) that the compiler will be unable to reorder it, while keeping it reasonably similar to the original calculation. Try to add an [MCVE] here. Or try really hard to keep -O3 and just find the particular optimizer option that switches loop reordering off. But if this reordering happens with a real-world algorithm, then doesn't the "wrong" version get fixed by the compiler even in a real app? – Ped7g Feb 19 '17 at 10:20
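
Along the lines of the comments above, one way to keep the traversal order fixed without dropping to assembly or turning optimization off entirely is to read the matrix through a pointer to `volatile`. This is only a minimal sketch, not code from the question: the function names `sum_row_wise` and `sum_col_wise` are hypothetical, and the matrix is assumed to be a flat row-major array of `int`. Because each volatile read must happen in program order, the compiler should not be able to interchange the two loops, although it will likely also refuse to vectorize either of them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; names are not from the question. */

/* Sum a row-major n_rows x n_cols matrix, walking along each row.
   Consecutive reads touch adjacent addresses. */
long sum_row_wise(const volatile int *m, size_t n_rows, size_t n_cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t r = 0; r < n_rows; r++)
        for (size_t c = 0; c < n_cols; c++)
            sum += m[r * n_cols + c];   /* volatile read, kept in this order */
    return sum;
}

/* Sum the same matrix, walking down each column.
   Consecutive reads are n_cols elements apart. */
long sum_col_wise(const volatile int *m, size_t n_rows, size_t n_cols)
{
    assert(m != NULL);
    long sum = 0;
    for (size_t c = 0; c < n_cols; c++)
        for (size_t r = 0; r < n_rows; r++)
            sum += m[r * n_cols + c];   /* volatile read, kept in this order */
    return sum;
}
```

To see a cache effect, both functions would presumably be timed on a matrix much larger than the last-level cache. It is still worth checking the generated assembly (for example with `gcc -S`) to confirm that the loop order is what you expect.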
