When I transpose a 64x64 matrix, I use a tile size of 256/64 = 4, and for a 32x32 matrix I use a tile size of 256/32 = 8. How do I calculate the tile size for an asymmetric matrix? A tile size of 16 gives me the fewest misses, but I can't explain it. Can someone explain why 16 is the best tile size for an asymmetric matrix?
1 Answer
There's not really a single "best" tile size for an asymmetric matrix, because the right choice depends on the specific structure of the matrix (its dimensions and how it is laid out in memory). In general, you want the tile size that minimizes the number of cache misses, and that depends on the stride of the matrix (i.e. the distance in memory between consecutive elements in each row or column).
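To make "tile size" concrete, here is a minimal sketch of a tiled transpose that also handles non-square matrices. The function name `transpose_tiled` and its parameters are illustrative assumptions, not taken from the question, and the matrices are assumed to be stored in row-major order as flat `int` arrays.

```c
#include <assert.h>
#include <stddef.h>

/* Transpose a rows x cols row-major matrix `src` into a cols x rows
 * row-major matrix `dst`, working on tile x tile blocks.
 * Hypothetical helper for illustration only. */
void transpose_tiled(const int *src, int *dst,
                     size_t rows, size_t cols, size_t tile)
{
    assert(tile > 0);
    for (size_t i0 = 0; i0 < rows; i0 += tile) {
        for (size_t j0 = 0; j0 < cols; j0 += tile) {
            /* Clamp the block at the edges so dimensions that are not
             * a multiple of `tile` are still handled correctly. */
            size_t i_end = (i0 + tile < rows) ? i0 + tile : rows;
            size_t j_end = (j0 + tile < cols) ? j0 + tile : cols;
            for (size_t i = i0; i < i_end; i++) {
                for (size_t j = j0; j < j_end; j++) {
                    dst[j * rows + i] = src[i * cols + j];
                }
            }
        }
    }
}
```

Because the block bounds are clamped, you can try several values of `tile` (4, 8, 16, ...) on the same asymmetric matrix and compare the miss counts you measure.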
For example, if the matrix is stored in row-major order and has a stride of 1 (i.e. each element is stored immediately next to the previous element in the row), then one row of a 16-wide tile covers 16 contiguous elements. Those elements fall in only a few cache lines, which is very efficient for cache accesses.
On the other hand, if the matrix is stored in row-major order but has a stride of 2 (i.e. each element is stored two element positions away from the previous element in the row), then one row of a 16-wide tile spans 32 element positions. It touches roughly twice as many cache lines for the same amount of useful data, which is less efficient. In this case, you might want to use a smaller tile size, such as 8 or 4.
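The effect of stride can be checked with a small calculation. The sketch below counts how many distinct cache lines one tile row touches. The helper name `lines_touched` is hypothetical, and it assumes that the first element is aligned to the start of a cache line and that elements never straddle a line boundary; the element and line sizes are parameters you supply for your own machine.

```c
#include <assert.h>
#include <stddef.h>

/* Count the distinct cache lines touched when reading `tile` elements
 * that are `stride` element positions apart.
 * Assumes the first element starts a cache line and that elem_size
 * divides line_size (so no element straddles two lines). */
size_t lines_touched(size_t tile, size_t stride,
                     size_t elem_size, size_t line_size)
{
    assert(line_size > 0);
    size_t count = 0;
    size_t prev_line = 0;
    for (size_t k = 0; k < tile; k++) {
        /* Byte offset of element k, mapped to its cache line index. */
        size_t line = (k * stride * elem_size) / line_size;
        if (k == 0 || line != prev_line) {
            count++;
        }
        prev_line = line;
    }
    return count;
}
```

Assuming 4-byte elements and a 64-byte cache line, `lines_touched(16, 1, 4, 64)` is 1 and `lines_touched(16, 2, 4, 64)` is 2, so doubling the stride doubles the number of lines a 16-element tile row touches.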
Similarly, if the matrix is stored in column-major order and has a stride of 1, then a tile size of 16 would give you
