Can someone explain what the stride trick is in general? How can I use it when implementing polynomial multiplication algorithms in which the polynomials are represented as coefficient arrays? How can the stride trick make the implementation more efficient?
Is it better suited to AVX/AVX2 vector instructions? Can it be used on any kind of platform? Which platforms or situations are most suitable for this trick?
Edit: The Wikipedia article "Stride of an array" says:
Many languages (including C and C++) allow structures to be padded to better take advantage either of the word length and/or cache line size of the machine. For example:
    struct A {
        int a;
        char b;
    };

    struct A myArray[100];
In the above code snippet, myArray might well turn out to have a stride of eight bytes, rather than five (4 bytes for the int plus one for the char), if the C code were compiled for a 32-bit architecture, and the compiler had optimized (as is usually the case) for minimum processing time rather than minimum memory usage.
Can someone explain how this can make the code run faster?