How can I use the stride trick in Karatsuba multiplication of polynomials?

Question

Can someone explain what the stride trick is in general? How can I use it when implementing polynomial multiplication algorithms in which the polynomials are represented as coefficient arrays? How can the stride trick make the implementation more efficient?

Is it better suited to AVX/AVX2 vector instructions? Can it be used on any platform? Which platforms or situations suit this trick best?

Edit: The Wikipedia article "Stride of an array" says:

Many languages (including C and C++) allow structures to be padded to better take advantage either of the word length and/or cache line size of the machine. For example:
struct A {
    int a;
    char b;
};

struct A myArray[100];
In the above code snippet, myArray might well turn out to have a stride of eight bytes, rather than five (4 bytes for the int plus one for the char), if the C code were compiled for a 32-bit architecture, and the compiler had optimized (as is usually the case) for minimum processing time rather than minimum memory usage.
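The stride described in the quote can be measured directly. The following is a minimal sketch, not part of the Wikipedia article: the helper name `stride_of_A` is made up for illustration, and the value it returns depends on the compiler and target, so no particular result is assumed here.

```c
#include <stddef.h>

/* Same layout as the struct in the quoted snippet. */
struct A {
    int a;
    char b;
};

/* Hypothetical helper: returns the distance in bytes between two
 * consecutive elements of an array of struct A, i.e. the stride.
 * In C this always equals sizeof(struct A), padding included. */
static size_t stride_of_A(void) {
    struct A arr[2];
    return (size_t)((char *)&arr[1] - (char *)&arr[0]);
}

/* Bytes actually occupied by the members, ignoring any padding. */
static size_t payload_of_A(void) {
    return sizeof(int) + sizeof(char);
}
```

Printing `stride_of_A()` next to `payload_of_A()` (for example with `printf("%zu %zu\n", ...)`) shows how many padding bytes the compiler added. On many common targets with a 4-byte `int`, the stride is 8 while the payload is 5, which keeps the `int` member of every array element aligned.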

Can someone explain how this can make the code run faster?

Are you talking about the trick of (for example) adding four 16-bit integers at a time using 64-bit addition? — stark, Dec 07 '21 at 18:53
@stark Actually, I am not sure exactly what is being asked of me. Someone suggested that I apply the stride trick to Karatsuba polynomial multiplication in my code; it is known as stride Karatsuba. I am not sure what he meant exactly, but the same person advised me to use AVX2 vector instructions, which include parallel add operations on arrays (by the way, I have no background in AVX2 either). He suggested this trick to improve the runtime. It is probably something close to what you are describing. Could you give a quick example, or a bit more explanation? — esra, Dec 07 '21 at 19:02
Maybe have a look here https://codereview.stackexchange.com/q/250117 — stark, Dec 07 '21 at 20:02
Do you actually really need to increase the performance? Is this part the bottleneck of your performance? If not, don't waste your time on it. — 12431234123412341234123, Dec 07 '21 at 20:07
@Craig Estey From Wikipedia: "Many languages allow structures to be padded to better take advantage either of the word length and/or cache line size of the machine. For example: struct A { int a; char b; }; struct A myArray[100]; In the above code snippet, myArray might well turn out to have a stride of eight bytes, rather than five (4 bytes for the int plus one for the char), if the C code were compiled for a 32-bit architecture, and the compiler had optimized (as is usually the case) for minimum processing time rather than minimum memory usage." How can this make code run faster? — esra, Dec 07 '21 at 21:11
@12431234123412341234123 Yes, my main goal is indeed to make the code run faster. I have edited the question to add the Wikipedia explanation, but I cannot see how the technique described there actually makes the code run faster. What is the logic behind it? Can you please explain? — esra, Dec 07 '21 at 21:17


0 Answers