I have to implement matrix-vector multiplication using SSE/SSE2. Both the vector and the matrix are large. The matrix holds doubles and the vector holds floats.
The point is that all the calculations have to be done in floats: when I read data from the matrix I convert it to float, do the calculations, and get a float vector. (Later, after some additional calculations in floats, I have to add some float values (a float matrix) to double values (a double matrix).)
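To make the intended computation concrete, here is a minimal scalar sketch of it. The function name `matvec_ref`, the parameter names, and the row-major layout of the matrix are assumptions made for illustration only.

```c
#include <stddef.h>

/* Scalar reference: y = A * x, computed entirely in float.
 * Assumption: A is stored row-major as rows * cols doubles. */
void matvec_ref(const double *A, const float *x, float *y,
                size_t rows, size_t cols)
{
    for (size_t r = 0; r < rows; ++r) {
        float sum = 0.0f;
        for (size_t c = 0; c < cols; ++c) {
            /* Narrow each matrix element to float before multiplying. */
            sum += (float)A[r * cols + c] * x[c];
        }
        y[r] = sum;
    }
}
```

This is the behaviour the SSE/SSE2 version should reproduce, up to differences in floating-point summation order.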
My question is how to do this with SSE/SSE2. The problem is the doubles: I have a double* pointer and I need to somehow convert 4 doubles into 4 floats so that they fit in an __m128. Are there any instructions to do that?
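For reference, here is a minimal sketch of one approach that appears to fit, assuming SSE2 is available. The SSE2 intrinsic `_mm_cvtpd_ps` (the CVTPD2PS instruction) converts two doubles into the low two float lanes of an `__m128` and zeroes the upper two, so two conversions can be merged with `_mm_movelh_ps`. The helper names `load4_pd_as_ps` and `row_dot` are hypothetical, and unaligned loads are used because nothing is known about the alignment of the data.

```c
#include <emmintrin.h>  /* SSE2 intrinsics (also pulls in the SSE ones) */
#include <stddef.h>

/* Load 4 consecutive doubles and narrow them to 4 floats in one __m128. */
static inline __m128 load4_pd_as_ps(const double *p)
{
    __m128 lo = _mm_cvtpd_ps(_mm_loadu_pd(p));      /* [p0, p1, 0, 0] */
    __m128 hi = _mm_cvtpd_ps(_mm_loadu_pd(p + 2));  /* [p2, p3, 0, 0] */
    return _mm_movelh_ps(lo, hi);                   /* [p0, p1, p2, p3] */
}

/* Dot product of one matrix row (double) with the vector (float),
 * accumulated in float. Handles any n with a scalar tail loop. */
float row_dot(const double *row, const float *vec, size_t n)
{
    __m128 acc = _mm_setzero_ps();
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 m = load4_pd_as_ps(row + i);
        __m128 v = _mm_loadu_ps(vec + i);
        acc = _mm_add_ps(acc, _mm_mul_ps(m, v));
    }

    /* Horizontal sum of the four partial sums. */
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);

    /* Remaining 0..3 elements. */
    for (; i < n; ++i) {
        sum += (float)row[i] * vec[i];
    }
    return sum;
}
```

Calling `row_dot` once per matrix row gives the float result vector. Because four partial sums are kept, the rounding can differ slightly from a strictly sequential scalar loop. For the later step of adding floats to doubles, `_mm_cvtps_pd` goes the other way and widens the low two floats of an `__m128` into an `__m128d`.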